Syllabi Analysis

System Architecture

Variation A — Knowledge Graph Intelligence

MSA 8700 — DAIS Project

Project Overview

System: Syllabi Analysis DAIS

Corpus: Class syllabi from all GSU colleges and programs (PDF)

Objective: Build a semantic knowledge graph to enable querying about:

  • Topics covered across courses and programs
  • How instructors address AI usage policies
  • Prerequisite chains and learning outcomes
  • Cross-program coverage overlaps

User Persona

Academic Program Coordinator / Curriculum Analyst

Context: GSU Office of Academic Affairs or college-level curriculum committee

Goals

  • Identify which courses cover specific topics across GSU
  • Compare how different programs address the same subject area
  • Analyze AI usage policies across instructors and departments
  • Map prerequisite dependencies and curriculum gaps
  • Support accreditation by mapping learning outcomes to courses

Pain Points

  • Manually reviewing hundreds of syllabi is infeasible
  • No structured, queryable representation of syllabi content exists
  • Cross-program comparisons require scanning documents from multiple colleges

Key Use Cases

# | Use Case              | Example Query
1 | Topic Coverage        | “Which courses cover NLP?”
2 | AI Policy Comparison  | “How do Robinson vs. Arts & Sciences address generative AI?”
3 | Cross-Program Overlap | “What topics are shared between MS Analytics and MS CS?”

Key Use Cases (cont.)

# | Use Case             | Example Query
4 | Prerequisite Chains  | “What is the prerequisite chain for advanced ML courses?”
5 | Learning Outcomes    | “Which courses list ‘critical thinking’ as an outcome?”
6 | Temporal Trends      | “Has AI policy language increased over the last 3 semesters?”

Knowledge Graph Schema — Entities

Entity Type       | Key Attributes
Course            | course_code, title, credit_hours, level
Instructor        | name, department, college
Program           | name, degree_type, college
College           | name
Topic             | name, category
Learning Outcome  | description, bloom_taxonomy_level
Textbook          | title, author, edition
AI Policy         | policy_type, details
Semester          | term, year
Assessment Method | type, weight_percent

Knowledge Graph Schema — Relationships

Relationship     | From → To
TAUGHT_BY        | Course → Instructor
BELONGS_TO       | Course → Program
OFFERED_BY       | Program → College
COVERS_TOPIC     | Course → Topic
HAS_OUTCOME      | Course → Learning Outcome
USES_TEXTBOOK    | Course → Textbook
HAS_AI_POLICY    | Course → AI Policy
HAS_PREREQUISITE | Course → Course
OFFERED_IN       | Course → Semester
USES_ASSESSMENT  | Course → Assessment Method
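To make the schema concrete, here is a small sketch of how the HAS_PREREQUISITE relationship supports chain queries (use case 4). Plain Python dictionaries stand in for Neo4j, and the course codes are hypothetical examples, not taken from actual GSU syllabi.

```python
# Minimal in-memory stand-in for the HAS_PREREQUISITE edges of the schema.
# In the real system these edges live in Neo4j; course codes are invented.
PREREQUISITES = {
    "MSA 8700": ["MSA 8600"],   # course -> its direct prerequisites
    "MSA 8600": ["MSA 8500"],
    "MSA 8500": [],
}

def prerequisite_chain(course: str) -> list[str]:
    """Walk HAS_PREREQUISITE edges and return the full prerequisite chain."""
    chain = []
    stack = list(PREREQUISITES.get(course, []))
    while stack:
        prereq = stack.pop()
        if prereq not in chain:
            chain.append(prereq)
            stack.extend(PREREQUISITES.get(prereq, []))
    return chain

# Against the real graph, the equivalent Cypher would be roughly:
#   MATCH (c:Course {course_code: $code})-[:HAS_PREREQUISITE*]->(p:Course)
#   RETURN p.course_code
```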

System Architecture

graph TB
    subgraph UI["USER INTERFACES"]
        Chat["Chat Interface (Web UI / API)"]
        Batch["Batch Query Interface (CLI / REST)"]
    end
    subgraph Orch["ORCHESTRATION LAYER (LangGraph)"]
        Router["Query Router Agent"]
        Coord["Agent Coordinator"]
        RespGen["Response Generator Agent"]
    end
    subgraph Retrieval["QUERY & RETRIEVAL LAYER"]
        GraphQ["Graph Query Agent (Cypher → Neo4j)"]
        VecQ["Vector Search Agent (Qdrant)"]
        SQLQ["SQL Query Agent (PostgreSQL)"]
    end
    subgraph Stores["DATA STORES"]
        Neo4j["Neo4j (Knowledge Graph)"]
        Qdrant["Qdrant (Vectors)"]
        Postgres["PostgreSQL (Metadata)"]
    end
    subgraph Extract["EXTRACTION LAYER"]
        EntExt["Entity Extraction"]
        RelExt["Relationship Extraction"]
        GraphVal["Graph Validation & Dedup"]
    end
    subgraph DocProc["DOCUMENT PROCESSING"]
        Ingest["PDF Ingestion"]
        TextChunk["Text Extraction"]
        VecEmbed["Vector Embedding"]
        SectionClass["Section Classifier"]
    end
    subgraph External["EXTERNAL"]
        Ollama["Ollama Endpoint"]
    end
    Input["PDF Syllabi"] --> Ingest --> TextChunk --> SectionClass
    SectionClass --> EntExt --> GraphVal --> Neo4j
    SectionClass --> RelExt --> GraphVal
    SectionClass --> VecEmbed --> Qdrant
    Chat --> Router --> Coord
    Batch --> Router
    Coord --> GraphQ --> Neo4j
    Coord --> VecQ --> Qdrant
    Coord --> SQLQ --> Postgres
    GraphQ --> RespGen
    VecQ --> RespGen
    SQLQ --> RespGen
    Ollama -.-> SectionClass
    Ollama -.-> EntExt
    Ollama -.-> VecEmbed
    Ollama -.-> Router
    Ollama -.-> RespGen

Document Processing Layer

Component                  | Technology
PDF Ingestion              | PyPDF2, PDFMiner
Text Extraction & Chunking | PDFMiner, custom logic
Section Classifier         | LLM via external Ollama
Metadata Extraction        | LLM + regex
Vector Embedding           | Ollama (nomic-embed-text)

Classifies syllabus text into semantic sections: course description, topics/schedule, AI policy, grading, learning outcomes, textbooks, prerequisites
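The deployed classifier prompts an LLM through the external Ollama endpoint; the keyword heuristic below is only an illustrative stand-in that shows the component's input/output contract (text in, section label out). The labels and keyword lists are assumptions.

```python
# Keyword-heuristic sketch of the section classifier. The real component
# uses an LLM via Ollama; this fallback only illustrates the interface.
SECTION_KEYWORDS = {
    "ai_policy": ["generative ai", "chatgpt", "ai tools"],
    "grading": ["grading", "weight", "rubric"],
    "prerequisites": ["prerequisite", "prereq"],
    "learning_outcomes": ["learning outcome", "students will be able"],
    "textbooks": ["textbook", "isbn", "edition"],
}

def classify_section(text: str) -> str:
    """Return the first section label whose keywords appear in the text."""
    lowered = text.lower()
    for label, keywords in SECTION_KEYWORDS.items():
        if any(kw in lowered for kw in keywords):
            return label
    return "course_description"  # default bucket for unmatched text
```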

Extraction & Graph Construction

Agent                    | Purpose
Entity Extraction        | Identify courses, instructors, topics, outcomes, textbooks, AI policies
Relationship Extraction  | Infer COVERS_TOPIC, HAS_PREREQUISITE, TAUGHT_BY, etc.
Graph Validation & Dedup | Resolve duplicates, normalize topic names, enforce schema
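Topic-name normalization is the core of the dedup step: surface variants like "NLP" and "natural language processing" must resolve to a single Topic node. A minimal sketch, assuming a hand-seeded alias table (a real run would build it from LLM-assisted clustering of extracted topic strings):

```python
# Illustrative alias table; entries are assumptions, not system output.
TOPIC_ALIASES = {
    "nlp": "Natural Language Processing",
    "natural language processing": "Natural Language Processing",
    "ml": "Machine Learning",
    "machine learning": "Machine Learning",
}

def normalize_topic(raw: str) -> str:
    """Collapse surface variants so one Topic node exists per concept."""
    key = " ".join(raw.lower().split())  # trim and squeeze whitespace
    return TOPIC_ALIASES.get(key, raw.strip().title())

def dedup_topics(raw_topics: list[str]) -> list[str]:
    """Return unique canonical topic names, preserving first-seen order."""
    seen = []
    for raw in raw_topics:
        canon = normalize_topic(raw)
        if canon not in seen:
            seen.append(canon)
    return seen
```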

Data Stores

Store            | Technology | Contents
Knowledge Graph  | Neo4j      | Entities and relationships
Vector Store     | Qdrant     | Text chunk embeddings
Relational Store | PostgreSQL | Metadata, raw text, AI policies, eval logs

Query & Retrieval Layer

Agent               | Purpose
Graph Query Agent   | Natural language → Cypher queries against Neo4j
Vector Search Agent | Semantic search over syllabus chunks in Qdrant
SQL Query Agent     | Structured queries against PostgreSQL metadata
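As a sketch of the Graph Query Agent's output, here is the shape of a generated Cypher query for a topic-coverage question. The real agent has an LLM produce the Cypher; this fixed template and function name are illustrative assumptions.

```python
# Template sketch of the Graph Query Agent for "which courses cover X?".
def topic_coverage_query(topic: str) -> tuple[str, dict]:
    """Build a parameterized Cypher query for topic-coverage lookups."""
    cypher = (
        "MATCH (c:Course)-[:COVERS_TOPIC]->(t:Topic {name: $topic}) "
        "RETURN c.course_code, c.title"
    )
    return cypher, {"topic": topic}

# With the neo4j Python driver this would run roughly as:
#   with driver.session() as session:
#       records = session.run(*topic_coverage_query("Natural Language Processing"))
```

Parameterizing `$topic` (rather than interpolating LLM output into the query string) keeps generated queries safe and cacheable.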

Orchestration Layer

Component          | Purpose
Query Router       | Classify intent → route to graph, vector, SQL, or a combination
Agent Coordinator  | Manage multi-agent execution; merge results
Response Generator | Synthesize answers with citations to source syllabi

Technology Stack

Layer            | Technology
Language         | Python 3.11+
LLM (generation) | External Ollama (llama3.1, mistral)
LLM (embeddings) | External Ollama (nomic-embed-text)
Agent Framework  | LangGraph
LangChain        | langchain-ollama (ChatOllama, OllamaEmbeddings)
PDF Processing   | PyPDF2, PDFMiner
Knowledge Graph  | Neo4j
Vector Database  | Qdrant
Relational DB    | PostgreSQL
Web API          | FastAPI
Containers       | Docker + Docker Compose

Why LangGraph?

Two distinct multi-step pipelines with conditional branching:

  • Nodes = agent steps (extract, classify, embed, query, respond)
  • Edges = transitions with conditional routing
  • State = shared context passed between nodes

LangGraph’s state-graph model maps naturally to both the ingestion and query pipelines.
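The nodes/edges/state model can be emulated in a few lines of plain Python. This toy runner mimics the concept only; it is not the langgraph library API, and the node functions are invented placeholders.

```python
# Toy state-graph runner: nodes are functions that update a shared state
# dict, edges name the next node. Conceptual sketch, not langgraph itself.
def extract(state):  state["text"] = f"text of {state['pdf']}"; return state
def classify(state): state["sections"] = ["topics", "ai_policy"]; return state
def respond(state):  state["done"] = True; return state

NODES = {"extract": extract, "classify": classify, "respond": respond}
EDGES = {"extract": "classify", "classify": "respond", "respond": None}

def run_graph(start: str, state: dict) -> dict:
    """Walk the graph from `start`, threading state through each node."""
    node = start
    while node is not None:
        state = NODES[node](state)
        node = EDGES[node]
    return state
```

In LangGraph proper, `EDGES` would include conditional transitions (a function of the state picks the next node), which is exactly what the query pipeline's intent routing needs.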

LangGraph — Ingestion Graph

graph TD
    Start(["START (PDF path)"])
    Extract["extract_text (PyPDF2 / PDFMiner)"]
    Classify["classify_sections (LLM via Ollama)"]
    Entities["extract_entities (LLM)"]
    Relations["extract_relations (LLM)"]
    Embed["generate_embeddings (Ollama)"]
    StoreMeta["store_metadata (PostgreSQL)"]
    Validate["validate_graph & dedup (LLM)"]
    StoreVec["store_vectors (Qdrant)"]
    StoreGraph["store_graph (Neo4j)"]
    Start --> Extract --> Classify
    Classify --> Entities
    Classify --> Relations
    Classify --> Embed
    Classify --> StoreMeta
    Entities --> Validate
    Relations --> Validate
    Embed --> StoreVec
    Validate --> StoreGraph

LangGraph — Query Graph

graph TD
    Start(["START (user query)"])
    Classify["classify_intent (LLM determines query type)"]
    GraphAgent["graph_query agent (Cypher → Neo4j)"]
    VecAgent["vector_search agent (→ Qdrant)"]
    SQLAgent["sql_query agent (SQL → PostgreSQL)"]
    Merge["merge_results (combine & rank)"]
    Response["generate_response (LLM synthesizes answer with citations)"]
    Start --> Classify
    Classify -->|graph query| GraphAgent
    Classify -->|semantic search| VecAgent
    Classify -->|structured query| SQLAgent
    GraphAgent --> Merge
    VecAgent --> Merge
    SQLAgent --> Merge
    Merge --> Response

The conditional edge routes to one, two, or all three agents depending on query intent.
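A sketch of that conditional edge, assuming keyword rules in place of the LLM intent classifier the system actually uses (the rules and agent names are illustrative):

```python
# Keyword-based stand-in for the query router's conditional edge. The
# deployed router classifies intent with an LLM; these rules only show
# how a query can fan out to one or more retrieval agents.
def route_query(query: str) -> list[str]:
    """Return the retrieval agents a query should be routed to."""
    q = query.lower()
    agents = []
    if "prerequisite" in q or "chain" in q:
        agents.append("graph_query")    # structural traversal -> Neo4j
    if "how do" in q or "address" in q:
        agents.append("vector_search")  # semantic comparison -> Qdrant
    if "how many" in q or "semester" in q:
        agents.append("sql_query")      # structured/temporal -> PostgreSQL
    return agents or ["vector_search"]  # default fallback
```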

Data Flow Summary

graph TD
    PDF["PDF Syllabi"]
    S1["1. Ingest & Extract Text"]
    S2["2. Classify Syllabus Sections"]
    S3a["3a. Entity Extraction"]
    S3b["3b. Relationship Extraction"]
    S3c["3c. Vector Embedding → Qdrant"]
    S3d["3d. Metadata → PostgreSQL"]
    S4["4. Graph Validation & Dedup"]
    S5["5. Write to Neo4j, Qdrant, PostgreSQL"]
    S6["6. Query → Route → Retrieve → Respond"]
    PDF --> S1 --> S2
    S2 --> S3a --> S4
    S2 --> S3b --> S4
    S2 --> S3c --> S5
    S2 --> S3d --> S5
    S4 --> S5 --> S6

Milestone Alignment

Milestone | Deliverables
M01       | Variation selection (A), persona, use cases, schema
M02       | PDF ingestion pipeline, text extraction, section classification, embeddings, metadata to PostgreSQL, Docker Compose
M03       | Multi-agent extraction pipeline → Neo4j; chat interface; batch query endpoint
M04       | Evaluation test set (50–100 queries); baseline metrics; error analysis; 3+ improvement ideas
M05       | Improved system; ablation study (graph vs. vector vs. hybrid); iteration report
M06       | Deployed system; technical report; demo video; presentation

Evaluation Test Set

Size: 50–100 queries across 6 categories

Category                 | Metric
Topic lookup             | Precision, Recall, F1
AI policy extraction     | Exact match / LLM-judged
Cross-program comparison | LLM-judged quality (1–5)
Prerequisite reasoning   | Graph path accuracy
Aggregation              | Numerical accuracy
Temporal trend           | LLM-judged quality

Reference answers manually authored from 20–30 syllabi reviewed in full.
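For the topic-lookup category, scoring reduces to set comparison between retrieved and reference courses. A minimal sketch (course codes in the usage comment are hypothetical):

```python
# Set-based P/R/F1 for topic-lookup queries: compare the system's
# retrieved course set against the manually authored reference set.
def precision_recall_f1(predicted: set, reference: set):
    """Return (precision, recall, F1) over retrieved vs. reference items."""
    tp = len(predicted & reference)           # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# e.g. predicted {"MSA 8700", "CSC 8980"} vs. reference {"MSA 8700"}
# yields precision 0.5, recall 1.0
```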

Containerization

services:
  neo4j:
    image: neo4j:5
    ports: ["7474:7474", "7687:7687"]
  qdrant:
    image: qdrant/qdrant
    ports: ["6333:6333"]
  postgres:
    image: postgres:18
    ports: ["5432:5432"]
  app:
    build: ./app
    depends_on: [neo4j, qdrant, postgres]
    environment:
      - OLLAMA_BASE_URL=http://<ollama-host>:11434
  web-ui:
    build: ./web-ui
    ports: ["3000:3000"]

Ollama is external — not containerized locally.

All other components run as Docker containers via Docker Compose.
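Inside the app container, the external endpoint is picked up from the `OLLAMA_BASE_URL` environment variable set in docker-compose. A sketch of building a request to Ollama's `/api/generate` REST endpoint (the model name and default URL are example values):

```python
import os

# The app container reads OLLAMA_BASE_URL from its compose environment;
# the default below is the standard local Ollama port, used as a fallback.
def ollama_request(prompt: str) -> tuple[str, dict]:
    """Build the URL and JSON payload for a non-streaming generation call."""
    base = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
    payload = {"model": "llama3.1", "prompt": prompt, "stream": False}
    return f"{base}/api/generate", payload
```

In the actual stack this plumbing is handled by `langchain-ollama` (`ChatOllama`, `OllamaEmbeddings`), which accepts the same base URL.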

Summary

  • Variation A — Knowledge Graph Intelligence over GSU syllabi
  • Persona — Curriculum Analyst needing cross-program insight
  • Stack — LangGraph + Ollama + Neo4j + Qdrant + PostgreSQL
  • Two LangGraph pipelines — Ingestion and Query with conditional routing
  • Evaluation — 50–100 queries, 6 categories, multiple metrics
  • Deployment — Fully containerized (except external Ollama)