M02: Data Pipeline, CI/CD Setup

Due Date: Monday, Feb 15, 2026; Wednesday, Feb 10, 2026

Students will build a foundational data pipeline: ingest a subset of the corpus, extract and structure its text, and write the first results to PostgreSQL, a vector store, and a minimal Neo4j graph. Throughout the milestone, students will continue refining the system architecture.

  • Basic ingestion of PDFs/text (subset of corpus).
  • Text extraction, chunking, and basic metadata extraction.
  • First data written to PostgreSQL, a vector store, and at least a minimal Neo4j graph.
  • Updated architecture diagram.
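The chunking and metadata steps listed above can be sketched as follows. This is a minimal, standard-library-only illustration, not a required design: it assumes fixed-size character windows with overlap (one common chunking strategy; the milestone does not prescribe one), and it assumes text has already been extracted from the PDFs. The names `Chunk`, `chunk_text`, `build_chunks`, and `to_store_records`, the 500/50 defaults, and the `HAS_CHUNK`/`NEXT` relationship names are all hypothetical. Real PDF extraction, embedding, and the PostgreSQL/vector-store/Neo4j drivers are left out; the last function only shapes the records each store would receive.

```python
import hashlib
import re
from dataclasses import dataclass, field


@dataclass
class Chunk:
    doc_id: str
    index: int
    text: str
    start: int  # character offset into the normalized document text
    end: int
    metadata: dict = field(default_factory=dict)


def normalize(text: str) -> str:
    """Collapse whitespace runs typically left behind by PDF/text extraction."""
    return re.sub(r"\s+", " ", text).strip()


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[tuple[int, int, str]]:
    """Split text into character windows of `size` that overlap by `overlap` chars.

    Assumption: character-based windows; token- or sentence-based chunking
    would be equally valid choices for this milestone.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    spans = []
    step = size - overlap
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        spans.append((start, end, text[start:end]))
        if end == len(text):
            break
        start += step
    return spans


def build_chunks(source: str, raw_text: str, size: int = 500, overlap: int = 50) -> list[Chunk]:
    """Normalize one extracted document and attach basic metadata to each chunk."""
    text = normalize(raw_text)
    # Hypothetical ID scheme: a content hash, so re-ingesting the same text yields the same ID.
    doc_id = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    return [
        Chunk(doc_id, i, t, s, e, {"source": source, "char_len": len(t)})
        for i, (s, e, t) in enumerate(chunk_text(text, size, overlap))
    ]


def to_store_records(chunks: list[Chunk]):
    """Shape chunks into records for the three write targets (no DB calls here)."""
    # Rows for a hypothetical PostgreSQL table (doc_id, idx, start, end, text, source).
    pg_rows = [(c.doc_id, c.index, c.start, c.end, c.text, c.metadata["source"]) for c in chunks]
    # (id, text) pairs to embed and upsert into the vector store; embedding model not shown.
    vector_items = [(f"{c.doc_id}:{c.index}", c.text) for c in chunks]
    # (from, relationship, to) triples for a minimal Neo4j graph: document -> chunks, chunk -> next chunk.
    graph_edges = []
    for c in chunks:
        chunk_id = f"{c.doc_id}:{c.index}"
        graph_edges.append((c.doc_id, "HAS_CHUNK", chunk_id))
        if c.index > 0:
            graph_edges.append((f"{c.doc_id}:{c.index - 1}", "NEXT", chunk_id))
    return pg_rows, vector_items, graph_edges
```

With `size=500` and `overlap=50`, a 1,200-character document yields windows at offsets 0–500, 450–950, and 900–1200. The overlap keeps a sentence that straddles a boundary fully inside at least one chunk when it is shorter than the overlap, and the content-hash ID lets repeated ingestion runs upsert rather than duplicate records (at the cost of treating two files with identical text as one document).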