M02 Data Pipeline, CI/CD setup
Due Dates: Monday Feb 15, 2026; Wednesday Feb 10, 2026
Students will establish a foundational data pipeline by ingesting a subset of the corpus, extracting and structuring its text, and storing the initial results in PostgreSQL, a vector store, and a minimal Neo4j graph, while continuing to refine the system architecture.
- Basic ingestion of PDF and plain-text files (a subset of the corpus).
- Text extraction, chunking, and basic metadata extraction.
- First data written to PostgreSQL, vector store, and (at least) a minimal Neo4j graph.
- Updated architecture diagram.
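The sketch below illustrates one way the deliverables above could fit together. It assumes the PDF-to-text step has already run and shows only the chunking, metadata, and three-way write. It relies on several stand-ins: sqlite3 replaces PostgreSQL so the example runs without a server (PostgreSQL would use `INSERT ... ON CONFLICT` instead of SQLite's `INSERT OR REPLACE`), a Python list with a hashed bag-of-words vector replaces the vector store and embedding model, and the Neo4j step only builds a parameterized Cypher query rather than connecting to a database. All function names, table names, node labels, and the chunk size and overlap values are hypothetical.

```python
import hashlib
import math
import re
import sqlite3
import zlib

# Hypothetical Cypher for the minimal graph: one Document node linked to its Chunk nodes.
GRAPH_CYPHER = (
    "MERGE (d:Document {doc_id: $doc_id}) SET d.title = $title "
    "WITH d UNWIND $chunks AS c "
    "MERGE (k:Chunk {doc_id: $doc_id, idx: c.idx}) "
    "MERGE (d)-[:HAS_CHUNK]->(k)"
)


def chunk_text(text: str, max_chars: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of at most max_chars, each overlapping the previous by `overlap` chars."""
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must satisfy 0 <= overlap < max_chars")
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace left over from extraction
    chunks, start = [], 0
    while start < len(text):
        end = start + max_chars
        chunks.append(text[start:end])
        if end >= len(text):
            break
        start = end - overlap  # step back so neighbouring chunks share some context
    return chunks


def extract_metadata(source: str, text: str) -> dict:
    """Basic metadata: source path, a title guess (first non-empty line), and raw length."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return {"source": source, "title": lines[0] if lines else "", "num_chars": len(text)}


def toy_embedding(text: str, dim: int = 64) -> list[float]:
    """Hashed bag-of-words vector, L2-normalized; a placeholder for a real embedding model."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        vec[zlib.crc32(token.encode()) % dim] += 1.0  # crc32 is deterministic across runs
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def write_relational(conn: sqlite3.Connection, doc_id: str, meta: dict, chunks: list[str]) -> None:
    """Store document metadata and chunk text (sqlite3 standing in for PostgreSQL)."""
    conn.execute("CREATE TABLE IF NOT EXISTS documents ("
                 "doc_id TEXT PRIMARY KEY, source TEXT, title TEXT, num_chars INTEGER)")
    conn.execute("CREATE TABLE IF NOT EXISTS chunks ("
                 "doc_id TEXT REFERENCES documents(doc_id), idx INTEGER, text TEXT, "
                 "PRIMARY KEY (doc_id, idx))")
    conn.execute("INSERT OR REPLACE INTO documents VALUES (?, ?, ?, ?)",
                 (doc_id, meta["source"], meta["title"], meta["num_chars"]))
    conn.executemany("INSERT OR REPLACE INTO chunks VALUES (?, ?, ?)",
                     [(doc_id, i, c) for i, c in enumerate(chunks)])
    conn.commit()


def ingest(source: str, text: str, conn: sqlite3.Connection, vector_index: list) -> tuple[str, dict]:
    """Run one already-extracted document through chunking and the three store writes."""
    doc_id = hashlib.sha256(source.encode()).hexdigest()[:12]  # stable ID derived from the path
    meta = extract_metadata(source, text)
    chunks = chunk_text(text)
    write_relational(conn, doc_id, meta, chunks)
    vector_index.extend((doc_id, i, toy_embedding(c)) for i, c in enumerate(chunks))
    params = {"doc_id": doc_id, "title": meta["title"],
              "chunks": [{"idx": i} for i in range(len(chunks))]}
    return GRAPH_CYPHER, params  # query and parameters for a Neo4j session to execute


conn = sqlite3.connect(":memory:")
vector_index: list = []
sample = "Sample Report\n\n" + "Pipelines move data between systems. " * 40
query, params = ingest("corpus/sample.txt", sample, conn, vector_index)
print(conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0], len(vector_index))
```

With the official Neo4j Python driver, the returned query and parameters could be passed to `session.run(query, params)`. Fixed-size character windows with overlap are the simplest chunking strategy. Sentence- or token-aware chunking is a common refinement once the basic pipeline works end to end.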