Project Evaluation Rubric

Complete evaluation rubric covering all milestones (M01–M06) for the DAIS semester project. Use this document for self-evaluation and grading.

This document consolidates the evaluation criteria from all project milestones into a single rubric. Each milestone is weighted according to its contribution to the final project grade. Use this rubric for self-evaluation before each submission and as a reference throughout the semester.


AI Evaluation and Feedback Schedule

The table below lists the scheduled AI evaluation runs. Changes that are committed or merged to the uat branch before a run starts will be reviewed in that run.

You are not required to have updates for every evaluation run. Wait until you have meaningful updates before merging to uat.

| Date | Evaluation Start Time | Run |
|---|---|---|
| Monday, April 13, 2026 | 1:00 AM | 1 |
| Friday, April 17, 2026 | 1:00 AM | 2 |
| Monday, April 20, 2026 | 1:00 AM | 3 |
| Friday, April 24, 2026 | 1:00 AM | 4 |
| Monday, April 27, 2026 | 1:00 AM | 5 |
| Friday, May 1, 2026 | 1:00 AM | 6 |
| Monday, May 4, 2026 | 1:00 AM | Final |
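
To check which run will pick up a given merge, the schedule can be expressed as data. This is an illustrative sketch only: it assumes all start times are in one local time zone (the table does not state one) and that a change is reviewed by the first run whose start time is strictly later than the merge time. `first_run_after` is a hypothetical helper, not part of any course tooling.

```python
from datetime import datetime
from typing import Optional

# Start times copied from the schedule table. Assumption: every time is in the same
# (unstated) local time zone, so naive datetimes can be compared directly.
EVALUATION_RUNS = [
    ("1", datetime(2026, 4, 13, 1, 0)),
    ("2", datetime(2026, 4, 17, 1, 0)),
    ("3", datetime(2026, 4, 20, 1, 0)),
    ("4", datetime(2026, 4, 24, 1, 0)),
    ("5", datetime(2026, 4, 27, 1, 0)),
    ("6", datetime(2026, 5, 1, 1, 0)),
    ("Final", datetime(2026, 5, 4, 1, 0)),
]


def first_run_after(merged_at: datetime) -> Optional[str]:
    """Label of the first run starting strictly after `merged_at`, or None if none remain."""
    for label, start in EVALUATION_RUNS:  # runs are listed in chronological order
        if merged_at < start:
            return label
    return None
```

For example, a merge at 11:30 PM on Thursday, April 16, 2026 would be picked up by run 2.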

Grade Allocation

| Milestone | Title | Weight |
|---|---|---|
| M01 | Project Definition | 10% |
| M02 | Data Pipeline, CI/CD Setup | 15% |
| M03 | Agentic Prototype | 20% |
| M04 | Evaluation Framework Baseline | 20% |
| M05 | Iterative Improvement | 20% |
| M06 | Final Deliverables | 15% |

M01 — Project Definition (10%)

M01 uses a descriptive rubric with four performance levels per criterion.

| # | Criterion | Points | Excellent | Good | Satisfactory | Needs Improvement |
|---|---|---|---|---|---|---|
| 1.1 | Variation and Corpus Selection | 40 | (36–40) Variation (A/B/C) is clearly identified and tightly aligned with a well-justified corpus; the corpus description specifies document types, sources, time span, and approximate scale, and explains why it is appropriate for the chosen variation and business context. Any constraints (access, preprocessing, licensing) are explicitly stated and reasonable. | (28–35) Variation and corpus are appropriate and generally well aligned; corpus characteristics are described with minor gaps in detail (for example, incomplete discussion of scope or scale), but the choice is feasible and coherent with the project goals. | (20–27) Variation is specified and a corpus is named, but alignment to the variation or to realistic DAIS capabilities is only partially justified; key details about the corpus (types, coverage, or feasibility) are vague or missing. | (0–19) Variation is unclear, inconsistent, or missing; the corpus is poorly defined, obviously infeasible, or largely misaligned with the project; justification is minimal or absent. |
| 1.2 | User Persona and Key Use Cases | 40 | (36–40) Persona is realistic and well developed (role, goals, context, decision environment, pain points) and is clearly grounded in the chosen variation and corpus; key use cases are specific, technically plausible, and show how DAIS meaningfully supports the persona’s workflows with nontrivial queries or tasks, going beyond simple keyword search and generic Q&A. | (28–35) Persona is plausible and relevant, with a generally clear description of role and goals, though some contextual details or pain points may be underdeveloped; use cases are mostly concrete and aligned with the variation and corpus, but are limited in variety or depth or only partially highlight the need for an agentic system. | (20–27) Persona is defined but generic or loosely connected to the corpus and variation; use cases are high-level, somewhat repetitive, or close to generic search scenarios; the link between persona, use cases, and DAIS capabilities is only partially evident. | (0–19) Persona is missing, unrealistic for the corpus, or misaligned with the chosen variation; use cases are absent, trivial, or too vague to guide design and later evaluation. |
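
For self-evaluation, the M01 point bands can be applied mechanically. A minimal sketch, assuming whole-number scores out of 40 per criterion; `m01_level` is a hypothetical helper for illustration, not part of any course tooling.

```python
# Performance-level bands for each M01 criterion (out of 40 points),
# copied from the rubric table.
M01_BANDS = [
    (36, 40, "Excellent"),
    (28, 35, "Good"),
    (20, 27, "Satisfactory"),
    (0, 19, "Needs Improvement"),
]


def m01_level(points: int) -> str:
    """Map a whole-number criterion score (0-40) to its performance level."""
    for low, high, level in M01_BANDS:
        if low <= points <= high:
            return level
    raise ValueError(f"score must be a whole number from 0 to 40, got {points}")
```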

M02 — Data Pipeline, CI/CD Setup (15%)

| # | Criterion | Description | Points |
|---|---|---|---|
| 2.1 | Code Quality | Code is well-structured, modular, and follows best practices for readability and maintainability. | 30 |
| 2.2 | Pipeline Functionality | The pipeline successfully ingests a subset of the corpus, extracts relevant metadata, generates text embeddings, and writes this data to the chosen database without errors. | 30 |
| 2.3 | Architecture Diagram | The architecture diagram is clear, comprehensive, and accurately reflects the components and data flow of the pipeline. | 30 |
| 2.4 | Documentation & Reproducibility | Documentation (the README.md file) includes clear instructions for deploying and running the solution. | 30 |
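
Criterion 2.2 and the M02 checklist mention chunking and metadata extraction. One common approach, shown here only as an illustration, splits text into fixed-size character windows with overlap, each carrying the document's metadata plus its starting offset. The window size, overlap, and `Chunk` fields are assumptions; teams may instead chunk by tokens, sentences, or document structure.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A piece of a document to embed and store, with metadata for filtering and citation."""
    doc_id: str
    chunk_index: int
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_document(doc_id: str, text: str, metadata: dict,
                   size: int = 1000, overlap: int = 200) -> list[Chunk]:
    """Split text into `size`-character windows; consecutive windows share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("require 0 <= overlap < size")
    step = size - overlap
    chunks: list[Chunk] = []
    start = 0
    while start < len(text):
        chunks.append(Chunk(
            doc_id=doc_id,
            chunk_index=len(chunks),
            text=text[start:start + size],
            # Copy the document-level metadata and record where this chunk begins.
            metadata={**metadata, "char_start": start},
        ))
        if start + size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks
```

Each chunk's `text` would then be passed to whichever embedding model the team selected, and the resulting vector written to the database alongside `doc_id`, `chunk_index`, and `metadata`. That step depends on the chosen model and database, so it is left out of the sketch.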

M03 — Agentic Prototype (20%)

| # | Criterion | Description | Points |
|---|---|---|---|
| 3.1 | Multi-Agent Pipeline | A functional multi-agent pipeline is established that processes documents, orchestrates agent roles, and produces structured text and data. The agent design is appropriate for the chosen project variation. | 40 |
| 3.2 | Document Ingestion & Storage | The extracted text and structured data produced by the pipeline are ingested and persisted to the appropriate databases in a queryable form. | 40 |
| 3.3 | Dual Interface Implementation | Both a chat interface for human interaction and a batch query interface for automated evaluation are functional and accessible. The interfaces correctly route queries through the agent pipeline and return meaningful responses. | 40 |
| 3.4 | Architecture & Reproducibility | The system architecture is documented (diagram or written description), the repository is well-organized, and the application can be deployed and run from the provided instructions without manual intervention. | 40 |
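
Criterion 3.3 requires a batch interface for automated evaluation. A minimal sketch of such a runner, assuming questions arrive as JSON Lines with `id` and `question` fields and that `answer_fn` (hypothetical) wraps the team's agent pipeline. Recording per-item errors instead of stopping keeps one failing query from invalidating a whole evaluation run.

```python
import json
import time
from pathlib import Path
from typing import Callable


def run_batch(questions_path: Path, output_path: Path,
              answer_fn: Callable[[str], str]) -> int:
    """Answer every question in a JSONL file and write one JSON record per line.

    `answer_fn` is a hypothetical stand-in for the agent pipeline entry point.
    Returns the number of records written.
    """
    count = 0
    with questions_path.open(encoding="utf-8") as src, \
            output_path.open("w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            item = json.loads(line)
            started = time.perf_counter()
            try:
                answer, error = answer_fn(item["question"]), None
            except Exception as exc:  # record the failure and move on to the next item
                answer, error = None, repr(exc)
            record = {
                "id": item["id"],
                "question": item["question"],
                "answer": answer,
                "error": error,
                "latency_s": round(time.perf_counter() - started, 3),
            }
            dst.write(json.dumps(record) + "\n")
            count += 1
    return count
```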

M04 — Evaluation Framework Baseline (20%)

| # | Criterion | Description | Points |
|---|---|---|---|
| 4.1 | Evaluation Test Set Execution | The completed evaluation test set is run against the batch interface, producing a full set of system outputs. Results are systematically collected, organized, and stored for analysis. | 40 |
| 4.2 | Quantitative Performance Analysis | System outputs are evaluated against expected results using defined metrics (e.g., accuracy, relevance, completeness). Results are presented clearly with summary statistics and per-query breakdowns where appropriate. | 40 |
| 4.3 | Error Analysis & Failure Identification | Errors and low-performing cases are identified, categorized, and analyzed. The analysis goes beyond listing failures to explaining likely root causes (e.g., retrieval gaps, prompt failures, schema mismatches). | 40 |
| 4.4 | Improvement Strategy Proposals | At least three specific, actionable improvement strategies are proposed, grounded in the error analysis. Each strategy identifies what will be changed, why it is expected to help, and how its impact will be measured in M05. | 40 |
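
Criteria 4.2 and 4.3 ask for summary statistics, per-query breakdowns, and categorized errors. The sketch below assumes each scored result is a dict holding an `id`, a `scores` mapping with values in [0, 1], and an optional `error_category` label assigned during error analysis. The field names, the category labels, and the 0.5 threshold are illustrative choices, not course requirements.

```python
from collections import Counter
from statistics import mean, median


def summarize(results: list[dict], metrics: list[str], threshold: float = 0.5) -> dict:
    """Per-metric summary statistics, low-scoring query ids, and error-category counts.

    Each result is assumed to look like
    {"id": "q1", "scores": {"accuracy": 0.8, ...}, "error_category": "retrieval_gap" or None}.
    Assumes `results` is non-empty.
    """
    per_metric = {}
    for m in metrics:
        values = [r["scores"][m] for r in results]
        per_metric[m] = {
            "mean": round(mean(values), 3),
            "median": round(median(values), 3),
            "min": min(values),
            "max": max(values),
        }
    # Per-query breakdown: ids whose score on any metric falls below the threshold.
    low = sorted(r["id"] for r in results
                 if any(r["scores"][m] < threshold for m in metrics))
    # Count only results that were assigned a category during error analysis.
    categories = Counter(r["error_category"] for r in results if r.get("error_category"))
    return {"per_metric": per_metric, "low_scoring_ids": low,
            "error_categories": dict(categories)}
```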

M05 — Iterative Improvement (20%)

| # | Criterion | Description | Points |
|---|---|---|---|
| 5.1 | System Refinements Implementation | Architectural or agent-level modifications informed by M04 findings are implemented and functional. Changes are clearly linked to the improvement strategies proposed in M04. | 40 |
| 5.2 | Ablation Study | A structured ablation study compares at least two alternative approaches (e.g., different retrieval strategies, agent configurations, or prompt designs), with results measured against the M04 baseline using the same evaluation pipeline. | 40 |
| 5.3 | Comparative Results & Impact Assessment | Re-evaluation results are presented alongside M04 baseline metrics in a structured comparison. The analysis interprets the magnitude and significance of improvements and notes any regressions or trade-offs. | 40 |
| 5.4 | Iteration Report | A concise iteration report demonstrates how DAIS performance has improved, as measured by the AI evaluation metrics. The report documents what was changed, the rationale, and the measured impact on evaluation results. | 40 |
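
For 5.3, the metrics computed in M04 and M05 can be compared one metric at a time. A minimal sketch, assuming higher is better for every metric and that both runs used the same evaluation pipeline and test set; `compare_to_baseline` and its `tolerance` parameter are hypothetical, the latter a knob for ignoring tiny fluctuations.

```python
def compare_to_baseline(baseline: dict[str, float], current: dict[str, float],
                        tolerance: float = 0.0) -> dict[str, dict]:
    """Per-metric absolute and relative change versus the M04 baseline.

    A metric is flagged as a regression when it drops by more than `tolerance`.
    Assumes higher is better for every metric.
    """
    report = {}
    for metric, base in baseline.items():
        new = current[metric]
        delta = new - base
        report[metric] = {
            "baseline": base,
            "current": new,
            "delta": round(delta, 4),
            # Relative change is undefined when the baseline is zero.
            "relative_pct": round(100 * delta / base, 1) if base else None,
            "regression": delta < -tolerance,
        }
    return report
```

A delta alone does not establish significance; 5.3 also asks for interpretation of magnitude and significance, which the per-query results from both runs can support.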

M06 — Final Deliverables (15%)

| # | Criterion | Description | Points |
|---|---|---|---|
| 6.1 | Deployed DAIS System | A fully functional DAIS system is deployed and accessible through both the chat and batch interfaces. The system reflects all improvements from prior milestones and is stable enough for live demonstration. | 40 |
| 6.2 | Technical Report | A comprehensive written report covering the full project lifecycle (problem definition, system design, data pipeline, evaluation methodology, results, and conclusions) is well-structured, clearly written, and accurately reflects the system as built. | 40 |
| 6.3 | Demo Video & In-Class Presentation | A recorded demo video showcases the system handling representative queries, and the live in-class presentation communicates the design rationale, evaluation findings, and key lessons. The team handles Q&A with depth and clarity. | 40 |

Self-Evaluation Checklist

Use this checklist before each milestone submission to verify completeness.

M01 — Project Definition

  • Variation (A, B, or C) is clearly identified with justification
  • Corpus is described with document types, sources, scale, and feasibility
  • User persona includes role, goals, context, and pain points
  • Key use cases are specific, nontrivial, and aligned with the variation
  • Document uploaded to iCollege as PDF

M02 — Data Pipeline, CI/CD Setup

  • Pipeline ingests documents from the corpus subset
  • Text extraction, chunking, and metadata extraction are functional
  • Vector embeddings are generated and stored in the database
  • Architecture diagram reflects the current pipeline design
  • README.md includes deployment and run instructions
  • M02_MILESTONE.md is committed with notes on each requirement
  • Merge request created from working branch to uat

M03 — Agentic Prototype

  • Multi-agent pipeline processes documents and stores structured data
  • Chat interface is functional and routes queries through the agent pipeline
  • Batch query interface accepts a file of questions and stores responses
  • Architecture is documented (diagram or written description)
  • Application can be deployed and run from provided instructions
  • M03_MILESTONE.md is committed with descriptions and run instructions
  • Merge request created from working branch to uat

M04 — Evaluation Framework Baseline

  • Evaluation test set contains 50–100 items
  • Test set has been run against the batch interface with outputs collected
  • Metrics are defined and applied (e.g., accuracy, relevance, completeness)
  • Summary statistics and per-query breakdowns are presented
  • Errors are categorized with root cause analysis
  • At least three improvement strategies are proposed with rationale
  • M04_MILESTONE.md is committed
  • Merge request created from working branch to uat

M05 — Iterative Improvement

  • System modifications are implemented and linked to M04 improvement strategies
  • Ablation study compares at least two alternative approaches
  • Results are measured against M04 baseline using the same evaluation pipeline
  • Comparative analysis includes magnitude of improvements, regressions, and trade-offs
  • Iteration report is concise, well-organized, and stands alone as an artifact
  • M05_MILESTONE.md is committed
  • Merge request created from working branch to uat

M06 — Final Deliverables

  • DAIS system is deployed and accessible via chat and batch interfaces
  • System is stable enough for live demonstration
  • Technical report (10–15 pages) covers the full project lifecycle
  • Demo video (5–10 minutes) showcases representative queries
  • In-class presentation slides are prepared
  • Final code is committed and merged into the uat branch
  • Technical report, demo video, and presentation uploaded to iCollege

## Scores

| # | Criterion | Points |
|---|---|---|
| M01 — Project Definition | | |
| 1.1 | Variation and Corpus Selection | 40 |
| 1.2 | User Persona and Key Use Cases | 40 |
| | M01 Subtotal | 80 |
| M02 — Data Pipeline, CI/CD Setup | | |
| 2.1 | Code Quality | 30 |
| 2.2 | Pipeline Functionality | 30 |
| 2.3 | Architecture Diagram | 30 |
| 2.4 | Documentation & Reproducibility | 30 |
| | M02 Subtotal | 120 |
| M03 — Agentic Prototype | | |
| 3.1 | Multi-Agent Pipeline | 40 |
| 3.2 | Document Ingestion & Storage | 40 |
| 3.3 | Dual Interface Implementation | 40 |
| 3.4 | Architecture & Reproducibility | 40 |
| | M03 Subtotal | 160 |
| M04 — Evaluation Framework Baseline | | |
| 4.1 | Evaluation Test Set Execution | 40 |
| 4.2 | Quantitative Performance Analysis | 40 |
| 4.3 | Error Analysis & Failure Identification | 40 |
| 4.4 | Improvement Strategy Proposals | 40 |
| | M04 Subtotal | 160 |
| M05 — Iterative Improvement | | |
| 5.1 | System Refinements Implementation | 40 |
| 5.2 | Ablation Study | 40 |
| 5.3 | Comparative Results & Impact Assessment | 40 |
| 5.4 | Iteration Report | 40 |
| | M05 Subtotal | 160 |
| M06 — Final Deliverables | | |
| 6.1 | Deployed DAIS System | 40 |
| 6.2 | Technical Report | 40 |
| 6.3 | Demo Video & In-Class Presentation | 40 |
| | M06 Subtotal | 120 |
| | Grand Total | 800 |
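
The point subtotals line up with the Grade Allocation weights: each milestone's maximum points divided by the 800-point total equals its weight (for example, 80 / 800 = 10% for M01). A minimal sketch of the weighted calculation for self-evaluation, assuming points earned are recorded per milestone; `final_grade` is a hypothetical helper, and the official grade is whatever the instructor computes.

```python
# Maximum points per milestone (Scores table) and grade weights (Grade Allocation table).
MAX_POINTS = {"M01": 80, "M02": 120, "M03": 160, "M04": 160, "M05": 160, "M06": 120}
WEIGHTS = {"M01": 0.10, "M02": 0.15, "M03": 0.20, "M04": 0.20, "M05": 0.20, "M06": 0.15}


def final_grade(earned: dict[str, float]) -> float:
    """Weighted final grade in percent: the sum over milestones of weight * earned / max."""
    total = sum(WEIGHTS[m] * earned[m] / MAX_POINTS[m] for m in WEIGHTS)
    return round(100 * total, 2)
```

With hypothetical scores of 72, 108, 140, 150, 130, and 110 points for M01 through M06, `final_grade` returns 88.75, the same as 710 / 800, because the weights and point totals are proportional.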