M04 Evaluation Framework Baseline
Due Date: Monday Apr 5, 2026, Wednesday Mar 31, 2026
Following development, students will rigorously evaluate the system's performance. This means completing an evaluation test set, running the evaluation pipeline against the batch interface, analyzing the results, identifying errors, and proposing at least three specific improvement strategies.
- Completed evaluation test set (minimum 50–100 items, depending on format).
- Evaluation pipeline running against your batch interface.
- Baseline results + error analysis; at least three concrete improvement ideas.
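
The deliverables above can be sketched as one small evaluation loop. This is only an illustrative outline, not a required design: the item fields (`id`, `input`, `expected`, `category`), the `run_batch` callable standing in for your batch interface, and the case-insensitive exact-match metric are all assumptions that you should replace with whatever fits your system and test-set format.

```python
from collections import Counter
from typing import Callable


def evaluate(test_items: list[dict], run_batch: Callable[[list[str]], list[str]]) -> dict:
    """Run the test set through a batch interface and summarize baseline results.

    `run_batch` is a hypothetical stand-in for your batch interface: it takes a
    list of inputs and returns one prediction per input, in the same order.
    """
    inputs = [item["input"] for item in test_items]
    predictions = run_batch(inputs)
    if len(predictions) != len(inputs):
        raise ValueError("batch interface must return one prediction per input")

    errors = []
    for item, predicted in zip(test_items, predictions):
        # Assumed metric: case-insensitive exact match after trimming whitespace.
        if predicted.strip().lower() != item["expected"].strip().lower():
            errors.append({
                "id": item["id"],
                "expected": item["expected"],
                "predicted": predicted,
                "category": item.get("category", "uncategorized"),
            })

    total = len(test_items)
    return {
        "total": total,
        "accuracy": (total - len(errors)) / total if total else 0.0,
        "errors": errors,
        # Error counts per category give a starting point for error analysis.
        "errors_by_category": dict(Counter(e["category"] for e in errors)),
    }
```

Grouping the failures by category is one simple way to move from baseline numbers to error analysis: the categories with the most errors are natural candidates for the improvement ideas.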