M04 Evaluation Framework Baseline

Due Date: Monday Apr 5, 2026, Wednesday Mar 31, 2026

After development, students rigorously evaluate the system's performance: they complete an evaluation test set, run the evaluation pipeline against the batch interface, analyze the results, identify errors, and propose at least three specific improvement strategies.

  • Completed evaluation test set (a minimum of 50 to 100 items; the exact minimum depends on format).
  • Evaluation pipeline running against your batch interface.
  • Baseline results + error analysis; at least three concrete improvement ideas.
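
The deliverables above could be wired together roughly as in the sketch below. It is only an illustration under stated assumptions: the item fields (`id`, `input`, `expected`), the `batch_predict` callable standing in for your batch interface, and the case-insensitive exact-match scoring are all hypothetical and should be replaced with whatever your project actually uses.

```python
from typing import Callable, Dict, List


def run_evaluation(
    test_set: List[Dict],
    batch_predict: Callable[[List[str]], List[str]],
    batch_size: int = 16,
) -> List[Dict]:
    """Send test items through the batch interface and score each output.

    Assumes each item is a dict with hypothetical keys "id", "input",
    and "expected", and that batch_predict returns one string per input.
    """
    records = []
    for start in range(0, len(test_set), batch_size):
        batch = test_set[start:start + batch_size]
        outputs = batch_predict([item["input"] for item in batch])
        for item, output in zip(batch, outputs):
            # Assumed metric: case-insensitive exact match after trimming.
            correct = output.strip().lower() == item["expected"].strip().lower()
            records.append({
                "id": item["id"],
                "expected": item["expected"],
                "output": output,
                "correct": correct,
            })
    return records


def summarize(records: List[Dict]) -> Dict:
    """Compute baseline accuracy and collect failures for error analysis."""
    total = len(records)
    correct = sum(1 for r in records if r["correct"])
    errors = [r for r in records if not r["correct"]]
    return {
        "total": total,
        "accuracy": correct / total if total else 0.0,
        "errors": errors,
    }


if __name__ == "__main__":
    # Toy stand-in for a real system: it just upper-cases the input.
    def toy_batch_predict(inputs: List[str]) -> List[str]:
        return [text.upper() for text in inputs]

    toy_test_set = [
        {"id": 1, "input": "a", "expected": "A"},
        {"id": 2, "input": "b", "expected": "x"},
    ]
    summary = summarize(run_evaluation(toy_test_set, toy_batch_predict))
    print(f"accuracy: {summary['accuracy']:.2f}")
    for err in summary["errors"]:
        print("miss:", err["id"], "expected", err["expected"], "got", err["output"])
```

Keeping the per-item records, rather than only the final score, makes the error analysis easier: the failed items can be read and grouped by hand, and those groups are a natural starting point for the three improvement ideas.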