Assignment 2 Public Leaderboard
We compute the following metrics:
- Standard metrics: Answer Recall, F1, and ROUGE-1/2/L (reported as an average)
- LLM-as-judge: rubric-based score (1–5)
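To give a rough idea of what the standard metrics measure, here is a sketch of one common, SQuAD-style definition of token-level F1 and answer recall. The normalization steps (lowercasing, removing punctuation and the articles a/an/the) and the names `normalize`, `token_f1`, and `answer_recall` are assumptions for illustration. The leaderboard's own scorer may tokenize or normalize differently.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Assumed SQuAD-style normalization: lowercase, drop punctuation and articles, split."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between prediction and reference."""
    pred, ref = normalize(prediction), normalize(reference)
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def answer_recall(prediction: str, reference: str) -> float:
    """Fraction of reference tokens that also appear in the prediction (assumed definition)."""
    pred, ref = normalize(prediction), normalize(reference)
    if not ref:
        return 0.0
    overlap = sum((Counter(pred) & Counter(ref)).values())
    return overlap / len(ref)
```

For instance, the prediction "The Eiffel Tower" against the reference "Eiffel Tower in Paris" has precision 1.0 and recall 0.5, so its F1 is about 0.67: a short but correct answer is penalized on recall rather than precision.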
The total score is the uniform (unweighted) mean of the available metrics, each normalized to the range 0–1.
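A minimal sketch of the total-score rule, assuming each metric has already been mapped to 0–1 and that an unavailable metric is represented as `None`. The linear mapping of the 1–5 judge score, (s - 1) / 4, is a hypothetical choice, since the leaderboard does not state how it normalizes that score. The names `normalize_judge` and `total_score` are likewise illustrative.

```python
from typing import Optional


def normalize_judge(score: float) -> float:
    """Hypothetical linear map of a 1-5 rubric score onto 0-1."""
    return (score - 1) / 4


def total_score(metrics: dict[str, Optional[float]]) -> float:
    """Uniform mean over available metrics (None = unavailable); 0.0 if none are available."""
    available = [value for value in metrics.values() if value is not None]
    if not available:
        return 0.0
    return sum(available) / len(available)


if __name__ == "__main__":
    # "rouge" stands for the single averaged ROUGE-1/2/L value.
    metrics = {"recall": 0.8, "f1": 0.6, "rouge": 0.5, "judge": normalize_judge(4)}
    print(round(total_score(metrics), 4))
```

With recall 0.8, F1 0.6, ROUGE 0.5, and a judge score of 4 (0.75 after this normalization), the total is 0.6625. If the judge score were unavailable, the mean would be taken over the remaining three metrics instead of counting it as zero.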
Attempts: up to 10. Every submission counts toward this limit, even if it does not improve your score. Your leaderboard score is updated only if your new total score is higher than your current best.
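The attempt and update rules can be sketched as a small state object. `LeaderboardEntry` and `MAX_ATTEMPTS` are hypothetical names, and the sketch assumes that every submission reaching evaluation consumes an attempt whether or not it improves the score.

```python
from dataclasses import dataclass
from typing import Optional

MAX_ATTEMPTS = 10  # limit stated on the leaderboard page


@dataclass
class LeaderboardEntry:
    attempts: int = 0
    best_score: Optional[float] = None

    def submit(self, score: float) -> bool:
        """Record one attempt; return True only if the leaderboard score changed."""
        if self.attempts >= MAX_ATTEMPTS:
            raise RuntimeError("no attempts remaining")
        self.attempts += 1  # the counter increases on every submission
        if self.best_score is None or score > self.best_score:
            self.best_score = score  # keep only the best total score
            return True
        return False
```

Submitting 0.5 and then 0.4 leaves the displayed score at 0.5 but uses two of the ten attempts, which is why it can pay to validate answers locally before each submission.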
Submission format (JSON):
{
  "andrewid": "YOUR_ANDREWID",
  "1": "Answer 1",
  "2": "Answer 2"
}
Important: Your submission must include an answer for ALL questions in the dataset; the number of answers must exactly match the number of questions in the gold dataset.
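Before uploading, it may help to build and check the file locally. This sketch assumes that question IDs are the strings "1" through "N" in gold-dataset order and that you have the set of gold IDs available. `build_submission` and `validate_submission` are hypothetical helpers, not part of any provided tooling.

```python
import json


def build_submission(andrewid: str, answers: list[str]) -> dict[str, str]:
    """Number answers "1".."N" in order (assumes consecutive integer IDs starting at 1)."""
    submission = {"andrewid": andrewid}
    for i, answer in enumerate(answers, start=1):
        submission[str(i)] = answer
    return submission


def validate_submission(submission: dict, gold_ids: set[str]) -> list[str]:
    """Return a list of problems; an empty list means every gold question has an answer."""
    problems = []
    if not submission.get("andrewid"):
        problems.append("missing andrewid")
    answer_ids = {key for key in submission if key != "andrewid"}
    missing = gold_ids - answer_ids
    extra = answer_ids - gold_ids
    if missing:
        problems.append(f"missing answers for: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected question IDs: {sorted(extra)}")
    return problems


if __name__ == "__main__":
    sub = build_submission("YOUR_ANDREWID", ["Answer 1", "Answer 2"])
    print(validate_submission(sub, {"1", "2"}))
    print(json.dumps(sub, ensure_ascii=False, indent=2))
```

To save the file, `json.dump(sub, f, ensure_ascii=False, indent=2)` on a file opened with `encoding="utf-8"` keeps any non-ASCII answers readable.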
Please do not refresh or navigate away from this page during evaluation, which may take some time to finish.