Leaderboards

We host results only for Closed-Book methods that have been finetuned only on In-Domain data.

Submissions

To add your system to our leaderboards, please see our submission instructions.

Closed-Book: In-Domain

Rank  System             Dev Set  Test Set  Contrastive Set
-     Human (ensembled)  99.0     -         99.0
-     Human (averaged)   96.3     -         92.2
1     RACo-Large         88.2     88.6      74.4
2     T5-3B              85.6     85.1      70.0
3     RoBERTa-Large      80.6     80.3      61.5
4     RoBERTa-Base       72.2     71.6      56.0