Leaderboard
Compare model performance across GMP knowledge and task completion benchmarks. Click a model name to view detailed results.
| # | Model | Overall | Knowledge QA | Task Completion | Avg Latency | Total Tokens | # Evals |
|---|---|---|---|---|---|---|---|
| 1 | Claude Haiku 3 (Anthropic) | 90.3% | 100.0% | 41.8% | 3.4s | 5k | 12 |
| 2 | GPT-5-nano (OpenAI) | 83.3% | 100.0% | 0.0% | 12.0s | 8k | 6 |