AI Models Toplist by Benchmarks
Model Toplist by ReasoningAverage Benchmark (the higher the better)
Rank | Model | ReasoningAverage |
1 | Claude 4 Sonnet Thinking | 95.25 |
2 | o3 Pro High | 94.67 |
3 | o3 High | 94.67 |
4 | Gemini 2.5 Pro (Max Thinking) | 94.28 |
5 | Gemini 2.5 Pro | 93.72 |
6 | DeepSeek R1 (2025-05-28) | 91.08 |
7 | o3 Medium | 91 |
8 | Claude 4 Opus Thinking | 90.47 |
9 | o4-Mini High | 88.11 |
10 | Grok 3 Mini Beta (High) | 87.61 |
Model DataAnalysisAverage Leaderboard (the higher the better)
Model Toplist by MathematicsAverage Benchmark (the higher the better)
Rank | Model | MathematicsAverage |
1 | Claude 4 Opus Thinking | 88.25 |
2 | DeepSeek R1 (2025-05-28) | 85.26 |
3 | Claude 4 Sonnet Thinking | 85.25 |
4 | o3 High | 85 |
5 | o4-Mini High | 84.9 |
6 | o3 Pro High | 84.75 |
7 | Gemini 2.5 Pro (Max Thinking) | 84.19 |
8 | Gemini 2.5 Flash | 84.1 |
9 | Gemini 2.5 Pro | 83.33 |
10 | o4-Mini Medium | 81.02 |
Model LanguageAverage Leaderboard (the higher the better)
Model Toplist by CodingAverage Benchmark (the higher the better)
Rank | Model | CodingAverage |
1 | o4-Mini High | 79.98 |
2 | Claude 4 Sonnet | 78.25 |
3 | o3 Medium | 77.86 |
4 | ChatGPT-4o | 77.48 |
5 | o3 Pro High | 76.78 |
6 | o3 High | 76.71 |
7 | DeepSeek R1 | 76.07 |
8 | GPT-4.5 Preview | 76.07 |
9 | Claude 3.7 Sonnet | 74.28 |
10 | o4-Mini Medium | 74.22 |