AI Models Toplist by Benchmarks
Model Toplist by ReasoningAverage Benchmark (the higher the better)
| Rank | Model | ReasoningAverage |
| 1 | Claude 4 Sonnet Thinking | 95.25 |
| 2 | o3 Pro High | 94.67 |
| 3 | o3 High | 94.67 |
| 4 | Gemini 2.5 Pro (Max Thinking) | 94.28 |
| 5 | Gemini 2.5 Pro | 93.72 |
| 6 | DeepSeek R1 (2025-05-28) | 91.08 |
| 7 | o3 Medium | 91 |
| 8 | Claude 4 Opus Thinking | 90.47 |
| 9 | o4-Mini High | 88.11 |
| 10 | Grok 3 Mini Beta (High) | 87.61 |
Model DataAnalysisAverage LeaderBoard (the higher the better)
Model Toplist by MathematicsAverage Benchmark (the higher the better)
| Rank | Model | MathematicsAverage |
| 1 | Claude 4 Opus Thinking | 88.25 |
| 2 | DeepSeek R1 (2025-05-28) | 85.26 |
| 3 | Claude 4 Sonnet Thinking | 85.25 |
| 4 | o3 High | 85 |
| 5 | o4-Mini High | 84.9 |
| 6 | o3 Pro High | 84.75 |
| 7 | Gemini 2.5 Pro (Max Thinking) | 84.19 |
| 8 | Gemini 2.5 Flash | 84.1 |
| 9 | Gemini 2.5 Pro | 83.33 |
| 10 | o4-Mini Medium | 81.02 |
Model LanguageAverage LeaderBoard (the higher the better)
Model Toplist by CodingAverage Benchmark (the higher the better)
| Rank | Model | CodingAverage |
| 1 | o4-Mini High | 79.98 |
| 2 | Claude 4 Sonnet | 78.25 |
| 3 | o3 Medium | 77.86 |
| 4 | ChatGPT-4o | 77.48 |
| 5 | o3 Pro High | 76.78 |
| 6 | o3 High | 76.71 |
| 7 | DeepSeek R1 | 76.07 |
| 8 | GPT-4.5 Preview | 76.07 |
| 9 | Claude 3.7 Sonnet | 74.28 |
| 10 | o4-Mini Medium | 74.22 |