Benchmark Report: 2025 Coding Performance, Open-Source SOTA

GLM-4.7 (open source) — headline AIME 2025 score: 95.7

First open-source model to achieve SOTA across all three major coding benchmarks, surpassing GPT-4o and Claude 3.5.
AIME 2025 (Mathematical Reasoning)
  GLM-4.7     95.7
  Claude 3.5  88.2
  GPT-4o      83.6
  Margin: +7.5 vs closed-source best
SWE-bench Verified (Software Engineering)
  GLM-4.7     73.8%
  Claude 3.5  53.3%
  GPT-4o      48.2%
  Margin: +20.5 vs closed-source best
τ²-Bench (Agent Task Completion)
  GLM-4.7     87.4
  Claude 3.5  78.9
  GPT-4o      71.5
  Margin: +8.5 vs closed-source best
Summary
  Benchmarks won: 3/3
  Open-source ranking: #1
  Average margin above runner-up: 12.2 points
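The summary figures follow directly from the per-benchmark scores above: each margin is GLM-4.7's score minus the best closed-source score on that benchmark, and the 12.2-point figure is the mean of the three margins. A minimal Python sketch of that arithmetic, with the scores hardcoded from this report (the dictionary layout is purely illustrative):

```python
# Per-benchmark scores taken from this report; Claude 3.5 is the
# closed-source runner-up on all three benchmarks.
scores = {
    "AIME 2025":          {"GLM-4.7": 95.7, "Claude 3.5": 88.2, "GPT-4o": 83.6},
    "SWE-bench Verified": {"GLM-4.7": 73.8, "Claude 3.5": 53.3, "GPT-4o": 48.2},
    "τ²-Bench":           {"GLM-4.7": 87.4, "Claude 3.5": 78.9, "GPT-4o": 71.5},
}

# Margin of GLM-4.7 over the best non-GLM score on each benchmark.
margins = {
    name: round(s["GLM-4.7"] - max(v for k, v in s.items() if k != "GLM-4.7"), 1)
    for name, s in scores.items()
}
print(margins)  # {'AIME 2025': 7.5, 'SWE-bench Verified': 20.5, 'τ²-Bench': 8.5}

# Average margin over the runner-up across all three benchmarks.
avg_margin = round(sum(margins.values()) / len(margins), 1)
print(avg_margin)  # 12.2
```

Rounding to one decimal place matches the precision used throughout the report and avoids floating-point noise in the subtractions.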
Benchmark data sourced from official evaluation reports, 2025