Benchmark Report: 2025 Coding Performance

GLM-4.7 (Open-Source SOTA)

GLM-4.7 scores 95.7 on AIME 2025 and is the first open-source model to achieve state-of-the-art results across all three featured benchmarks (mathematical reasoning, software engineering, and agentic task completion), surpassing GPT-4o and Claude 3.5.
AIME 2025 (Mathematical Reasoning)
  GLM-4.7      95.7
  Claude 3.5   88.2
  GPT-4o       83.6
  Margin: +7.5 points over the best closed-source result
SWE-bench Verified (Software Engineering)
  GLM-4.7      73.8%
  Claude 3.5   53.3%
  GPT-4o       48.2%
  Margin: +20.5 points over the best closed-source result
τ²-Bench (Agent Task Completion)
  GLM-4.7      87.4
  Claude 3.5   78.9
  GPT-4o       71.5
  Margin: +8.5 points over the best closed-source result
Summary
  Benchmarks won: 3/3
  Open-source ranking: #1
  Average margin over runner-up: 12.2 points
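The 12.2-point average margin can be reproduced from the per-benchmark scores listed above; a minimal sketch (variable names are illustrative, not from the report):

```python
# Scores from the report: (GLM-4.7, best closed-source runner-up).
# The runner-up on all three benchmarks is Claude 3.5.
scores = {
    "AIME 2025": (95.7, 88.2),
    "SWE-bench Verified": (73.8, 53.3),
    "tau2-Bench": (87.4, 78.9),
}

# Per-benchmark lead of GLM-4.7 over the runner-up, rounded to one decimal.
margins = {name: round(glm - runner_up, 1) for name, (glm, runner_up) in scores.items()}

# Average of the three margins: (7.5 + 20.5 + 8.5) / 3 = 12.2
avg_margin = round(sum(margins.values()) / len(margins), 1)

print(margins)     # per-benchmark leads: 7.5, 20.5, 8.5
print(avg_margin)  # 12.2
```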
ZHIPU AI
Benchmark data sourced from official evaluation reports, 2025