Benchmark Analysis
GLM-4.7
Coding Capability Breakthrough
Open-source model achieves state-of-the-art performance across all major coding benchmarks for the first time.
GLM-4.7 Open Source
Key Finding
GLM-4.7 surpasses GPT-4o and Claude 3.5 on all three core benchmarks, becoming the first open-source model to reach SOTA level.
Data: Official benchmark evaluations, 2026
Performance Comparison — 03 benchmarks
Fig. 01 — Tri-axis Performance Map: GLM-4.7 vs Claude 3.5 vs GPT-4o on AIME 2025 (Mathematical Reasoning), SWE-bench (Software Engineering), and τ²-Bench (Agent Tasks).
Benchmark            Category                GLM-4.7   Claude 3.5   GPT-4o   Margin vs Claude 3.5
AIME 2025            Mathematical Reasoning  95.7      88.2         83.6     +7.5
SWE-bench Verified   Software Engineering    73.8%     53.3%        48.2%    +20.5
τ²-Bench             Agent Task Completion   87.4      78.9         71.5     +8.5
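The "+X vs Claude 3.5" margins quoted here are plain point differences between the reported scores. A minimal Python sketch that recomputes them from the numbers above; the dictionary layout and `margin` helper are illustrative, not an official data format:

```python
# Reported benchmark scores (from the table above).
# The dict layout is an assumption for illustration only.
scores = {
    "AIME 2025":          {"GLM-4.7": 95.7, "Claude 3.5": 88.2, "GPT-4o": 83.6},
    "SWE-bench Verified": {"GLM-4.7": 73.8, "Claude 3.5": 53.3, "GPT-4o": 48.2},
    "τ²-Bench":           {"GLM-4.7": 87.4, "Claude 3.5": 78.9, "GPT-4o": 71.5},
}

def margin(bench: str, baseline: str = "Claude 3.5") -> float:
    """Point margin of GLM-4.7 over a baseline model on one benchmark."""
    s = scores[bench]
    return round(s["GLM-4.7"] - s[baseline], 1)

for bench in scores:
    print(f"{bench}: +{margin(bench)} vs Claude 3.5")
```

Note that SWE-bench Verified is a resolution rate (percentage points), while the other two are benchmark scores, so the margins are comparable only within each row.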