Benchmark Analysis
GLM-4.7
Coding Capability Breakthrough
Open-source model achieves state-of-the-art performance across all major coding benchmarks for the first time.
GLM-4.7 Open Source
Key Finding
GLM-4.7 surpasses both GPT-4o and Claude 3.5 on all three core coding benchmarks, making it the first open-source model to reach SOTA level.
Data: Official benchmark evaluations, 2026
Performance Comparison — 3 benchmarks
[Fig. 01 — Tri-axis Performance Map: GLM-4.7 vs Claude 3.5 vs GPT-4o across AIME 2025 (Mathematical Reasoning), SWE-bench (Software Engineering), and τ²-Bench (Agent Tasks); GLM-4.7 scores 95.7, 73.8%, and 87.4 respectively]
Benchmark            Focus                    GLM-4.7   Claude 3.5   GPT-4o   Margin vs Claude 3.5
AIME 2025            Mathematical Reasoning   95.7      88.2         83.6     +7.5
SWE-bench Verified   Software Engineering     73.8%     53.3%        48.2%    +20.5
τ²-Bench             Agent Task Completion    87.4      78.9         71.5     +8.5