The Year of Large-Model Benchmarks
Five frontier models, five numbers, one uncomfortable truth.
Leader
Claude 4.7
Sonnet, 1M ctx · Anthropic
SWE-bench
77.2%
coding, verified split
GPQA
84.5%
diamond, graduate science
Price
$3/M
input tokens, typical
Claude 4.7 Sonnet · 77.2
GPT-5 Turbo · 74.8
Gemini 3 Pro · 71.3
GLM-5 · 68.9
Kimi k3 · 66.4