The Age of benchmarks.
Five frontier models, five numbers, one uncomfortable truth.
Leader · Q2
Claude 4.7
Sonnet, 1M ctx · Anthropic
SWE-bench: 77.2% (coding, verified split)
GPQA: 84.5 (diamond, graduate science)
Price · input: $3/M (per million tokens, typical)
Claude 4.7 Sonnet   77.2
GPT-5 Turbo         74.8
Gemini 3 Pro        71.3
GLM-5               68.9
Kimi k3             66.4