IFQ · DESIGN
V2 · 2026
DATA · benchmarks.json

Issue № 05 · AI Benchmarks · Q2 2026
Performance Report

Large models:
the year of benchmarks

Five frontier models, five numbers, one uncomfortable truth.

Leading model
Claude 4.7
Sonnet · 1M-token context · Anthropic
SWE-bench
77.2%
coding, Verified split
GPQA
84.5
Diamond set, graduate-level science
Price
$3/M
per million input tokens, typical
Claude 4.7 Sonnet · 77.2
GPT-5 Turbo · 74.8
Gemini 3 Pro · 71.3
GLM-5 · 68.9
Kimi k3 · 66.4
benchmarks
SOURCE SERIF 4 · ITALIC · OLDSTYLE FIGURES
ifq·design
Data · print-grade typography