{"skill":{"slug":"ai-leaderboard","displayName":"AI Leaderboard","summary":"Comprehensive AI leaderboard for LLM models and AI applications. Query model rankings, model IDs, and pricing from OpenRouter, Artificial Analysis, and Pinch...","description":"---\nname: AI Rankings Leaderboard\ndisplay_name: AI Rankings Leaderboard / AI 排行榜\ndescription: Comprehensive AI leaderboard for LLM models and AI applications. Query model rankings, model IDs, and pricing from OpenRouter, Artificial Analysis, and Pinchbench. Trigger words include \"AI rankings\", \"LLM leaderboard\", \"model comparison\", \"AI apps ranking\", \"best AI models\", \"model benchmark\", \"free models\", \"免费模型\", \"OpenRouter model ID\", \"OpenRouter 模型\", \"Artificial Analysis\", \"artificial analysis\", \"AI 智力指数\", \"intelligence index\", \"coding index\", \"coding排行榜\", \"agentic index\", \"agentic排行榜\", \"模型速度排行\", \"模型价格对比\", \"model ID for\", \"OpenRouter model parameter\".\nversion: 1.20.1\ncli_dependencies:\n  - agent-browser\n---\n\n# AI Rankings Leaderboard Skill\n\n## Description\n\nA comprehensive skill for querying AI model and application rankings from multiple authoritative sources. 
Get the latest insights on LLM performance, popularity, pricing, and value metrics.\n\n## Data Sources\n\n| Source | URL | Focus |\n|--------|-----|-------|\n| **Artificial Analysis** | https://artificialanalysis.ai/ | Intelligence Index, Speed, Price benchmarks |\n| LLM Leaderboard | https://artificialanalysis.ai/leaderboards/models | Model comparison (100+ models) |\n| LLM API Providers | https://artificialanalysis.ai/leaderboards/providers | API Provider comparison (500+ endpoints) |\n| Image & Video Leaderboards | https://artificialanalysis.ai/ (Image & Video section) | Image/Video model ELO rankings |\n| OpenRouter Rankings | https://openrouter.ai/rankings | Model usage & popularity |\n| OpenRouter Apps | https://openrouter.ai/apps | AI applications ranking |\n| OpenRouter Models | https://openrouter.ai/models | All available models with pricing |\n| OpenRouter Free Models | https://openrouter.ai/models?q=free | Free models only |\n| Pinchbench | https://pinchbench.com/ | Model benchmark (Success Rate, Speed, Cost, Value) |\n\n## Features\n\n### 1. 
Artificial Analysis LLM Leaderboard\n\n**Intelligence Index**\n- **Artificial Analysis Intelligence Index v4.0**: Comprehensive model intelligence score\n- **10 evaluation dimensions**: Multiple independent assessment criteria\n- **Frontier Models**: Top intelligence models (Gemini 3.1 Pro, GPT-5.4, Claude Opus 4.6, etc.)\n- **Reasoning Models**: Identifies models with reasoning capabilities\n\n**Artificial Analysis Coding Index**\n- URL: https://artificialanalysis.ai/?intelligence=coding-index\n- Evaluates model performance on programming tasks\n- Combines multiple code evaluation benchmarks\n\n**Artificial Analysis Agentic Index**\n- URL: https://artificialanalysis.ai/?intelligence=agentic-index\n- Evaluates a model's autonomous agent capabilities\n- Covers tool use, multi-step reasoning, task completion, and more\n\n**Performance Metrics**\n| Metric | Description |\n|--------|-------------|\n| Intelligence Index | Overall model intelligence score (higher is better) |\n| Speed | Output tokens per second (tokens/s) |\n| Blended Price | Combined USD per million tokens (3:1 input/output ratio) |\n| Input Price | Price per million input tokens (USD) |\n| Output Price | Price per million output tokens (USD) |\n| Latency (TTFT) | Time to First Token in seconds |\n| Context Window | Maximum context length supported |\n\n**Model Comparison Table Columns**\n| Column | Description |\n|--------|-------------|\n| Features | Model features (reasoning badge, etc.) |\n| Model | Model name with logo |\n| Context Window | Max context length |\n| Creator | Provider/Company |\n| Intelligence Index | AI intelligence score |\n| Blended USD/1M Tokens | Combined input/output price |\n| Median Tokens/s | Median output speed |\n| Latency First Chunk (s) | Time to first token |\n| Further Analysis | Link to detailed analysis |\n\n**Filters Available**\n| Filter | Options |\n|--------|---------|\n| Frontier Models | On/Off |\n| Open Weights | On/Off (open-weight models) |\n| Size Class | Small, Medium, Large, etc. |\n| Reasoning | On/Off (filters for reasoning models) |\n| Model Status | Current, Preview, Discontinued |\n\n### 2. 
Artificial Analysis LLM API Providers Leaderboard\n\n**Comparison of 500+ AI Model Endpoints**\n\n| Column | Description |\n|--------|-------------|\n| API Provider | Provider name (Cerebras, Groq, Fireworks, etc.) |\n| Model | Model name |\n| Context Window | Max context length |\n| License | Model license |\n| Intelligence Index | Model intelligence score |\n| Blended USD/1M Tokens | Combined price |\n| Median Tokens/s | Output speed |\n| Median First Chunk (s) | Latency (TTFT) |\n| Total Response (s) | End-to-end response time |\n| Reasoning Time (s) | Reasoning model computation time |\n| End-to-End Response Time | Full request-response cycle |\n\n**Key Providers**\n- Cerebras\n- Eigen AI\n- Fireworks\n- SambaNova\n- Together.ai\n- Hyperbolic\n- Nebius Fast\n- Google Vertex\n- Groq\n- Azure OpenAI\n- AWS Bedrock\n- OpenAI Direct\n- Anthropic Direct\n- And 10+ more...\n\n### 3. Artificial Analysis Image & Video Leaderboards\n\n**Text-to-Image Leaderboard**\n- ELO scores from blind preference votes\n- 95% confidence intervals displayed\n- Top models: GPT Image 1.5, Imagen 4 Ultra, Gemini Image models, etc.\n\n**Video Leaderboards**\n| Category | Description |\n|----------|-------------|\n| Text to Video (with Audio) | Text generates video with sound |\n| Text to Video (without Audio) | Text generates silent video |\n| Image to Video (with Audio) | Image + text generates video with sound |\n| Image to Video (without Audio) | Image + text generates silent video |\n| Image Editing | Edit existing images with AI |\n\n**Evaluation Method**\n- ELO scoring system (blind preference voting)\n- 95% confidence intervals\n- Real user preference data\n\n### 4. 
OpenRouter Model Rankings\n- **LLM Leaderboard**: Overall model usage rankings\n- **Market Share**: Market share by model provider\n- **Categories**: Rankings by use case\n- **Languages**: Natural language support rankings\n- **Programming**: Programming language support\n- **Context Length**: Long context handling\n- **Tool Calls**: Tool calling capabilities\n- **Images**: Image processing volume\n\n### 5. OpenRouter App Rankings\n- **Most Popular**: Top apps by token usage\n- **Trending**: Fastest growing apps this week\n- **Categories**: Coding Agents, Productivity, Creative, Entertainment\n\n### 6. OpenRouter Model Catalog\n- **All Models**: Complete list of available models on OpenRouter\n- **Free Models**: Models with $0 pricing (free to use)\n- **Model ID**: The exact `model` parameter to use when calling OpenRouter API\n- **Pricing Info**: Input/output token pricing\n\n### 7. Pinchbench Benchmarks\n- **Success Rate**: Task completion success percentage\n- **Speed**: Response time performance\n- **Cost**: Cost per run analysis\n- **Value**: Price-performance ratio\n\n## Trigger Keywords\n\n### General AI Rankings\n- \"AI rankings\" / \"AI 排行榜\"\n- \"LLM leaderboard\" / \"LLM 排行\"\n- \"model comparison\" / \"模型对比\"\n- \"best AI models\" / \"最好的 AI 模型\"\n- \"AI apps ranking\" / \"AI 应用排行\"\n- \"model benchmark\" / \"模型评测\"\n\n### Artificial Analysis Specific\n- \"Artificial Analysis\" / \"artificialanalysis\"\n- \"AI intelligence index\" / \"AI 智力指数\"\n- \"intelligence index\" / \"智力指数\"\n- \"模型速度排行\" / \"speed ranking\"\n- \"模型价格对比\" / \"price comparison\"\n- \"fastest models\" / \"最快模型\"\n- \"cheapest models\" / \"最便宜模型\"\n- \"tokens per second\" / \"t/s\" / \"tokens/s\"\n- \"latency\" / \"TTFT\" / \"首 token 延迟\"\n- \"Artificial Analysis Intelligence Index\"\n- \"AAII\" / \"AA Intelligence\"\n- \"API providers\" / \"API 提供商\"\n- \"LLM providers\" / \"LLM 提供商\"\n- \"Cerebras\" / \"Groq\" / \"Fireworks\"\n- \"open weights\" / \"开源权重\"\n- \"reasoning models\" / 
\"推理模型\"\n- \"elo score\" / \"ELO 评分\"\n- \"image arena\" / \"图生图\"\n- \"text to image\" / \"文生图\"\n- \"text to video\" / \"文生视频\"\n- \"image to video\" / \"图生视频\"\n\n### OpenRouter Specific\n- \"free models\" / \"免费模型\" / \"free AI models\"\n- \"OpenRouter models\" / \"OpenRouter 免费模型\"\n- \"OpenRouter rankings\" / \"OpenRouter 排行\"\n- \"Pinchbench\"\n- \"OpenRouter model ID\" / \"OpenRouter 模型 ID\"\n- \"查找 OpenRouter\" / \"OpenRouter 上的模型\"\n- \"model ID for [模型名]\" / \"[模型名] model ID\"\n- \"OpenRouter 上 [模型名]\" / \"OpenRouter [模型名] 模型\"\n- \"OpenRouter model parameter\"\n- \"调用量排行\" / \"使用量排行\" / \"top models\" / \"top 模型\"\n- \"OpenRouter 调用量\" / \"OpenRouter 使用量\"\n\n## Runtime Tools\n\nThis skill requires:\n- `execute_command`: Execute shell commands and scripts\n- `use_skill`: Load browser-automation skill for JavaScript-rendered pages\n- `web_fetch`: Fallback for simple HTTP requests\n\n## Installation\n\n**Required CLI Dependency**: `agent-browser`\n\nThe `agent-browser` CLI must be installed before using this skill. Install via:\n\n```bash\nnpm install -g agent-browser\n# or\nnpx agent-browser --version\n```\n\nThis skill calls `agent-browser` via subprocess with hardcoded argument arrays (no shell injection risk).\n\n**Note on browser eval**: The `agent-browser eval` command executes `document.body.innerText` or similar DOM queries on the remote page to extract rendered content. This is standard web scraping behavior for JavaScript-rendered pages and is limited to reading page content only.\n\n## Browser Automation Support\n\nFor JavaScript-rendered pages (OpenRouter Rankings, Artificial Analysis), this skill uses browser automation:\n\n1. **Load browser-automation skill first**:\n   ```\n   use_skill(\"browser-automation\")\n   ```\n\n2. 
**Navigate to rankings page**:\n   ```bash\n   agent-browser open \"https://artificialanalysis.ai/leaderboards/models\"\n   agent-browser wait --load networkidle\n   agent-browser eval \"document.body.innerText\"\n   ```\n\n3. **Key pages requiring browser**:\n   - `https://artificialanalysis.ai/leaderboards/models` - LLM comparison (100+ models)\n   - `https://artificialanalysis.ai/leaderboards/providers` - API providers (500+ endpoints)\n   - `https://artificialanalysis.ai/` - Image & Video leaderboards\n   - `https://openrouter.ai/rankings` - Model usage rankings (JS rendered)\n   - `https://openrouter.ai/apps` - App rankings (JS rendered)\n\n### Artificial Analysis Page Structure\n\n**LLM Leaderboard Page** (`/leaderboards/models`):\n```\nLLM Leaderboard - Comparison of over 100 AI models\n├── HIGHLIGHTS section\n│   ├── Intelligence: Gemini 3.1 Pro Preview, GPT-5.4 (xhigh)\n│   ├── Speed: Mercury 2 (943 t/s), NVIDIA Nemotron 3 Super (462 t/s)\n│   └── Price: Gemma 3n E4B (cheapest)\n├── Filters:\n│   ├── Frontier Models | Open Weights | Size Class | Reasoning | Model Status\n├── Comparison table columns:\n│   ├── Features | Model | Context Window | Creator\n│   ├── Intelligence Index | Blended USD/1M | Median Tokens/s | Latency\n│   └── Further Analysis\n└── Key definitions (expandable)\n    ├── Context window\n    ├── Output Speed (tokens/s)\n    ├── Latency (Time to First Token)\n    ├── Price (3:1 blended)\n    ├── Output Price\n    └── Input Price\n```\n\n**LLM API Providers Page** (`/leaderboards/providers`):\n```\nLLM API Providers Leaderboard - 500+ endpoints\n├── Filters (same as LLM Leaderboard)\n├── Comparison table columns:\n│   ├── API Provider | Model | Context Window | License\n│   ├── Intelligence Index | Blended USD/1M | Median Tokens/s\n│   ├── Median First Chunk (s) | Total Response (s) | Reasoning Time (s)\n│   └── Further Analysis\n└── 24+ Providers: Cerebras, Groq, Fireworks, SambaNova, etc.\n```\n\n**Image & Video Leaderboards** (on 
homepage):\n```\nImage & Video Leaderboards\n├── Tabs:\n│   ├── Text to Image (ELO scores, 95% CI)\n│   ├── Image Editing\n│   ├── Text to Video (with Audio)\n│   ├── Text to Video (without Audio)\n│   ├── Image to Video (with Audio)\n│   └── Image to Video (without Audio)\n└── Top models with ELO rankings\n```\n\n### OpenRouter Page Structure (Reminder)\n\n**OpenRouter Rankings Page** (`/rankings`):\n```\nhttps://openrouter.ai/rankings\n├── Top Models (chart header)\n├── LLM Leaderboard ← THIS is the usage ranking (parse this!)\n│   ├── 1. MiniMax M2.5 (1.75T tokens)\n│   ├── 2. Step 3.5 Flash (1.34T tokens)\n│   └── [Show more] button\n├── Market Share (different metric - don't mix!)\n└── ...\n```\n\n## Usage Examples\n\n### Query Artificial Analysis Intelligence Index\n```\nUser: \"What are the top models on Artificial Analysis Intelligence Index?\"\n-> Fetches Artificial Analysis LLM Leaderboard and displays top models by intelligence\n```\n\n### Query Model Speed Rankings\n```\nUser: \"Which AI models are the fastest in terms of output speed?\"\n-> Fetches Artificial Analysis data and lists models by tokens/second\n```\n\n### Query API Providers\n```\nUser: \"Compare LLM API providers like Cerebras and Groq\"\n-> Fetches Artificial Analysis Providers Leaderboard and compares speed/price\n```\n\n### Query Image/Video Models\n```\nUser: \"What are the best text-to-image models?\"\n-> Fetches Artificial Analysis Image Arena leaderboard with ELO scores\n```\n\n### Query Model Rankings (OpenRouter)\n```\nUser: \"What are the top 10 AI models right now?\"\n-> Fetches OpenRouter rankings and displays top models with usage stats\n```\n\n### Query Free Models\n```\nUser: \"What free models are available on OpenRouter?\"\n-> Fetches https://openrouter.ai/models?q=free and lists all free models with their model IDs\n```\n\n### Get Model ID for API Calls\n```\nUser: \"What's the model ID for GPT-4o on OpenRouter?\"\n-> Fetches https://openrouter.ai/models and returns the 
exact model parameter to use\n```\n\n### Compare Model Performance\n```\nUser: \"Compare GPT-4 and Claude on Pinchbench\"\n-> Fetches Pinchbench data and compares success rate, speed, cost\n```\n\n## Output Format\n\n### Artificial Analysis Intelligence Index\n```\n==================================================\n    Artificial Analysis Intelligence Index\n==================================================\n\nTop 10 Models by Intelligence:\n\n| Rank | Model | Intelligence | Speed (t/s) | Price ($/M) |\n|------|-------|--------------|-------------|-------------|\n| 1 | Gemini 3.1 Pro Preview | 57 | ~50 | $1.25 |\n| 2 | GPT-5.4 (xhigh) | 57 | ~60 | $15.00 |\n| 3 | Claude Opus 4.6 (max) | 53 | ~80 | $18.00 |\n| 4 | Claude Sonnet 4.6 (max) | 52 | ~85 | $4.50 |\n| 5 | GLM-5 | 50 | ~45 | $0.50 |\n...\n\nFastest Models: Mercury 2 (943 t/s), NVIDIA Nemotron 3 Super (462 t/s)\nBest Price: Gemma 3n E4B, Granite 4.0 H Small\n\nData Source: Artificial Analysis (artificialanalysis.ai)\n==================================================\n```\n\n### API Providers Comparison\n```\n==================================================\n    LLM API Providers Leaderboard\n==================================================\n\n| Provider | Model | Speed (t/s) | Price ($/M) | Latency (s) |\n|----------|-------|-------------|-------------|-------------|\n| Cerebras | Llama 3.1 70B | 2143 | $0.12 | 0.08 |\n| Groq | Llama 3.1 70B | 943 | $0.59 | 0.15 |\n| Fireworks | Llama 3.1 70B | 562 | $0.90 | 0.22 |\n...\n\nData Source: Artificial Analysis Providers\n==================================================\n```\n\n### Image Arena (ELO Rankings)\n```\n==================================================\n    Text-to-Image Leaderboard (ELO)\n==================================================\n\n| Rank | Model | ELO Score | 95% CI |\n|------|-------|-----------|--------|\n| 1 | GPT Image 1.5 (high) | 1342 | ±12 |\n| 2 | Imagen 4 Ultra | 1289 | ±15 |\n| 3 | Gemini 3.1 Flash Image | 1245 | ±18 
|\n...\n\nData Source: Artificial Analysis Image Arena\n==================================================\n```\n\n### OpenRouter Model Rankings\n```\n==================================================\n    AI Model Rankings (OpenRouter)\n==================================================\n\nTop 10 Models by Usage:\n\n| Rank | Model | Provider | Tokens | Growth |\n|------|-------|----------|--------|--------|\n| 1 | MiniMax M2.5 | minimax | 1.75T | +15% |\n| 2 | Step 3.5 Flash | step | 1.34T | +22% |\n...\n\nData Source: OpenRouter (Weekly Rankings)\n==================================================\n```\n\n### Free Models List\n```\n==================================================\n    Free Models on OpenRouter\n==================================================\n\n| Model Name | Model ID (for API) | Context |\n|------------|-------------------|---------|\n| GPT-4o Mini | openai/gpt-4o-mini | 128K |\n| Llama 3.3 70B | meta-llama/llama-3.3-70b-instruct | 128K |\n| DeepSeek V3 | deepseek/deepseek-chat | 64K |\n...\n\n💡 Usage: Set the model parameter to the Model ID value\n   Example: model=\"openai/gpt-4o-mini\"\n\nData Source: OpenRouter Models\n==================================================\n```\n\n## Execution Instructions\n\n### Method 1: Browser Automation for Rankings (Recommended)\n\nArtificial Analysis and OpenRouter rankings pages require JavaScript rendering:\n\n```bash\n# Step 1: Load the browser-automation skill (REQUIRED).\n# This is an agent tool call, not a shell command:\n#   use_skill(\"browser-automation\")\n\n# Step 2: Navigate to Artificial Analysis LLM Leaderboard\nagent-browser open \"https://artificialanalysis.ai/leaderboards/models\"\nagent-browser wait --load networkidle\n\n# Step 3: Wait for content to load, then extract\nagent-browser wait 3000\nagent-browser eval \"document.body.innerText\"\n\n# Step 4: Close browser when done\nagent-browser close\n```\n\n### Method 2: Python Script for OpenRouter Model Catalog\n\nUse the `query_leaderboard.py` script to fetch model data via the OpenRouter API (no 
JavaScript needed):\n\n```bash\n# Quote only the script path; flags are separate arguments\n# List free models\npython3 \"${SKILL_DIR}/query_leaderboard.py\" --free\n\n# Search models by name\npython3 \"${SKILL_DIR}/query_leaderboard.py\" -s glm\npython3 \"${SKILL_DIR}/query_leaderboard.py\" -s gpt\n\n# Get specific model info\npython3 \"${SKILL_DIR}/query_leaderboard.py\" --id openai/gpt-4o\n\n# List all models with limit\npython3 \"${SKILL_DIR}/query_leaderboard.py\" --all --limit 50\n```\n\n### Method 3: Web Fetch (Fallback)\n\nWhen browser/Python is not available, use `web_fetch`:\n\n1. **For Artificial Analysis**: Fetch `https://artificialanalysis.ai/leaderboards/models`\n2. **For OpenRouter model catalog**: Use OpenRouter API `https://openrouter.ai/api/v1/models`\n3. **For benchmarks**: Fetch `https://pinchbench.com/`\n\n**Note**: Rankings pages require JavaScript rendering - use browser automation (Method 1).\n\n## Notes\n\n- Data is updated regularly (Artificial Analysis, OpenRouter weekly, Pinchbench near real-time)\n- Artificial Analysis Intelligence Index is based on 10 independent evaluations\n- ELO scores are from blind preference voting with 95% confidence intervals\n- Pinchbench disclaimer: \"For entertainment purposes only, should not be relied upon for critical decisions\"\n- Rankings reflect actual usage data from millions of users\n- Free models have $0.00 pricing on OpenRouter\n- **Model ID format**: Use the exact string (e.g., `openai/gpt-4o-mini`) as the `model` parameter in API calls\n\n## Artificial Analysis API Patterns\n\nBased on observed page structure, Artificial Analysis provides:\n- **Model comparison data**: https://artificialanalysis.ai/leaderboards/models\n- **Provider comparison**: https://artificialanalysis.ai/leaderboards/providers\n- **Image/Video arenas**: Embedded on homepage with tab navigation\n- **Model-specific provider data**: `/models/{model-id}/providers` endpoint pattern\n\n**Example model providers 
API**:\n```\n/models/gpt-oss-120b/providers\n/models/gemini-3-1-pro-preview/providers\n/models/claude-opus-4-6-adaptive/providers\n```\n\n## OpenRouter API Usage\n\nWhen calling the OpenRouter API (for chat completions), use the Model ID. Note: This skill's scripts (fetch_rankings.py, query_leaderboard.py) only read public leaderboard data and do NOT require API authentication.\n\n```bash\n# The \"model\" field takes the Model ID returned by this skill.\n# The request body must be valid JSON, so it cannot contain comments.\ncurl https://openrouter.ai/api/v1/chat/completions \\\n  -H \"Authorization: Bearer $OPENROUTER_API_KEY\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"openai/gpt-4o-mini\",\n    \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}]\n  }'\n```\n\n","topics":["Agentic","Coding"],"tags":{"latest":"1.20.1"},"stats":{"comments":0,"downloads":640,"installsAllTime":24,"installsCurrent":0,"stars":0,"versions":10},"createdAt":1773652627680,"updatedAt":1778491943830},"latestVersion":{"version":"1.20.1","createdAt":1773899776856,"changelog":"- Added agent-browser as a CLI dependency in SKILL.md for improved capabilities.\n- Updated version to 1.20.1 in metadata.\n- No changes to features, data sources, or usage; documentation now specifies the required dependency.","license":"MIT-0"},"metadata":null,"owner":{"handle":"luduoxin","userId":"s17cnpmrsw54h2vbr823362y3183hxgt","displayName":"路多辛","image":"https://avatars.githubusercontent.com/u/3434402?v=4"},"moderation":null}