Install

```
openclaw skills install modelsense
```

ModelSense — the right model for the right job: it recommends the best LLM model and effort level for any task, based on benchmark data and task analysis.

ModelSense helps users pick the optimal model and effort level for their task. It does NOT route automatically on every request (use a provider plugin for that). It's an on-demand advisor: ask it a question, get a clear recommendation with reasoning.
Effort levels: quick / balanced / deep / research.

To make a recommendation, first classify the task, then cross-reference its domain against the relevant benchmarks in data/benchmarks.yaml (a lookup sketch follows the table):
| Benchmark | Best for |
|---|---|
| HumanEval / SWE-bench | Code generation, debugging, engineering |
| GPQA | Graduate-level science & research |
| MATH / AIME | Mathematical reasoning |
| MMLU | General knowledge, multidomain QA |
| Needle-in-Haystack | Long-context retrieval |
| MT-Bench / Arena Elo | Dialogue, writing quality |
| BBH (Big-Bench Hard) | Complex reasoning, multi-step logic |
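A minimal lookup sketch in Python, assuming data/benchmarks.yaml holds a `benchmarks` list where each entry has `name`, `best_for` (domain tags), and `leaders` fields; that schema and the field names are assumptions for illustration, not the shipped format:

```python
import yaml

def benchmarks_for_domain(domain: str, path: str = "data/benchmarks.yaml") -> list[dict]:
    """Return benchmark entries whose assumed 'best_for' domains include the task domain."""
    with open(path) as f:
        data = yaml.safe_load(f)
    # Assumed schema: {"benchmarks": [{"name": ..., "best_for": [...], "leaders": [...]}]}
    return [b for b in data.get("benchmarks", []) if domain in b.get("best_for", [])]

# e.g. benchmarks_for_domain("code") -> the HumanEval / SWE-bench entries
```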
| Effort | Target quality | Typical model tier |
|---|---|---|
| quick | Good enough, fast | Haiku / Flash / GLM |
| balanced | High quality, reasonable cost | Sonnet / GPT-4o |
| deep | Best available, thinking on | Opus / o3 |
| research | No cost limit, maximum quality | Opus + thinking=high |
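The effort-to-tier mapping is small enough to express directly in code; a sketch mirroring the table above, with tiers as plain model-id lists (the ids and the dict layout are illustrative, not a defined interface):

```python
# Effort level -> candidate model tier plus thinking setting. Illustrative only.
EFFORT_TIERS: dict[str, dict] = {
    "quick":    {"models": ["haiku", "flash", "glm"], "thinking": None},
    "balanced": {"models": ["sonnet", "gpt-4o"],      "thinking": None},
    "deep":     {"models": ["opus", "o3"],            "thinking": "on"},
    "research": {"models": ["opus"],                  "thinking": "high"},
}

def tier_for(effort: str) -> dict:
    return EFFORT_TIERS[effort]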
Check the user's available providers by running `openclaw models list` via the exec tool (or read them from context).
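A hypothetical sketch of that provider check; it assumes `openclaw models list` prints one model id per line, which is an assumption about the CLI's output rather than documented behavior:

```python
import subprocess

def available_models() -> set[str]:
    """Collect model ids the user can run (assumes one id per output line)."""
    out = subprocess.run(
        ["openclaw", "models", "list"],
        capture_output=True, text=True, check=True,
    ).stdout
    return {line.strip() for line in out.splitlines() if line.strip()}
```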
Format:

```
🎯 Recommended: <model>
⚡ Effort: <level>
📊 Why: <1-2 sentence benchmark-grounded rationale>
🔧 Special: <thinking on? function calling? etc.>
💰 Cost estimate: <rough $/M or relative>

Alternatives:
- <model B> — if you want faster/cheaper
- <model C> — if you want higher quality
```
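For illustration, the block above can be rendered with a plain format string; a sketch in which the `Recommendation` dataclass and its field names are invented for this example, not part of the skill:

```python
from dataclasses import dataclass

@dataclass
class Recommendation:  # assumed structure, for illustration only
    model: str
    effort: str
    why: str
    special: str
    cost: str
    alternatives: list[str]

def render(r: Recommendation) -> str:
    alts = "\n".join(f"- {a}" for a in r.alternatives)
    return (
        f"🎯 Recommended: {r.model}\n"
        f"⚡ Effort: {r.effort}\n"
        f"📊 Why: {r.why}\n"
        f"🔧 Special: {r.special}\n"
        f"💰 Cost estimate: {r.cost}\n\n"
        f"Alternatives:\n{alts}"
    )
```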
By default, just output the recommendation and tell the user: "Run /model <name> to switch."
If the user confirms ("yes, switch" / "apply it"):

```
session_status(model="<provider/model>")
```

Then notify the user: "✅ Switched to X for this session. Run /model default to reset."
If the user says "just do it with the best model":

```
sessions_spawn(
    task="<original task>",
    model="<recommended model>",
    thinking="<level>"
)
```
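A filled-in version of that call, using the Solidity audit example from the Examples section below (the model id and thinking level come from that example; whether your install expects exactly this id string may vary):

```
sessions_spawn(
    task="Write a Solidity audit report",
    model="claude-opus-4-6",
    thinking="high"
)
```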
Files

- data/benchmarks.yaml — benchmark definitions, score leaders, task mappings
- data/models.yaml — model catalog (updated weekly via GitHub Actions)

Examples

User: "I need to write a Solidity audit report"
→ Domain: code + security + long-form
→ Benchmarks: SWE-bench, HumanEval
→ Recommendation: claude-opus-4-6 with thinking=high, effort=deep
User: "Quick summary of this Slack thread"
→ Domain: dialogue, short
→ Recommendation: claude-haiku-4-5 or gemini-flash, effort=quick
User: "Prove this mathematical conjecture"
→ Domain: math, research-grade
→ Benchmarks: MATH, AIME, GPQA
→ Recommendation: o3 or claude-opus-4-6 with thinking=high, effort=research
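Putting the steps together, an end-to-end sketch of the advisory flow. It reuses the helpers sketched above (benchmarks_for_domain, tier_for, available_models, render, Recommendation), and classify_domain here is a toy keyword stand-in for the skill's actual task analysis; none of this is the shipped implementation:

```python
def classify_domain(task: str) -> str:
    # Toy keyword heuristic, purely illustrative; the real skill does richer analysis.
    t = task.lower()
    if any(k in t for k in ("code", "debug", "solidity", "audit")):
        return "code"
    if any(k in t for k in ("prove", "math", "conjecture")):
        return "math"
    return "general"

def recommend(task: str, effort: str = "balanced") -> str:
    """Advisory flow: classify -> benchmark lookup -> tier -> availability -> render."""
    domain = classify_domain(task)
    benches = benchmarks_for_domain(domain)   # sketch after the benchmark table
    tier = tier_for(effort)                   # sketch after the effort table
    avail = available_models()                # sketch in the provider-check step
    # Pick the first tier model the user can actually run (crude but illustrative).
    model = next((m for m in tier["models"] if any(m in a for a in avail)),
                 tier["models"][0])
    names = ", ".join(b["name"] for b in benches) or "general benchmarks"
    return render(Recommendation(
        model=model,
        effort=effort,
        why=f"Leads on {names} for {domain} tasks.",
        special=f"thinking={tier['thinking']}" if tier["thinking"] else "none",
        cost="varies by provider",
        alternatives=[m for m in tier["models"] if m != model],
    ))
```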