An LLM router skill for OpenClaw

v0.1.1

LangGraph-based intelligent task router that splits work between PRO (heavy reasoning) and FLASH (fast) models using 5-dimension complexity scoring, configurable model roles, and failure escalation.


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for fanyadan/super-router.

Prompt preview (Install & Setup):
Install the skill "A LLM router skill for OpenClaw" (fanyadan/super-router) from ClawHub.
Skill page: https://clawhub.ai/fanyadan/super-router
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required binaries: python
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install super-router

ClawHub CLI


npx clawhub@latest install super-router
Security Scan
Capability signals
Crypto · Can make purchases · Requires sensitive credentials
These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.
VirusTotal: Pending
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description (model routing + task decomposition) align with the included code and instructions: the repository contains a LangGraph-based router, tests, and README that document using local Ollama or Gemini CLI. Required runtime pieces (python, langgraph, local Ollama or gemini CLI) are appropriate for the declared functionality. There are no unexpected cloud or platform credentials demanded by the skill.
Instruction Scope
SKILL.md and README instruct the agent/user to run the included Python script, install langgraph, and optionally pull models from Ollama or use Gemini CLI. The runtime instructions reference environment variables (ROUTER_*) and local paths (~/.openclaw/skills/...), but they do not instruct arbitrary file-system scanning, reading unrelated secrets, or exfiltration. Note: the examples include hard-coded exec paths, recommend pulling large models, and require network access for Gemini CLI — these are operational concerns rather than misaligned behavior.
Install Mechanism
There is no formal install spec in the registry (instruction-only), but the package contains runnable code (scripts/router.py). The documented install steps are manual (pip install langgraph; run/serve Ollama; pull models). This is reasonable for a local Python tool, but because the code will run locally when invoked, users should inspect the script before executing it. No high-risk remote download/install URLs or archive extraction steps are present in the skill metadata.
Credentials
The skill does not require any credentials in the registry metadata. The implementation optionally respects ROUTER_* env vars (model names, endpoints, debug flags) and allows GEMINI_API_KEY / Google application creds in a tightly controlled allowlist for Gemini CLI calls. It does not request unrelated secrets (AWS keys, SSH keys, etc.). The code intentionally refuses non-local Ollama URLs unless explicitly enabled (ROUTER_ALLOW_REMOTE_OLLAMA=1), which is a proportional safety control.
Persistence & Privilege
The skill does not declare always:true and has no install hooks or background persistence in the registry metadata. It runs on demand via the included script. The default ability for the agent to invoke skills autonomously is unchanged (disable-model-invocation is false), which is normal — no extra persistent privileges are requested by the skill itself.
Assessment
This skill appears coherent with its stated purpose, but it executes local Python code and can invoke local model servers or the Gemini CLI. Before installing or running:

  1. Inspect scripts/router.py yourself (or review it with a trusted reviewer), because the repository contains executable code.
  2. Prefer local Ollama (the default) for sensitive data; do not set ROUTER_ALLOW_REMOTE_OLLAMA unless you trust the remote endpoint.
  3. Only provide GEMINI_API_KEY or Google credentials if you intend to use Gemini CLI and understand that provider outputs may leave your machine.
  4. Ensure the gemini executable you point to is legitimate (the code enforces that it be named `gemini`).
  5. Run the tool in an isolated environment (container or VM) if you are unsure about model binaries or network interactions.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

Bins: python
38 downloads · 1 star · 2 versions · Updated 10h ago
v0.1.1 · MIT-0

Super Router (LangGraph Edition)

Intelligent task decomposition and model routing using LangGraph StateGraph. Automatically routes subtasks between PRO (heavy reasoning) and FLASH (fast) models based on structured complexity assessment.

When to Use This Skill

Use super-router when you need:

  • Intelligent model routing — automatically choose between heavy (PRO) and fast (FLASH) models per subtask
  • Task decomposition — break complex tasks into structured subtasks with independent routing
  • Cost optimization — use fast models for simple work, heavy models only when needed
  • Configurable models — use deterministic defaults, with environment-variable overrides for each role
  • Failure escalation — FLASH retry on infra failures, escalate to PRO on capability failures
  • Audit trail — full logging of planned vs actual routes, retries, and failure classifications

Not needed for: Simple single-turn tasks, tasks where you already know which model to use, or when you want manual control over every routing decision.

Core Architecture (LangGraph StateGraph)

Node | Function
Planner | Receives the original task and calls the local Ollama planner model to generate an ordered subtask array
Judge | Scores each subtask on 5 dimensions (reasoning_depth, code_change_scope, ambiguity, risk, io_heaviness); combines the scores with thresholds and confidence to decide PRO/FLASH
Dispatcher | Reads RouterState.current_step and routes via a conditional edge to pro_executor or flash_executor
PRO Executor | Heavy reasoning model (default: Gemini CLI preview model; override via ROUTER_PRO_MODEL)
FLASH Executor | Fast model with review/retry logic (default: Gemini CLI preview model; override via ROUTER_FLASH_MODEL)
FLASH Review | Validates output quality; distinguishes infra failures (timeout, network) from capability failures; retries FLASH or escalates to PRO
Metadata Extractor | Extracts "Technical Gold" (atomic high-precision facts) from step output to prevent finalizer timeouts and loss of detail
Recorder/Finalizer | Logs every step; compiles the final report from a hybrid of Technical Gold and the full audit trail; supports a FLASH→PRO→deterministic fallback chain
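
For orientation, here is a minimal sketch of how such a graph could be wired with LangGraph's StateGraph. The node and state names mirror the table above, but the function bodies are placeholders; the actual wiring, state fields, and loop logic live in scripts/router.py.

from typing import List, TypedDict

from langgraph.graph import END, StateGraph

class RouterState(TypedDict, total=False):
    task: str
    subtasks: List[dict]   # each subtask: {"desc": ..., "model": "PRO" or "FLASH", ...}
    current_step: int
    results: List[dict]

# Placeholder node bodies; the real implementations are in scripts/router.py.
def planner(state: RouterState) -> dict:
    return {}   # would call the planner model and return {"subtasks": [...], "current_step": 0}

def judge(state: RouterState) -> dict:
    return {}   # would score each subtask and set its "model" field

def dispatcher(state: RouterState) -> dict:
    return {}   # pass-through; routing happens on the conditional edge below

def pro_executor(state: RouterState) -> dict:
    return {}

def flash_executor(state: RouterState) -> dict:
    return {}

def recorder(state: RouterState) -> dict:
    return {}   # would log the step and advance current_step

def pick_route(state: RouterState) -> str:
    # Send the current subtask to whichever executor the Judge chose.
    return state["subtasks"][state["current_step"]]["model"]

def more_steps(state: RouterState) -> str:
    return "dispatcher" if state["current_step"] < len(state["subtasks"]) else END

graph = StateGraph(RouterState)
for name, fn in [("planner", planner), ("judge", judge), ("dispatcher", dispatcher),
                 ("pro_executor", pro_executor), ("flash_executor", flash_executor),
                 ("recorder", recorder)]:
    graph.add_node(name, fn)
graph.set_entry_point("planner")
graph.add_edge("planner", "judge")
graph.add_edge("judge", "dispatcher")
graph.add_conditional_edges("dispatcher", pick_route,
                            {"PRO": "pro_executor", "FLASH": "flash_executor"})
graph.add_edge("pro_executor", "recorder")
graph.add_edge("flash_executor", "recorder")
graph.add_conditional_edges("recorder", more_steps, {"dispatcher": "dispatcher", END: END})
app = graph.compile()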

Installation

# Required: LangGraph + Ollama
pip install langgraph

# Ensure Ollama is running locally
ollama serve

# Pull recommended models if you use Ollama-backed roles
ollama pull gemma4:26b     # Planner or PRO executor (high quality, slow)
ollama pull llama3.1:8b    # Judge (fast scoring, recommended)
ollama pull qwen3         # PRO executor
ollama pull qwen2.5:7b    # FLASH executor

Note: If you prefer gemma4:26b as the Planner, keep it there. For speed, the Judge should usually be llama3.1:8b or another 7B-14B model:

export ROUTER_PLANNER_MODEL=gemma4:26b
export ROUTER_JUDGE_MODEL=llama3.1:8b
export ROUTER_PRO_MODEL=gemma4:26b
export ROUTER_FLASH_MODEL=qwen2.5:7b

If you intentionally want an all-gemma4:26b Planner/Judge/PRO setup, use longer timeouts and serialized graph execution:

export ROUTER_PLANNER_MODEL=gemma4:26b
export ROUTER_JUDGE_MODEL=gemma4:26b
export ROUTER_PRO_MODEL=gemma4:26b
export ROUTER_FLASH_MODEL=qwen2.5:7b
export ROUTER_JUDGE_TIMEOUT=600
export ROUTER_MAX_CONCURRENCY=1

Security Boundaries

  • The router only consumes task text, model names, and documented ROUTER_* settings.
  • It has no install hook, background persistence, arbitrary local file scanning, or destructive file operations.
  • Ollama traffic is local by default. Remote ROUTER_OLLAMA_URL values are refused unless ROUTER_ALLOW_REMOTE_OLLAMA=1 is set (see the sketch after this list).
  • Gemini CLI execution is restricted to an executable named gemini and receives only a minimal allowlisted environment.
  • Provider prompts and outputs may leave the machine when using Gemini CLI or an explicitly trusted remote Ollama endpoint. Use local Ollama for sensitive work.
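
A minimal sketch of how the local-only Ollama check described above could look; the helper name and error text are illustrative, not the actual implementation in scripts/router.py.

import os
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}

def resolve_ollama_url() -> str:
    # Default to the local endpoint; refuse remote hosts unless explicitly allowed.
    url = os.environ.get("ROUTER_OLLAMA_URL", "http://localhost:11434/api/generate")
    host = urlparse(url).hostname or ""
    if host not in LOCAL_HOSTS and os.environ.get("ROUTER_ALLOW_REMOTE_OLLAMA") != "1":
        raise RuntimeError(
            f"Refusing non-local Ollama URL {url!r}; set ROUTER_ALLOW_REMOTE_OLLAMA=1 to opt in."
        )
    return url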

Usage

Basic Usage (via exec)

When the user says "use super-router" (or the Chinese trigger "走 super-router") or asks for router analysis:

# Direct execution with the task as the argument (task here: "Analyze K8s YAML errors and rewrite the config")
terminal(command="/opt/homebrew/Caskroom/miniforge/base/bin/python ~/.openclaw/skills/super-router/scripts/router.py '分析 K8s YAML 错误并重写配置'")

With Streaming (Node-Level Progress)

terminal(command="/opt/homebrew/Caskroom/miniforge/base/bin/python ~/.openclaw/skills/super-router/scripts/router.py --stream 'Your complex task'")

Via Environment Variable (Agent Compatibility)

For agents that struggle with non-ASCII arguments:

# Normalize task to short ASCII English, then pass as argument
terminal(command="/opt/homebrew/Caskroom/miniforge/base/bin/python ~/.openclaw/skills/super-router/scripts/router.py 'Analyze K8s YAML errors and fix'")

# Or via env var (if agent supports it)
terminal(command="/opt/homebrew/Caskroom/miniforge/base/bin/python ~/.openclaw/skills/super-router/scripts/router.py", 
         env={"ROUTER_TASK": "Your complex task description"})

Handling Long-Running Execution

If exec returns "Command still running":

# Continue polling with process tool
process(action="poll", session_id="<session_id_from_exec>")

# Wait for completion
process(action="wait", session_id="<session_id_from_exec>", timeout=300)

Important: Once process shows completion, your next assistant message MUST start with Router result: or Router failed: and include at least one real detail from the output (e.g., "Planner fallback", "Ollama timed out", "BTC"). Never reply with just ---, punctuation, or empty lines.

Environment Variables

Variable | Purpose | Default
ROUTER_PLANNER_MODEL | Task decomposition model | gemma4:26b
ROUTER_JUDGE_MODEL | Complexity scoring model | llama3.1:8b
ROUTER_PRO_MODEL | Heavy reasoning executor | google-gemini-cli/gemini-3-pro-preview
ROUTER_FLASH_MODEL | Fast executor | google-gemini-cli/flash
ROUTER_PRO_FALLBACK_MODELS | Comma-separated PRO fallback list | None
ROUTER_FLASH_FALLBACK_MODELS | Comma-separated FLASH fallback list | None
ROUTER_FLASH_RETRY_BUDGET | Max FLASH retries before escalation | 1
ROUTER_RECURSION_LIMIT | Python recursion limit | 128
ROUTER_JUDGE_TIMEOUT | Timeout for Judge node LLM calls (seconds) | 300 (up to 6000 for extremely complex tasks with large models)
ROUTER_MAX_CONCURRENCY | LangGraph max node concurrency; set to 1 for local 26B+ Judge models | Auto (1 for large Judge models)
ROUTER_GEMINI_CLI | Path to Gemini CLI (if using it instead of Ollama) | /opt/homebrew/bin/gemini
ROUTER_OLLAMA_URL | Ollama API endpoint | http://localhost:11434/api/generate
ROUTER_ALLOW_REMOTE_OLLAMA | Opt in to non-local Ollama endpoints after trusting them | Off
ROUTER_FINALIZER_TIMEOUT | Timeout for final report synthesis (seconds); set it high (e.g., 600) for complex tasks to avoid timeouts during context assembly | 600
ROUTER_DEBUG | Print raw planner/judge/Ollama diagnostic snippets | Off
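
As a rough illustration of how these variables could be consumed (the real parsing and defaults are defined in scripts/router.py):

import os

def env(name: str, default: str) -> str:
    return os.environ.get(name, default)

PLANNER_MODEL = env("ROUTER_PLANNER_MODEL", "gemma4:26b")
JUDGE_MODEL = env("ROUTER_JUDGE_MODEL", "llama3.1:8b")
JUDGE_TIMEOUT = int(env("ROUTER_JUDGE_TIMEOUT", "300"))
FLASH_RETRY_BUDGET = int(env("ROUTER_FLASH_RETRY_BUDGET", "1"))
# Comma-separated fallback lists become empty lists when unset.
PRO_FALLBACKS = [m.strip() for m in env("ROUTER_PRO_FALLBACK_MODELS", "").split(",") if m.strip()]
DEBUG = env("ROUTER_DEBUG", "") == "1"   # assuming a "1"/unset style toggle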

For large models (20B+ like gemma4:26b):

  • Prefer ROUTER_PLANNER_MODEL=gemma4:26b with ROUTER_JUDGE_MODEL=llama3.1:8b
  • If using ROUTER_JUDGE_MODEL=gemma4:26b, set ROUTER_JUDGE_TIMEOUT=600 and keep ROUTER_MAX_CONCURRENCY=1
  • Planner timeout is auto-set to 300s for large models
  • Expect 2-5 minute wait times per LLM call
  • Model warmup adds ~30-60s upfront but prevents timeouts.
  • Crucial: A 60s terminal timeout can still kill the run even if internal router timeouts are higher. Use --stream, process polling via process(action='poll'), and a longer terminal/process wait timeout for large Planner/Judge runs.

Complexity Routing Rules

5-Dimension Scoring

The Judge scores each subtask on:

  1. reasoning_depth (1-10): How much logical inference is needed?
  2. code_change_scope (1-10): How many files/lines of code to modify?
  3. ambiguity (1-10): How unclear is the task specification?
  4. risk (1-10): What's the impact of getting this wrong?
  5. io_heaviness (1-10): How much reading/writing vs. thinking?

Routing Thresholds

Condition | Route
complexity_score >= 5 | PRO
complexity_score <= 2 | FLASH
Summary-like task (no deep work) | FLASH
High-risk incident diagnosis | PRO
High-risk evidence gathering | PRO
High-risk decision/rollback evaluation | PRO
Boundary case + low confidence | PRO (safe default)
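
To make these rules concrete, here is an illustrative decision function. The rule ordering and the 0.7 confidence cutoff are assumptions for the sketch; the real Judge combines LLM-produced scores, contextual biases, and confidence inside scripts/router.py.

def decide_route(complexity_score: float, summary_like: bool,
                 high_risk: bool, confidence: float) -> str:
    # High-risk work (diagnosis, evidence gathering, rollback decisions) stays on PRO.
    if high_risk:
        return "PRO"
    # Pure communication/summary subtasks go to FLASH.
    if summary_like:
        return "FLASH"
    if complexity_score >= 5:
        return "PRO"
    if complexity_score <= 2:
        return "FLASH"
    # Boundary case: default to PRO unless the Judge is confident FLASH suffices.
    return "FLASH" if confidence >= 0.7 else "PRO"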

Contextual Score Biases

The router applies automatic adjustments (sketched after this list):

  • High-risk context (production, billing, security): boosts reasoning_depth, risk, ambiguity
  • Evidence gathering in incident: keeps on PRO (not mere IO)
  • Communication/summary subtasks: routed to FLASH unless deep work is also required
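
A keyword-based sketch of such a bias; the marker words and boost amounts are illustrative, and the actual heuristics are in scripts/router.py.

def apply_context_bias(scores: dict, subtask_text: str) -> dict:
    # Boost reasoning_depth, risk, and ambiguity for high-risk contexts.
    high_risk_markers = ("production", "billing", "security", "incident")
    if any(marker in subtask_text.lower() for marker in high_risk_markers):
        scores = dict(scores,
                      reasoning_depth=min(10, scores["reasoning_depth"] + 2),
                      risk=min(10, scores["risk"] + 2),
                      ambiguity=min(10, scores["ambiguity"] + 1))
    return scores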

FLASH Review & Escalation Logic

When FLASH execution fails or produces questionable output (a code sketch follows this list):

  1. Classify failure type:

    • infra_transient: timeout, network, rate limit, service unavailable
    • capability_quality: "need more info", empty output, too short, repeated task
  2. Decision:

    • Infra failure → Retry FLASH (up to ROUTER_FLASH_RETRY_BUDGET)
    • Capability failure → Escalate to PRO immediately
    • Unknown → Retry once, then escalate
  3. Post-execution verification:

    • Empty output → escalate
    • Output < 48 chars (non-summary) → escalate
    • Output explicitly says "can't complete" → escalate
    • Output just repeats task description → escalate
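
A condensed sketch of the classification, verification, and decision logic described above; the keyword lists and helper names are illustrative, and the actual checks are in scripts/router.py.

def classify_failure(error_text: str) -> str:
    # Map error text to a failure class.
    text = error_text.lower()
    if any(m in text for m in ("timeout", "network", "rate limit", "service unavailable")):
        return "infra_transient"
    if "need more info" in text or not text.strip():
        return "capability_quality"
    return "unknown"

def review_output(output: str, task: str, is_summary: bool) -> bool:
    # Post-execution verification: True means the FLASH output is acceptable.
    text = output.strip()
    if not text:
        return False
    if len(text) < 48 and not is_summary:
        return False
    if "can't complete" in text.lower():
        return False
    if text.lower() == task.strip().lower():   # output merely repeats the task
        return False
    return True

def next_action(failure_type: str, retries_used: int, retry_budget: int) -> str:
    if failure_type == "infra_transient" and retries_used < retry_budget:
        return "retry_flash"
    if failure_type == "capability_quality":
        return "escalate_to_pro"
    # Unknown failures: retry once, then escalate.
    return "retry_flash" if retries_used == 0 else "escalate_to_pro"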

Finalizer Fallback Chain

Final report generation follows:

FLASH finalizer → (if fails) → PRO finalizer → (if fails) → Deterministic template
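
A small sketch of this chain, with the finalizer callables passed in as parameters since the real model calls and attempt logging live in scripts/router.py:

from typing import Callable, Dict

def finalize(state: dict,
             flash_finalizer: Callable[[dict], str],
             pro_finalizer: Callable[[dict], str],
             deterministic_template: Callable[[dict], str]) -> Dict[str, object]:
    # Try FLASH, then PRO, then fall back to a deterministic template.
    attempt_log = []
    for route, fn in (("FLASH", flash_finalizer), ("PRO", pro_finalizer)):
        try:
            report = fn(state)
            attempt_log.append(f"{route} finalizer succeeded")
            return {"route": route, "final_report": report, "attempt_log": attempt_log}
        except Exception as exc:
            attempt_log.append(f"{route} finalizer failed: {exc}")
    attempt_log.append("fell back to deterministic template")
    return {"route": "DETERMINISTIC", "final_report": deterministic_template(state),
            "attempt_log": attempt_log}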

Output Structure

The router returns a JSON-serializable state. When summarizing these results in reports or documentation, always use ASCII/terminal-style arrows (e.g., '-->', '->') rather than mathematical arrows (e.g., '→', '$\rightarrow$') for all diagrams and flow representations. This is a high-priority stylistic requirement.
{
  "task": "original task string",
  "planner_model": "model name used for planning",
  "judge_model": "model name used for complexity scoring",
  "pro_model": "primary PRO model",
  "flash_model": "primary FLASH model",
  "planned_subtasks": [{"desc": "..."}],
  "subtasks": [
    {
      "desc": "...",
      "model": "PRO|FLASH",
      "assessment": {
        "scores": {"reasoning_depth": 5, "code_change_scope": 3, "ambiguity": 2, "risk": 4, "io_heaviness": 1},
        "complexity_score": 15,
        "suggested_route": "PRO",
        "final_route": "PRO",
        "confidence": 0.85,
        "reason": "...",
        "judge_source": "llm|heuristic"
      }
    }
  ],
  "results": [
    {
      "step": 1,
      "planned_route": "PRO",
      "route": "PRO",
      "model_name": "qwen3",
      "desc": "...",
      "output": "...",
      "status": "success|failed",
      "attempt_count": 1,
      "retry_count": 0,
      "escalated_from_flash": false,
      "used_provider_fallback": false,
      "flash_review": {"decision": "record", "failure_type": "none", "reason": "..."},
      "attempt_log": ["..."]
    }
  ],
  "final_report": "...",
  "finalizer_outcome": {
    "route": "FLASH|PRO|DETERMINISTIC",
    "model_name": "...",
    "status": "...",
    "used_provider_fallback": false,
    "reason": "...",
    "attempt_log": ["..."]
  }
}
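
As a usage sketch (a hypothetical helper, not shipped with the skill), a saved copy of this JSON state can be summarized to compare planned vs. actual routes:

import json
import sys

# Read a saved router state (JSON) from stdin and print the routing audit trail.
state = json.load(sys.stdin)
for result in state.get("results", []):
    planned = result.get("planned_route")
    actual = result.get("route")
    note = " (escalated from FLASH)" if result.get("escalated_from_flash") else ""
    print(f"step {result.get('step')}: planned={planned} actual={actual}{note} "
          f"model={result.get('model_name')} status={result.get('status')}")
finalizer = state.get("finalizer_outcome", {})
print(f"finalizer: route={finalizer.get('route')} status={finalizer.get('status')}")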

Example Workflows

Example 1: K8s Incident Triage

router.py "生产环境 K8s Pod 频繁重启,分析日志找出根因,给出修复方案并整理给值班同事的简短行动摘要"

Expected routing:

  1. "分析 Pod 重启日志,定位错误模式" → PRO (high-risk diagnosis)
  2. "确定根因(资源不足/配置错误/依赖故障)" → PRO (high-risk decision)
  3. "制定修复方案(YAML 调整/回滚/扩容)" → PRO (high-risk repair plan)
  4. "整理给值班同事的简短行动摘要" → FLASH (communication/summary)

Example 2: Code Refactoring

router.py "Refactor auth module to use JWT, add unit tests, update docs"

Expected routing:

  1. "Analyze current auth implementation" → PRO (deep inspection)
  2. "Design JWT claims model" → PRO (design logic)
  3. "Implement JWT encoding/decoding" → PRO (implementation)
  4. "Add unit tests for JWT functions" → PRO (test logic)
  5. "Update README with JWT usage examples" → FLASH (documentation)

Example 3: Simple Summary

router.py "Summarize the last 10 git commits"

Expected routing:

  • Single subtask → FLASH (summary-like, low complexity)

Maintenance

File | Purpose
scripts/router.py | Main LangGraph router script
SKILL.md | This documentation

Troubleshooting

"Router timed out" / "Ollama returned an empty response"

  • Best fix when keeping a large Planner: keep ROUTER_PLANNER_MODEL=gemma4:26b, but set ROUTER_JUDGE_MODEL=llama3.1:8b.
  • All-gemma mode: set ROUTER_JUDGE_MODEL=gemma4:26b, ROUTER_JUDGE_TIMEOUT=600, and ROUTER_MAX_CONCURRENCY=1; expect much longer runs.
  • Use --stream and increase the terminal/process timeout if the Planner itself may take longer than 60s.
  • Set ROUTER_JUDGE_TIMEOUT=300 or higher only when intentionally using a 20B+ Judge.
  • Alternative: use Gemini CLI for planning: ROUTER_PLANNER_MODEL=google-gemini-cli/gemini-3-pro-preview.

"Planner timed out after 30s" (or 90s)

  • Model is too large or not loaded. Warmup helps but large models may still timeout.
  • Use --stream plus a longer terminal/process timeout, or choose a smaller planner model.
  • Check the Ollama logs (the ollama serve output) for errors

"FLASH kept escalating to PRO"

  • Task may genuinely require heavy reasoning
  • Check if FLASH model is too small for your tasks
  • Try setting ROUTER_FLASH_MODEL to a larger model

"Gemini CLI AbortError or Auth Failures"

  • If gemini-cli returns AbortError or authentication errors in non-interactive sessions, this is often an infrastructure/API timeout or session issue.
  • Use --stream to monitor real-time progress and ensure ROUTER_JUDGE_TIMEOUT and terminal timeouts are sufficiently high to prevent external process termination.

"Planner produced only one subtask"

  • Task may be simple enough to not need decomposition
  • Planner model may be too small; try ROUTER_PLANNER_MODEL=gemma4:31b (if you have the patience for 90s+ waits)

Related Skills

  • dspy — Declarative LM programming with automatic prompt optimization (Python framework alternative)
  • subagent-driven-development — Task decomposition with OpenClaw-native delegation + two-stage review
  • llama-cpp — Run LLM inference locally (alternative to Ollama backend)
