Super Router (LangGraph Edition)
Intelligent task decomposition and model routing using LangGraph StateGraph. Automatically routes subtasks between PRO (heavy reasoning) and FLASH (fast) models based on structured complexity assessment.
When to Use This Skill
Use super-router when you need:
- Intelligent model routing — automatically choose between heavy (PRO) and fast (FLASH) models per subtask
- Task decomposition — break complex tasks into structured subtasks with independent routing
- Cost optimization — use fast models for simple work, heavy models only when needed
- Configurable models — use deterministic defaults, with environment-variable overrides for each role
- Failure escalation — FLASH retry on infra failures, escalate to PRO on capability failures
- Audit trail — full logging of planned vs actual routes, retries, and failure classifications
Not needed for: Simple single-turn tasks, tasks where you already know which model to use, or when you want manual control over every routing decision.
Core Architecture (LangGraph StateGraph)
| Node | Function |
|---|---|
| Planner | Receives original task, calls local Ollama planner model to generate ordered subtask array |
| Judge | Scores each subtask on 5 dimensions: reasoning_depth, code_change_scope, ambiguity, risk, io_heaviness; combines with thresholds + confidence to decide PRO/FLASH |
| Dispatcher | Reads RouterState.current_step, routes via conditional edge to pro_executor or flash_executor |
| PRO Executor | Heavy reasoning model (default: Gemini CLI preview model; override via ROUTER_PRO_MODEL) |
| FLASH Executor | Fast model with review/retry logic (default: Gemini CLI preview model; override via ROUTER_FLASH_MODEL) |
| FLASH Review | Validates output quality; distinguishes infra failures (timeout, network) from capability failures; retries FLASH or escalates to PRO |
| Metadata Extractor | Extracts 'Technical Gold' (atomic high-precision facts) from step output to prevent finalizer timeouts and loss of detail |
| Recorder/Finalizer | Logs every step; compiles final report using a hybrid of Technical Gold and full audit trails; supports FLASH→PRO→deterministic fallback chain |
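A minimal sketch of how these nodes might be wired as a LangGraph StateGraph, assuming a TypedDict state and stubbed node bodies. The state fields and node names here are illustrative, not the actual router.py implementation:

```python
# Illustrative wiring only: node bodies are stubs, and the state fields
# are assumptions based on the table above, not router.py's real code.
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class RouterState(TypedDict, total=False):
    task: str
    subtasks: list      # judged subtasks with PRO/FLASH assignments
    current_step: int   # index the dispatcher reads
    results: list       # per-step execution records


def planner(state: RouterState) -> dict:
    # The real node calls the Ollama planner model; stubbed here.
    return {"subtasks": [{"desc": state["task"], "model": "FLASH"}],
            "current_step": 0}


def dispatch(state: RouterState) -> str:
    # Conditional edge: pick the executor for the current step.
    step = state["subtasks"][state["current_step"]]
    return "pro_executor" if step["model"] == "PRO" else "flash_executor"


graph = StateGraph(RouterState)
graph.add_node("planner", planner)
graph.add_node("pro_executor", lambda s: {})    # stub
graph.add_node("flash_executor", lambda s: {})  # stub
graph.add_edge(START, "planner")
graph.add_conditional_edges("planner", dispatch,
                            {"pro_executor": "pro_executor",
                             "flash_executor": "flash_executor"})
graph.add_edge("pro_executor", END)
graph.add_edge("flash_executor", END)
app = graph.compile()
```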
Installation
# Required: LangGraph + Ollama
pip install langgraph
# Ensure Ollama is running locally
ollama serve
# Pull recommended models if you use Ollama-backed roles
ollama pull gemma4:26b # Planner or PRO executor (high quality, slow)
ollama pull llama3.1:8b # Judge (fast scoring, recommended)
ollama pull qwen3 # PRO executor
ollama pull qwen2.5:7b # FLASH executor
Note: If you prefer gemma4:26b as the Planner, keep it there. For speed, the Judge should usually be llama3.1:8b or another 7B-14B model:
export ROUTER_PLANNER_MODEL=gemma4:26b
export ROUTER_JUDGE_MODEL=llama3.1:8b
export ROUTER_PRO_MODEL=gemma4:26b
export ROUTER_FLASH_MODEL=qwen2.5:7b
If you intentionally want an all-gemma4:26b Planner/Judge/PRO setup, use longer timeouts and serialized graph execution:
export ROUTER_PLANNER_MODEL=gemma4:26b
export ROUTER_JUDGE_MODEL=gemma4:26b
export ROUTER_PRO_MODEL=gemma4:26b
export ROUTER_FLASH_MODEL=qwen2.5:7b
export ROUTER_JUDGE_TIMEOUT=600
export ROUTER_MAX_CONCURRENCY=1
Security Boundaries
- The router only consumes task text, model names, and documented ROUTER_* settings.
- It has no install hook, background persistence, arbitrary local file scanning, or destructive file operations.
- Ollama traffic is local by default. Remote ROUTER_OLLAMA_URL values are refused unless ROUTER_ALLOW_REMOTE_OLLAMA=1 is set.
- Gemini CLI execution is restricted to an executable named gemini and receives only a minimal allowlisted environment.
- Provider prompts and outputs may leave the machine when using Gemini CLI or an explicitly trusted remote Ollama endpoint. Use local Ollama for sensitive work.
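The remote-endpoint refusal can be pictured as a small allowlist check. This is a hedged sketch, not the exact validation router.py performs:

```python
# Sketch only: router.py's actual validation may differ.
import os
from urllib.parse import urlparse

def ollama_url_allowed(url: str) -> bool:
    host = urlparse(url).hostname or ""
    if host in ("localhost", "127.0.0.1", "::1"):
        return True  # local traffic is always allowed
    # Remote endpoints require an explicit opt-in.
    return os.environ.get("ROUTER_ALLOW_REMOTE_OLLAMA") == "1"
```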
Usage
Basic Usage (via exec)
When user says "走 super-router", "use super-router", or asks for router analysis:
# Direct execution with task as argument (the example task is Chinese for
# "Analyze K8s YAML errors and rewrite the config")
terminal(command="/opt/homebrew/Caskroom/miniforge/base/bin/python ~/.openclaw/skills/super-router/scripts/router.py '分析 K8s YAML 错误并重写配置'")
With Streaming (Node-Level Progress)
terminal(command="/opt/homebrew/Caskroom/miniforge/base/bin/python ~/.openclaw/skills/super-router/scripts/router.py --stream 'Your complex task'")
Via Environment Variable (Agent Compatibility)
For agents that struggle with non-ASCII arguments:
# Normalize task to short ASCII English, then pass as argument
terminal(command="/opt/homebrew/Caskroom/miniforge/base/bin/python ~/.openclaw/skills/super-router/scripts/router.py 'Analyze K8s YAML errors and fix'")
# Or via env var (if agent supports it)
terminal(command="/opt/homebrew/Caskroom/miniforge/base/bin/python ~/.openclaw/skills/super-router/scripts/router.py",
env={"ROUTER_TASK": "Your complex task description"})
Handling Long-Running Execution
If exec returns "Command still running":
# Continue polling with process tool
process(action="poll", session_id="<session_id_from_exec>")
# Wait for completion
process(action="wait", session_id="<session_id_from_exec>", timeout=300)
Important: Once process shows completion, your next assistant message MUST start with Router result: or Router failed: and include at least one real detail from the output (e.g., "Planner fallback", "Ollama timed out", "BTC"). Never reply with just ---, punctuation, or empty lines.
Environment Variables
| Variable | Purpose | Default |
|---|---|---|
| ROUTER_PLANNER_MODEL | Task decomposition model | gemma4:26b |
| ROUTER_JUDGE_MODEL | Complexity scoring model | llama3.1:8b |
| ROUTER_PRO_MODEL | Heavy reasoning executor | google-gemini-cli/gemini-3-pro-preview |
| ROUTER_FLASH_MODEL | Fast executor | google-gemini-cli/flash |
| ROUTER_PRO_FALLBACK_MODELS | Comma-separated PRO fallback list | None |
| ROUTER_FLASH_FALLBACK_MODELS | Comma-separated FLASH fallback list | None |
| ROUTER_FLASH_RETRY_BUDGET | Max FLASH retries before escalation | 1 |
| ROUTER_RECURSION_LIMIT | Python recursion limit | 128 |
| ROUTER_JUDGE_TIMEOUT | Timeout for Judge node LLM calls (seconds) | 300 (up to 6000 for extremely complex tasks with large models) |
| ROUTER_MAX_CONCURRENCY | LangGraph max node concurrency; set to 1 for local 26B+ Judge models | Auto (1 for large Judge models) |
| ROUTER_GEMINI_CLI | Path to the Gemini CLI (if used instead of Ollama) | /opt/homebrew/bin/gemini |
| ROUTER_OLLAMA_URL | Ollama API endpoint | http://localhost:11434/api/generate |
| ROUTER_ALLOW_REMOTE_OLLAMA | Opt in to non-local Ollama endpoints you trust | Off |
| ROUTER_FINALIZER_TIMEOUT | Timeout for final report synthesis (seconds); set high (e.g., 600) for complex tasks to avoid timeouts during context assembly | 600 |
| ROUTER_DEBUG | Print raw planner/judge/Ollama diagnostic snippets | Off |
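A sketch of how these settings might be resolved at startup. The defaults mirror the table above; the parsing logic itself is an assumption, not router.py's actual code:

```python
# Defaults copied from the table above; parsing logic is illustrative.
import os

PRO_MODEL = os.environ.get("ROUTER_PRO_MODEL",
                           "google-gemini-cli/gemini-3-pro-preview")
FLASH_MODEL = os.environ.get("ROUTER_FLASH_MODEL", "google-gemini-cli/flash")
JUDGE_TIMEOUT = float(os.environ.get("ROUTER_JUDGE_TIMEOUT", "300"))

# Comma-separated fallback lists parse into an ordered candidate list.
PRO_FALLBACKS = [m.strip()
                 for m in os.environ.get("ROUTER_PRO_FALLBACK_MODELS", "").split(",")
                 if m.strip()]
```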
For large models (20B+ like gemma4:26b):
- Prefer ROUTER_PLANNER_MODEL=gemma4:26b with ROUTER_JUDGE_MODEL=llama3.1:8b
- If using ROUTER_JUDGE_MODEL=gemma4:26b, set ROUTER_JUDGE_TIMEOUT=600 and keep ROUTER_MAX_CONCURRENCY=1
- Planner timeout is auto-set to 300s for large models
- Expect 2-5 minute wait times per LLM call
- Model warmup adds ~30-60s upfront but prevents timeouts
- Crucial: a 60s terminal timeout can still kill the run even if internal router timeouts are higher. Use --stream, poll via process(action='poll'), and set a longer terminal/process wait timeout for large Planner/Judge runs
Complexity Routing Rules
5-Dimension Scoring
The Judge scores each subtask on:
- reasoning_depth (1-10): How much logical inference is needed?
- code_change_scope (1-10): How many files/lines of code to modify?
- ambiguity (1-10): How unclear is the task specification?
- risk (1-10): What's the impact of getting this wrong?
- io_heaviness (1-10): How much reading/writing vs. thinking?
Routing Thresholds
| Condition | Route |
|---|---|
| complexity_score >= 5 | PRO |
| complexity_score <= 2 | FLASH |
| Summary-like task (no deep work) | FLASH |
| High-risk incident diagnosis | PRO |
| High-risk evidence gathering | PRO |
| High-risk decision/rollback evaluation | PRO |
| Boundary case + low confidence | PRO (safe default) |
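Putting the table together as code, hedged: the precedence of the conditions, the 0.7 confidence cutoff, and the reduction of the five dimension scores into a single complexity_score are all assumptions here (the example output later in this document shows a raw sum of the dimensions). Only the thresholds themselves come from the table:

```python
# Thresholds (>= 5 PRO, <= 2 FLASH) are from the routing table above;
# condition precedence and the confidence cutoff are assumptions.
def decide_route(complexity_score: float, confidence: float,
                 summary_like: bool, high_risk: bool) -> str:
    if high_risk:                 # incident diagnosis, evidence, rollback
        return "PRO"
    if summary_like:              # no deep work required
        return "FLASH"
    if complexity_score >= 5:
        return "PRO"
    if complexity_score <= 2:
        return "FLASH"
    # Boundary case: default to PRO when the Judge is not confident
    # (0.7 is an assumed cutoff, not router.py's documented value).
    return "PRO" if confidence < 0.7 else "FLASH"
```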
Contextual Score Biases
The router applies automatic adjustments:
- High-risk context (production, billing, security): boosts reasoning_depth, risk, ambiguity
- Evidence gathering during an incident: kept on PRO (treated as diagnosis, not mere IO)
- Communication/summary subtasks: routed to FLASH unless deep work is also required
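A hedged sketch of such a bias pass; the keyword list and the +2 boost are invented for illustration and may differ from the router's actual values:

```python
# Illustrative only: the real router's hints and boost sizes may differ.
HIGH_RISK_HINTS = ("production", "billing", "security")

def apply_context_bias(desc: str, scores: dict) -> dict:
    if any(hint in desc.lower() for hint in HIGH_RISK_HINTS):
        for dim in ("reasoning_depth", "risk", "ambiguity"):
            scores[dim] = min(10, scores[dim] + 2)  # clamp to the 1-10 scale
    return scores
```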
FLASH Review & Escalation Logic
When FLASH execution fails or produces questionable output:
1. Classify the failure type:
   - infra_transient: timeout, network, rate limit, service unavailable
   - capability_quality: "need more info", empty output, too short, repeated task
2. Decide:
   - Infra failure → retry FLASH (up to ROUTER_FLASH_RETRY_BUDGET)
   - Capability failure → escalate to PRO immediately
   - Unknown → retry once, then escalate
3. Verify after execution:
   - Empty output → escalate
   - Output < 48 chars (non-summary) → escalate
   - Output explicitly says "can't complete" → escalate
   - Output just repeats the task description → escalate
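The classification step could look like the following sketch. The pattern strings and the handling of the unknown bucket are assumptions, not the exact matching router.py performs:

```python
# Pattern lists are illustrative, not router.py's exact strings.
INFRA_PATTERNS = ("timeout", "network", "rate limit", "service unavailable")

def classify_failure(error_text: str, output: str, is_summary: bool) -> str:
    low = (error_text or "").lower()
    if any(p in low for p in INFRA_PATTERNS):
        return "infra_transient"          # retry FLASH
    if not output or (not is_summary and len(output) < 48):
        return "capability_quality"       # escalate to PRO
    return "unknown"                      # retry once, then escalate
```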
Finalizer Fallback Chain
Final report generation follows:
FLASH finalizer → (if fails) → PRO finalizer → (if fails) → Deterministic template
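In code, the chain amounts to trying each finalizer in order and falling back to a template that cannot fail. This sketch uses a hypothetical finalizers mapping rather than router.py's actual internals:

```python
# finalizers maps a route name to a callable; hypothetical structure.
def finalize(state: dict, finalizers: dict) -> str:
    for route in ("FLASH", "PRO"):
        try:
            return finalizers[route](state)
        except Exception:
            continue  # fall through to the next finalizer in the chain
    # Deterministic template: no LLM call, so it always succeeds.
    steps = "; ".join(r.get("desc", "") for r in state.get("results", []))
    return f"Router result: completed steps: {steps}"
```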
Output Structure
The router returns a JSON-serializable state. When summarizing these results in reports or documentation, always use ASCII/terminal-style arrows ('->', '-->') rather than Unicode or LaTeX arrows ('→', $\rightarrow$) in diagrams and flow representations; this is a high-priority stylistic requirement.
{
"task": "original task string",
"planner_model": "model name used for planning",
"judge_model": "model name used for complexity scoring",
"pro_model": "primary PRO model",
"flash_model": "primary FLASH model",
"planned_subtasks": [{"desc": "..."}],
"subtasks": [
{
"desc": "...",
"model": "PRO|FLASH",
"assessment": {
"scores": {"reasoning_depth": 5, "code_change_scope": 3, "ambiguity": 2, "risk": 4, "io_heaviness": 1},
"complexity_score": 15,
"suggested_route": "PRO",
"final_route": "PRO",
"confidence": 0.85,
"reason": "...",
"judge_source": "llm|heuristic"
}
}
],
"results": [
{
"step": 1,
"planned_route": "PRO",
"route": "PRO",
"model_name": "qwen3",
"desc": "...",
"output": "...",
"status": "success|failed",
"attempt_count": 1,
"retry_count": 0,
"escalated_from_flash": false,
"used_provider_fallback": false,
"flash_review": {"decision": "record", "failure_type": "none", "reason": "..."},
"attempt_log": ["..."]
}
],
"final_report": "...",
"finalizer_outcome": {
"route": "FLASH|PRO|DETERMINISTIC",
"model_name": "...",
"status": "...",
"used_provider_fallback": false,
"reason": "...",
"attempt_log": ["..."]
}
}
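If you capture the router's stdout, this state can be consumed directly. The sketch below assumes router.py prints the JSON state to stdout on success, which you should verify for your install:

```python
# Assumption: router.py writes the JSON state to stdout on success.
import json
import os
import subprocess

script = os.path.expanduser("~/.openclaw/skills/super-router/scripts/router.py")
proc = subprocess.run(["python", script, "Summarize the last 10 git commits"],
                      capture_output=True, text=True, check=True)
state = json.loads(proc.stdout)
for r in state["results"]:
    print(r["step"], r["route"], r["status"])
print(state["final_report"])
```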
Example Workflows
Example 1: K8s Incident Triage
router.py "生产环境 K8s Pod 频繁重启,分析日志找出根因,给出修复方案并整理给值班同事的简短行动摘要"
Expected routing:
- "分析 Pod 重启日志,定位错误模式" → PRO (high-risk diagnosis)
- "确定根因(资源不足/配置错误/依赖故障)" → PRO (high-risk decision)
- "制定修复方案(YAML 调整/回滚/扩容)" → PRO (high-risk repair plan)
- "整理给值班同事的简短行动摘要" → FLASH (communication/summary)
Example 2: Code Refactoring
router.py "Refactor auth module to use JWT, add unit tests, update docs"
Expected routing:
- "Analyze current auth implementation" → PRO (deep inspection)
- "Design JWT claims model" → PRO (design logic)
- "Implement JWT encoding/decoding" → PRO (implementation)
- "Add unit tests for JWT functions" → PRO (test logic)
- "Update README with JWT usage examples" → FLASH (documentation)
Example 3: Simple Summary
router.py "Summarize the last 10 git commits"
Expected routing:
- Single subtask → FLASH (summary-like, low complexity)
Maintenance
| File | Purpose |
|---|---|
| scripts/router.py | Main LangGraph router script |
| SKILL.md | This documentation |
Troubleshooting
"Router timed out" / "Ollama returned an empty response"
- Best fix when keeping a large Planner: keep ROUTER_PLANNER_MODEL=gemma4:26b, but set ROUTER_JUDGE_MODEL=llama3.1:8b.
- All-gemma mode: set ROUTER_JUDGE_MODEL=gemma4:26b, ROUTER_JUDGE_TIMEOUT=600, and ROUTER_MAX_CONCURRENCY=1; expect much longer runs.
- Use --stream and increase the terminal/process timeout if the Planner itself may take longer than 60s.
- Set ROUTER_JUDGE_TIMEOUT=300 or higher only when intentionally using a 20B+ Judge.
- Alternative: use Gemini CLI for planning: ROUTER_PLANNER_MODEL=google-gemini-cli/gemini-3-pro-preview.
"Planner timed out after 30s" (or 90s)
- The model is too large or not yet loaded. Warmup helps, but large models may still time out.
- Use --stream plus a longer terminal/process timeout, or choose a smaller planner model.
- Check the ollama serve logs for errors.
"FLASH kept escalating to PRO"
- Task may genuinely require heavy reasoning
- Check if FLASH model is too small for your tasks
- Try setting ROUTER_FLASH_MODEL to a larger model
"Gemini CLI AbortError or Auth Failures"
- If gemini-cli returns AbortError or authentication errors in non-interactive sessions, this is often an infrastructure/API timeout or session issue.
- Use --stream to monitor real-time progress, and ensure ROUTER_JUDGE_TIMEOUT and terminal timeouts are high enough to prevent external process termination.
"Planner produced only one subtask"
- Task may be simple enough to not need decomposition
- Planner model may be too small; try ROUTER_PLANNER_MODEL=gemma4:31b (if you have the patience for 90s+ waits)
Related Skills
- dspy — Declarative LM programming with automatic prompt optimization (Python framework alternative)
- subagent-driven-development — Task decomposition with OpenClaw-native delegation + two-stage review
- llama-cpp — Run LLM inference locally (alternative to Ollama backend)