Super Router (LangGraph Edition)
Intelligent task decomposition and model routing using LangGraph StateGraph. Automatically routes subtasks between PRO (heavy reasoning) and FLASH (fast) models based on structured complexity assessment.
When to Use This Skill
Use super-router when you need:
- Intelligent model routing — automatically choose between heavy (PRO) and fast (FLASH) models per subtask
- Task decomposition — break complex tasks into structured subtasks with independent routing
- Cost optimization — use fast models for simple work, heavy models only when needed
- Configurable models — use deterministic defaults, with environment-variable overrides for each role
- Failure escalation — FLASH retry on infra failures, escalate to PRO on capability failures
- Audit trail — full logging of planned vs actual routes, retries, and failure classifications
Not needed for: Simple single-turn tasks, tasks where you already know which model to use, or when you want manual control over every routing decision.
Core Architecture (LangGraph StateGraph)
| Node | Function |
|---|---|
| Planner | Receives original task, calls local Ollama planner model to generate ordered subtask array |
| Judge | Scores each subtask on 5 dimensions: reasoning_depth, code_change_scope, ambiguity, risk, io_heaviness; combines with thresholds + confidence to decide PRO/FLASH |
| Dispatcher | Reads RouterState.current_step, routes via conditional edge to pro_executor or flash_executor |
| PRO Executor | Heavy reasoning model (default: Gemini CLI preview model; override via ROUTER_PRO_MODEL) |
| FLASH Executor | Fast model with review/retry logic (default: Gemini CLI preview model; override via ROUTER_FLASH_MODEL) |
| FLASH Review | Validates output quality; distinguishes infra failures (timeout, network) from capability failures; retries FLASH or escalates to PRO |
| Metadata Extractor | Extracts 'Technical Gold' (atomic high-precision facts) from step output to prevent finalizer timeouts and loss of detail |
| Recorder/Finalizer | Logs every step; compiles final report using a hybrid of Technical Gold and full audit trails; supports FLASH→PRO→deterministic fallback chain |
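A minimal sketch of how these nodes might be wired as a LangGraph StateGraph, assuming a TypedDict state and stubbed node bodies. The state fields and node names here are illustrative, not the actual router.py implementation:

```python
# Illustrative wiring only: node bodies are stubs, and the state fields
# are assumptions based on the table above, not router.py's real code.
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class RouterState(TypedDict, total=False):
    task: str
    subtasks: list      # judged subtasks with PRO/FLASH assignments
    current_step: int   # index the dispatcher reads
    results: list       # per-step execution records


def planner(state: RouterState) -> dict:
    # The real node calls the Ollama planner model; stubbed here.
    return {"subtasks": [{"desc": state["task"], "model": "FLASH"}],
            "current_step": 0}


def dispatch(state: RouterState) -> str:
    # Conditional edge: pick the executor for the current step.
    step = state["subtasks"][state["current_step"]]
    return "pro_executor" if step["model"] == "PRO" else "flash_executor"


graph = StateGraph(RouterState)
graph.add_node("planner", planner)
graph.add_node("pro_executor", lambda s: {})    # stub
graph.add_node("flash_executor", lambda s: {})  # stub
graph.add_edge(START, "planner")
graph.add_conditional_edges("planner", dispatch,
                            {"pro_executor": "pro_executor",
                             "flash_executor": "flash_executor"})
graph.add_edge("pro_executor", END)
graph.add_edge("flash_executor", END)
app = graph.compile()
```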
Installation
# Required: LangGraph + Ollama
pip install langgraph
# Ensure Ollama is running locally
ollama serve
# Pull recommended models if you use Ollama-backed roles
ollama pull gemma4:26b # Planner or PRO executor (high quality, slow)
ollama pull llama3.1:8b # Judge (fast scoring, recommended)
ollama pull qwen3 # PRO executor
ollama pull qwen2.5:7b # FLASH executor
Note: If you prefer gemma4:26b as the Planner, keep it there. For speed, the Judge should usually be llama3.1:8b or another 7B-14B model:
export ROUTER_PLANNER_MODEL=gemma4:26b
export ROUTER_JUDGE_MODEL=llama3.1:8b
export ROUTER_PRO_MODEL=gemma4:26b
export ROUTER_FLASH_MODEL=qwen2.5:7b
If you intentionally want an all-gemma4:26b Planner/Judge/PRO setup, use longer timeouts and serialized graph execution:
export ROUTER_PLANNER_MODEL=gemma4:26b
export ROUTER_JUDGE_MODEL=gemma4:26b
export ROUTER_PRO_MODEL=gemma4:26b
export ROUTER_FLASH_MODEL=qwen2.5:7b
export ROUTER_JUDGE_TIMEOUT=600
export ROUTER_MAX_CONCURRENCY=1
Security Boundaries
- The router only consumes task text, model names, and documented ROUTER_* settings.
- It has no install hook, background persistence, arbitrary local file scanning, or destructive file operations.
- Ollama traffic is local by default. Remote ROUTER_OLLAMA_URL values are refused unless ROUTER_ALLOW_REMOTE_OLLAMA=1 is set.
- Gemini CLI execution is restricted to an executable named gemini and receives only a minimal allowlisted environment.
- Provider prompts and outputs may leave the machine when using Gemini CLI or an explicitly trusted remote Ollama endpoint. Use local Ollama for sensitive work.
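The remote-endpoint refusal can be pictured as a small allowlist check. This is a hedged sketch, not the exact validation router.py performs:

```python
# Sketch only: router.py's actual validation may differ.
import os
from urllib.parse import urlparse

def ollama_url_allowed(url: str) -> bool:
    host = urlparse(url).hostname or ""
    if host in ("localhost", "127.0.0.1", "::1"):
        return True  # local traffic is always allowed
    # Remote endpoints require an explicit opt-in.
    return os.environ.get("ROUTER_ALLOW_REMOTE_OLLAMA") == "1"
```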
Usage
Basic Usage (via exec)
When user says "走 super-router", "use super-router", or asks for router analysis:
# Direct execution with task as argument (the example task is Chinese for
# "Analyze K8s YAML errors and rewrite the config")
terminal(command="/opt/homebrew/Caskroom/miniforge/base/bin/python ~/.openclaw/skills/super-router/scripts/router.py '分析 K8s YAML 错误并重写配置'")
With Streaming (Node-Level Progress)
terminal(command="/opt/homebrew/Caskroom/miniforge/base/bin/python ~/.openclaw/skills/super-router/scripts/router.py --stream 'Your complex task'")
Via Environment Variable (Agent Compatibility)
For agents that struggle with non-ASCII arguments:
# Normalize task to short ASCII English, then pass as argument
terminal(command="/opt/homebrew/Caskroom/miniforge/base/bin/python ~/.openclaw/skills/super-router/scripts/router.py 'Analyze K8s YAML errors and fix'")
# Or via env var (if agent supports it)
terminal(command="/opt/homebrew/Caskroom/miniforge/base/bin/python ~/.openclaw/skills/super-router/scripts/router.py",
env={"ROUTER_TASK": "Your complex task description"})
Handling Long-Running Execution
If exec returns "Command still running":
# Continue polling with process tool
process(action="poll", session_id="<session_id_from_exec>")
# Wait for completion
process(action="wait", session_id="<session_id_from_exec>", timeout=300)
Important: Once process shows completion, your next assistant message MUST start with Router result: or Router failed: and include at least one real detail from the output (e.g., "Planner fallback", "Ollama timed out", "BTC"). Never reply with just ---, punctuation, or empty lines.
Environment Variables
| Variable | Purpose | Default |
|---|---|---|
| ROUTER_PLANNER_MODEL | Task decomposition model | gemma4:26b |
| ROUTER_JUDGE_MODEL | Complexity scoring model | llama3.1:8b |
| ROUTER_PRO_MODEL | Heavy reasoning executor | google-gemini-cli/gemini-3-pro-preview |
| ROUTER_FLASH_MODEL | Fast executor | google-gemini-cli/flash |
| ROUTER_PRO_FALLBACK_MODELS | Comma-separated PRO fallback list | None |
| ROUTER_FLASH_FALLBACK_MODELS | Comma-separated FLASH fallback list | None |
| ROUTER_FLASH_RETRY_BUDGET | Max FLASH retries before escalation | 1 |
| ROUTER_RECURSION_LIMIT | Python recursion limit | 128 |
| ROUTER_JUDGE_TIMEOUT | Timeout for Judge node LLM calls (seconds) | 300 (up to 6000 for extremely complex tasks with large models) |
| ROUTER_MAX_CONCURRENCY | LangGraph max node concurrency; set to 1 for local 26B+ Judge models | Auto (1 for large Judge models) |
| ROUTER_GEMINI_CLI | Path to the Gemini CLI (if used instead of Ollama) | /opt/homebrew/bin/gemini |
| ROUTER_OLLAMA_URL | Ollama API endpoint | http://localhost:11434/api/generate |
| ROUTER_ALLOW_REMOTE_OLLAMA | Opt in to non-local Ollama endpoints you trust | Off |
| ROUTER_FINALIZER_TIMEOUT | Timeout for final report synthesis (seconds); set high (e.g., 600) for complex tasks to avoid timeouts during context assembly | 600 |
| ROUTER_DEBUG | Print raw planner/judge/Ollama diagnostic snippets | Off |
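A sketch of how these settings might be resolved at startup. The defaults mirror the table above; the parsing logic itself is an assumption, not router.py's actual code:

```python
# Defaults copied from the table above; parsing logic is illustrative.
import os

PRO_MODEL = os.environ.get("ROUTER_PRO_MODEL",
                           "google-gemini-cli/gemini-3-pro-preview")
FLASH_MODEL = os.environ.get("ROUTER_FLASH_MODEL", "google-gemini-cli/flash")
JUDGE_TIMEOUT = float(os.environ.get("ROUTER_JUDGE_TIMEOUT", "300"))

# Comma-separated fallback lists parse into an ordered candidate list.
PRO_FALLBACKS = [m.strip()
                 for m in os.environ.get("ROUTER_PRO_FALLBACK_MODELS", "").split(",")
                 if m.strip()]
```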
For large models (20B+ like gemma4:26b):
- Prefer ROUTER_PLANNER_MODEL=gemma4:26b with ROUTER_JUDGE_MODEL=llama3.1:8b
- If using ROUTER_JUDGE_MODEL=gemma4:26b, set ROUTER_JUDGE_TIMEOUT=600 and keep ROUTER_MAX_CONCURRENCY=1
- Planner timeout is auto-set to 300s for large models
- Expect 2-5 minute wait times per LLM call
- Model warmup adds ~30-60s upfront but prevents timeouts
- Crucial: a 60s terminal timeout can still kill the run even if internal router timeouts are higher. Use --stream, poll via process(action='poll'), and set a longer terminal/process wait timeout for large Planner/Judge runs
Complexity Routing Rules
5-Dimension Scoring
The Judge scores each subtask on:
- reasoning_depth (1-10): How much logical inference is needed?
- code_change_scope (1-10): How many files/lines of code to modify?
- ambiguity (1-10): How unclear is the task specification?
- risk (1-10): What's the impact of getting this wrong?
- io_heaviness (1-10): How much reading/writing vs. thinking?
Routing Thresholds
| Condition | Route |
|---|---|
| complexity_score >= 5 | PRO |
| complexity_score <= 2 | FLASH |
| Summary-like task (no deep work) | FLASH |
| High-risk incident diagnosis | PRO |
| High-risk evidence gathering | PRO |
| High-risk decision/rollback evaluation | PRO |
| Boundary case + low confidence | PRO (safe default) |
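Putting the table together as code, hedged: the precedence of the conditions, the 0.7 confidence cutoff, and the reduction of the five dimension scores into a single complexity_score are all assumptions here (the example output later in this document shows a raw sum of the dimensions). Only the thresholds themselves come from the table:

```python
# Thresholds (>= 5 PRO, <= 2 FLASH) are from the routing table above;
# condition precedence and the confidence cutoff are assumptions.
def decide_route(complexity_score: float, confidence: float,
                 summary_like: bool, high_risk: bool) -> str:
    if high_risk:                 # incident diagnosis, evidence, rollback
        return "PRO"
    if summary_like:              # no deep work required
        return "FLASH"
    if complexity_score >= 5:
        return "PRO"
    if complexity_score <= 2:
        return "FLASH"
    # Boundary case: default to PRO when the Judge is not confident
    # (0.7 is an assumed cutoff, not router.py's documented value).
    return "PRO" if confidence < 0.7 else "FLASH"
```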
Contextual Score Biases
The router applies automatic adjustments:
- High-risk context (production, billing, security): boosts reasoning_depth, risk, ambiguity
- Evidence gathering during an incident: kept on PRO (treated as diagnosis, not mere IO)
- Communication/summary subtasks: routed to FLASH unless deep work is also required
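A hedged sketch of such a bias pass; the keyword list and the +2 boost are invented for illustration and may differ from the router's actual values:

```python
# Illustrative only: the real router's hints and boost sizes may differ.
HIGH_RISK_HINTS = ("production", "billing", "security")

def apply_context_bias(desc: str, scores: dict) -> dict:
    if any(hint in desc.lower() for hint in HIGH_RISK_HINTS):
        for dim in ("reasoning_depth", "risk", "ambiguity"):
            scores[dim] = min(10, scores[dim] + 2)  # clamp to the 1-10 scale
    return scores
```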
FLASH Review & Escalation Logic
When FLASH execution fails or produces questionable output:
1. Classify the failure type:
   - infra_transient: timeout, network, rate limit, service unavailable
   - capability_quality: "need more info", empty output, too short, repeated task
2. Decide:
   - Infra failure → retry FLASH (up to ROUTER_FLASH_RETRY_BUDGET)
   - Capability failure → escalate to PRO immediately
   - Unknown → retry once, then escalate
3. Verify after execution:
   - Empty output → escalate
   - Output < 48 chars (non-summary) → escalate
   - Output explicitly says "can't complete" → escalate
   - Output just repeats the task description → escalate
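The classification step could look like the following sketch. The pattern strings and the handling of the unknown bucket are assumptions, not the exact matching router.py performs:

```python
# Pattern lists are illustrative, not router.py's exact strings.
INFRA_PATTERNS = ("timeout", "network", "rate limit", "service unavailable")

def classify_failure(error_text: str, output: str, is_summary: bool) -> str:
    low = (error_text or "").lower()
    if any(p in low for p in INFRA_PATTERNS):
        return "infra_transient"          # retry FLASH
    if not output or (not is_summary and len(output) < 48):
        return "capability_quality"       # escalate to PRO
    return "unknown"                      # retry once, then escalate
```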
Finalizer Fallback Chain
Final report generation follows:
FLASH finalizer → (if fails) → PRO finalizer → (if fails) → Deterministic template
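In code, the chain amounts to trying each finalizer in order and falling back to a template that cannot fail. This sketch uses a hypothetical finalizers mapping rather than router.py's actual internals:

```python
# finalizers maps a route name to a callable; hypothetical structure.
def finalize(state: dict, finalizers: dict) -> str:
    for route in ("FLASH", "PRO"):
        try:
            return finalizers[route](state)
        except Exception:
            continue  # fall through to the next finalizer in the chain
    # Deterministic template: no LLM call, so it always succeeds.
    steps = "; ".join(r.get("desc", "") for r in state.get("results", []))
    return f"Router result: completed steps: {steps}"
```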
Output Structure
The router returns a JSON-serializable state. When summarizing these results in reports or documentation, always use ASCII/terminal-style arrows ('->', '-->') rather than Unicode or LaTeX arrows ('→', $\rightarrow$) in diagrams and flow representations; this is a high-priority stylistic requirement.
{
"task": "original task string",
"planner_model": "model name used for planning",
"judge_model": "model name used for complexity scoring",
"pro_model": "primary PRO model",
"flash_model": "primary FLASH model",
"planned_subtasks": [{"desc": "..."}],
"subtasks": [
{
"desc": "...",
"model": "PRO|FLASH",
"assessment": {
"scores": {"reasoning_depth": 5, "code_change_scope": 3, "ambiguity": 2, "risk": 4, "io_heaviness": 1},
"complexity_score": 15,
"suggested_route": "PRO",
"final_route": "PRO",
"confidence": 0.85,
"reason": "...",
"judge_source": "llm|heuristic"
}
}
],
"results": [
{
"step": 1,
"planned_route": "PRO",
"route": "PRO",
"model_name": "qwen3",
"desc": "...",
"output": "...",
"status": "success|failed",
"attempt_count": 1,
"retry_count": 0,
"escalated_from_flash": false,
"used_provider_fallback": false,
"flash_review": {"decision": "record", "failure_type": "none", "reason": "..."},
"attempt_log": ["..."]
}
],
"final_report": "...",
"finalizer_outcome": {
"route": "FLASH|PRO|DETERMINISTIC",
"model_name": "...",
"status": "...",
"used_provider_fallback": false,
"reason": "...",
"attempt_log": ["..."]
}
}
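If you capture the router's stdout, this state can be consumed directly. The sketch below assumes router.py prints the JSON state to stdout on success, which you should verify for your install:

```python
# Assumption: router.py writes the JSON state to stdout on success.
import json
import os
import subprocess

script = os.path.expanduser("~/.openclaw/skills/super-router/scripts/router.py")
proc = subprocess.run(["python", script, "Summarize the last 10 git commits"],
                      capture_output=True, text=True, check=True)
state = json.loads(proc.stdout)
for r in state["results"]:
    print(r["step"], r["route"], r["status"])
print(state["final_report"])
```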
Example Workflows
Example 1: K8s Incident Triage
router.py "生产环境 K8s Pod 频繁重启,分析日志找出根因,给出修复方案并整理给值班同事的简短行动摘要"
Expected routing:
- "分析 Pod 重启日志,定位错误模式" → PRO (high-risk diagnosis)
- "确定根因(资源不足/配置错误/依赖故障)" → PRO (high-risk decision)
- "制定修复方案(YAML 调整/回滚/扩容)" → PRO (high-risk repair plan)
- "整理给值班同事的简短行动摘要" → FLASH (communication/summary)
Example 2: Code Refactoring
router.py "Refactor auth module to use JWT, add unit tests, update docs"
Expected routing:
- "Analyze current auth implementation" → PRO (deep inspection)
- "Design JWT claims model" → PRO (design logic)
- "Implement JWT encoding/decoding" → PRO (implementation)
- "Add unit tests for JWT functions" → PRO (test logic)
- "Update README with JWT usage examples" → FLASH (documentation)
Example 3: Simple Summary
router.py "Summarize the last 10 git commits"
Expected routing:
- Single subtask → FLASH (summary-like, low complexity)
Maintenance
| File | Purpose |
|---|---|
| scripts/router.py | Main LangGraph router script |
| SKILL.md | This documentation |
Troubleshooting
"Router timed out" / "Ollama returned an empty response"
- Best fix when keeping a large Planner: keep ROUTER_PLANNER_MODEL=gemma4:26b, but set ROUTER_JUDGE_MODEL=llama3.1:8b.
- All-gemma mode: set ROUTER_JUDGE_MODEL=gemma4:26b, ROUTER_JUDGE_TIMEOUT=600, and ROUTER_MAX_CONCURRENCY=1; expect much longer runs.
- Use --stream and increase the terminal/process timeout if the Planner itself may take longer than 60s.
- Set ROUTER_JUDGE_TIMEOUT=300 or higher only when intentionally using a 20B+ Judge.
- Alternative: use Gemini CLI for planning: ROUTER_PLANNER_MODEL=google-gemini-cli/gemini-3-pro-preview.
"Planner timed out after 30s" (or 90s)
- The model is too large or not yet loaded. Warmup helps, but large models may still time out.
- Use --stream plus a longer terminal/process timeout, or choose a smaller planner model.
- Check the ollama serve logs for errors.
"FLASH kept escalating to PRO"
- Task may genuinely require heavy reasoning
- Check if FLASH model is too small for your tasks
- Try setting ROUTER_FLASH_MODEL to a larger model
"Gemini CLI AbortError or Auth Failures"
- If gemini-cli returns AbortError or authentication errors in non-interactive sessions, this is often an infrastructure/API timeout or session issue.
- Use --stream to monitor real-time progress, and ensure ROUTER_JUDGE_TIMEOUT and terminal timeouts are high enough to prevent external process termination.
"Planner produced only one subtask"
- Task may be simple enough to not need decomposition
- Planner model may be too small; try ROUTER_PLANNER_MODEL=gemma4:31b (if you have the patience for 90s+ waits)
Related Skills
- dspy — Declarative LM programming with automatic prompt optimization (Python framework alternative)
- subagent-driven-development — Task decomposition with OpenClaw-native delegation + two-stage review
- llama-cpp — Run LLM inference locally (alternative to Ollama backend)