Install
openclaw skills install ollama-smart-routerIntelligent task routing between local and cloud Ollama LLM instances. Use when the user wants cost-efficient AI responses by routing simple tasks to a local Ollama model and complex tasks to a more powerful remote/cloud Ollama instance. Automatically classifies task complexity, detects system capabilities, and delegates to the appropriate model tier. Use for any request where you want to balance latency vs capability, or when explicitly asked to use smart routing, local-first routing, or Ollama model selection.
openclaw skills install ollama-smart-routerRoutes tasks between a local Ollama instance (fast, cheap) and a remote/cloud Ollama instance (more capable) based on task complexity classification and system capabilities.
# 1. Profile your system
python scripts/system_profiler.py
# 2. Check endpoints are healthy
python scripts/health_check.py
# 3. Route a task
python scripts/route.py "What is quantum computing?"
User Request
↓
System Profiler (detects compatible models)
↓
Health Check (verifies endpoints are up)
↓
Classify Task (1-5 complexity score)
↓
├─ Score 1-2 → Local Ollama (fast, cheap)
├─ Score 3-5 → Cloud Ollama (powerful)
└─ Match specialist → Dedicated model
↓
Verify model available (fallback if not)
↓
Stream Response
| Score | Complexity | Examples | Routed To |
|---|---|---|---|
| 1 | Simple | "What is 2+2?", "Define entropy" | Local |
| 2 | Basic | "Write hello world in Python" | Local |
| 3 | Complex | "Debug this error", "Compare X vs Y" | Cloud |
| 4 | Deep | "Design a system", "Research topic" | Cloud |
| 5 | Expert | "Build from scratch", "Multi-file project" | Cloud |
smart-router/
├── SKILL.md # This file
├── __init__.py # Python package interface
├── requirements.txt # Dependencies
│
├── config/
│ ├── router.yaml # Main configuration
│ └── system_profile.json # Auto-generated system specs
│
├── scripts/
│ ├── classify.py # Task complexity classifier
│ ├── execute.py # Ollama API client
│ ├── route.py # Main routing logic
│ ├── system_profiler.py # Hardware detection
│ └── health_check.py # Endpoint health verification
│
├── tests/
│ └── test_classifier.py # Test suite
│
└── references/
└── classifier-prompt.txt # LLM fallback prompt
Edit config/router.yaml:
# Local Ollama (your machine)
local:
model: "llama3.2"
base_url: "http://localhost:11434"
# Cloud Ollama (remote server)
cloud:
model: "qwen2.5:14b"
base_url: "http://192.168.1.100:11434"
# Tasks scoring >= this go to cloud
threshold: 3
# Domain specialists (checked first)
specialists:
code:
model: "codellama:34b"
base_url: "http://192.168.1.100:11434"
triggers: ["code review", "refactor"]
# Performance settings
performance:
timeout_seconds: 60
stream_responses: true
retry_attempts: 2
# Caching
cache:
enabled: true
db_path: "cache/router.db"
ttl_seconds: 86400
# Basic routing
python scripts/route.py "What is the capital of France?"
# With profiling (updates system profile)
python scripts/route.py "Debug this error" --profile
# Custom config
python scripts/route.py "Design a system" --config config/my-router.yaml
# No streaming (wait for full response)
python scripts/route.py "Summarize this" --no-stream
# Health check all endpoints
python scripts/health_check.py
# Manual classification
python scripts/classify.py "Write a function"
# Output: "2:basic-task"
from smart_router import SmartRouter
# Initialize
router = SmartRouter()
# Route with streaming
for chunk in router.route("Explain quantum computing"):
print(chunk, end='')
# Classify only
score, reason = router.classify("Debug this code")
print(f"Complexity: {score}/5, Reason: {reason}")
# Get configuration
config = router.get_config()
print(f"Local model: {config['local']['model']}")
Run once (or when hardware changes):
python scripts/system_profiler.py
This creates config/system_profile.json with:
Verify endpoints before use:
python scripts/health_check.py
Checks:
When you submit a task:
Uses regex patterns (not LLM calls) for speed:
Automatically detects what your system can run:
Pre-flight checks prevent routing to dead endpoints:
✓ local | Status: healthy | Latency: 45ms | Models: 5
✗ cloud | Status: unreachable | Error: Connection refused
Even though Ollama is free, logs track latency:
[2024-01-15T10:30:00] task: '...' -> local | model: llama3.2 | latency: 0.85s
[2024-01-15T10:30:45] task: '...' -> cloud | model: qwen2.5:14b | latency: 3.2s
# Run classifier tests
python tests/test_classifier.py
# Expected output:
# ✓ PASS [1] Simple factual question
# ✓ PASS [1] Zip code (not code)
# ✓ PASS [3] Debugging
# ...
# Results: X passed, Y failed
# Check if Ollama is running
ollama serve
# Verify endpoint
curl http://localhost:11434/api/tags
# Pull the model
ollama pull llama3.2
# Or let router auto-fallback to available model
Check pattern in scripts/classify.py:
# Add new pattern
COMPLEXITY_PATTERNS[2].append(r'your\s+pattern\s+here')
# In config/router.yaml
performance:
timeout_seconds: 30 # Reduce timeout
pip install -r requirements.txt| Approach | Latency | Cost | Accuracy | Verdict |
|---|---|---|---|---|
| Pattern matching | 30ms | 0 tokens | 90% | ✅ Used |
| LLM classification | 500ms | 50 tokens | 95% | Optional (--llm) |
Pattern matching wins on speed/cost. Accuracy is good enough for routing.
Ollama-only keeps everything: