Model Routing Middleware

Data & APIs

Intelligent model selection middleware for AI agents. Route tasks to the best model, manage context, and cut API costs 40-70%.

Install

openclaw skills install model-routing-middleware

Model Routing — Intelligent Model Selection Middleware

Automatically select the best LLM model and think mode based on task type, context size, and response confidence. Cut API costs 40-70% by routing simple tasks to fast models and complex tasks to capable ones.

How It Works

User message → Task Classifier → Model Router → Best Model → Response
                                     ↓
                          Low confidence? → Escalate to stronger model

Quick Start

# config.yaml
models:
  casual_chat:
    model: qwen3-14b
    think: false
  coding:
    model: qwen-coder
    think: true
  reasoning:
    model: deepseek-r1
    think: true
  long_context:
    model: glm-5.1
    think: false

from router import get_router

router = get_router()
result = await router.route("Write a Python web scraper")
# → Routes to qwen-coder with think=True

Features

Task-type classification (coding, reasoning, chat, summarization)
Per-model think mode configuration
Confidence-based escalation (retry with stronger model)
Context management and compaction at 55% threshold
Hot-reload configuration (no restart needed)
83 tests passing

Cost Savings

Task Type	Without Routing	With Routing	Savings
Casual chat	GPT-4 ($0.03/1K)	Qwen3-14B (local)	~100%
Coding	GPT-4 ($0.03/1K)	Qwen-Coder (local)	~95%
Hard reasoning	GPT-4 ($0.03/1K)	DeepSeek-R1 (local)	~90%

License

MIT