Model Routing Middleware

Intelligent model selection middleware for AI agents. Route tasks to the best model, manage context, and cut API costs 40-70%.

Install

openclaw skills install model-routing-middleware

Model Routing — Intelligent Model Selection Middleware

Automatically selects the best LLM and think mode for each request based on task type, context size, and response confidence. Simple tasks go to fast, cheap models and complex tasks go to more capable ones, which cuts API costs by 40-70%.

How It Works

User message → Task Classifier → Model Router → Best Model → Response
                                     ↓
                          Low confidence? → Escalate to stronger model
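
The flow above can be sketched in a few lines of Python. This is a minimal illustration, not the package's implementation: the keyword classifier, the `Reply` type, the `call_model` callback, the 0.6 confidence floor, and the choice of deepseek-r1 as the escalation target are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Reply:
    text: str
    confidence: float  # 0.0-1.0; how this is measured is an assumption


# Hypothetical keyword classifier; the real classifier is not described here.
def classify(message: str) -> str:
    lowered = message.lower()
    if any(word in lowered for word in ("code", "python", "function", "bug")):
        return "coding"
    if any(word in lowered for word in ("prove", "why", "derive")):
        return "reasoning"
    return "casual_chat"


ROUTES = {
    "casual_chat": "qwen3-14b",
    "coding": "qwen-coder",
    "reasoning": "deepseek-r1",
}
ESCALATION_MODEL = "deepseek-r1"  # assumed "stronger" model
CONFIDENCE_FLOOR = 0.6            # assumed threshold


def route(message: str, call_model: Callable[[str, str], Reply]) -> Tuple[str, Reply]:
    """Classify, call the routed model, and escalate once on low confidence."""
    model = ROUTES[classify(message)]
    reply = call_model(model, message)
    if reply.confidence < CONFIDENCE_FLOOR and model != ESCALATION_MODEL:
        model = ESCALATION_MODEL
        reply = call_model(model, message)
    return model, reply
```

Passing `call_model` in as a callback keeps the sketch independent of any particular inference backend.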

Quick Start

# config.yaml
models:
  casual_chat:
    model: qwen3-14b
    think: false
  coding:
    model: qwen-coder
    think: true
  reasoning:
    model: deepseek-r1
    think: true
  long_context:
    model: glm-5.1
    think: false
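
# usage (Python)
import asyncio
from router import get_router

async def main():
    router = get_router()
    result = await router.route("Write a Python web scraper")
    # → Routes to qwen-coder with think=True

asyncio.run(main())  # route() is a coroutine, so it needs an event loop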
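
To make the mapping in config.yaml concrete, the sketch below resolves a task type to a model and think setting. It mirrors the YAML as a plain dict so it runs without a YAML parser. `ModelChoice`, `resolve`, and the fallback to `casual_chat` for unknown task types are hypothetical names and behavior, not part of the package's documented API.

```python
from typing import NamedTuple


class ModelChoice(NamedTuple):
    model: str
    think: bool


# Same content as the config.yaml example, written as a dict.
CONFIG = {
    "models": {
        "casual_chat": {"model": "qwen3-14b", "think": False},
        "coding": {"model": "qwen-coder", "think": True},
        "reasoning": {"model": "deepseek-r1", "think": True},
        "long_context": {"model": "glm-5.1", "think": False},
    }
}


def resolve(task_type: str, default: str = "casual_chat") -> ModelChoice:
    """Look up the model entry for a task type, falling back to `default`."""
    models = CONFIG["models"]
    entry = models.get(task_type, models[default])
    return ModelChoice(entry["model"], entry["think"])


print(resolve("coding"))
print(resolve("translation"))  # unknown type, falls back to casual_chat
```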

Features

  • Task-type classification (coding, reasoning, chat, summarization)
  • Per-model think mode configuration
  • Confidence-based escalation (retry with stronger model)
  • Context management, with compaction once context usage reaches a 55% threshold
  • Hot-reload configuration (no restart needed)
  • 83 tests passing
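
As an illustration of the compaction feature, the sketch below checks context usage against a 55% threshold and replaces older messages with a summary placeholder. It assumes the threshold refers to the fraction of the context window in use. The 4-characters-per-token estimate, the `keep_recent` parameter, and the placeholder summary string are assumptions made for the example; a real implementation would call a summarization model.

```python
from typing import List

COMPACTION_THRESHOLD = 0.55  # fraction of the context window in use


def estimate_tokens(text: str) -> int:
    # Crude heuristic (about 4 characters per token); an assumption.
    return max(1, len(text) // 4)


def needs_compaction(messages: List[str], context_window: int) -> bool:
    used = sum(estimate_tokens(m) for m in messages)
    return used / context_window >= COMPACTION_THRESHOLD


def compact(messages: List[str], context_window: int, keep_recent: int = 2) -> List[str]:
    """Replace all but the most recent messages with a single summary entry."""
    if not needs_compaction(messages, context_window):
        return messages
    split = max(len(messages) - keep_recent, 0)
    older, recent = messages[:split], messages[split:]
    if not older:
        return messages  # nothing old enough to summarize
    # Placeholder for a real summarization call.
    summary = f"[summary of {len(older)} earlier messages]"
    return [summary] + recent
```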

Cost Savings

Task Type        Without Routing            With Routing           Savings
Casual chat      GPT-4 ($0.03/1K tokens)    Qwen3-14B (local)      ~100%
Coding           GPT-4 ($0.03/1K tokens)    Qwen-Coder (local)     ~95%
Hard reasoning   GPT-4 ($0.03/1K tokens)    DeepSeek-R1 (local)    ~90%
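
The per-row savings combine into an overall figure that depends on the workload mix. The sketch below computes a blended saving from the table's per-task percentages. The traffic shares are hypothetical, including the 30% of baseline spend assumed to stay on the paid model with no saving, so the result is only an illustration of the arithmetic.

```python
# Per-task savings from the table above, plus an assumed "unrouted" bucket
# for requests that still go to the paid model (0% saving).
SAVINGS = {"casual_chat": 1.00, "coding": 0.95, "reasoning": 0.90, "unrouted": 0.0}

# Hypothetical share of baseline spend per task type; must sum to 1.
TRAFFIC = {"casual_chat": 0.40, "coding": 0.20, "reasoning": 0.10, "unrouted": 0.30}


def blended_savings(savings: dict, traffic: dict) -> float:
    """Weighted average of per-task savings, weighted by share of spend."""
    if abs(sum(traffic.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * savings[task] for task, share in traffic.items())


print(f"{blended_savings(SAVINGS, TRAFFIC):.0%}")  # prints 68%
```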

License

MIT