Sage Router

Local-first AI model routing for serious agents. One endpoint. Any provider. The router figures out the rest.

Install

openclaw skills install sage-router

Sage Router

HTTP server on :8788 that routes chat requests to the optimal provider based on intent classification.

Endpoints

POST /v1/chat/completions — OpenAI-compatible; routes automatically
POST /v1/messages — Anthropic Messages API compatible; translates to/from OpenAI format internally
GET /health — Provider status, model lists, routing debug

Any Anthropic-compatible tool (Cursor, Aider, Claude Code, Zed, Continue, OpenHands) can point at http://localhost:8788 as the API base URL. Both streaming and non-streaming are supported.

Active Providers

Providers are discovered from ~/.openclaw/openclaw.json at startup.

Rules:

skips the router's own sage-router provider entry to avoid recursion
resolves ${ENV_VAR} values for baseUrl and apiKey
includes OpenClaw gateway openai-codex as a virtual provider when the auth profile exists
recognizes Google Gemini providers from generativelanguage.googleapis.com
auto-discovers Google models when the provider exists but models is empty in openclaw.json
normalizes anthropic or Anthropic-hosted anthropic-messages providers onto the local Dario proxy at localhost:3456
starts the Dario user service when Anthropic compatibility is needed and the service is not already running; in Docker, the image bundles @askalf/dario and autostarts dario proxy when credentials are mounted at /root/.dario
supports temporary provider suppression via SAGE_ROUTER_DISABLED_PROVIDERS=name1,name2

GET /health shows:

configured: all discovered providers
providers: reachable providers with model lists
disabled: providers suppressed by env

Routing Logic

The router does not perform mid-stream switching. Once a request is sent to a provider, the full response is returned or the attempt fails. If it fails, the next candidate in the chain is tried sequentially. There is no partial-output fallback or streaming handoff between providers.

Flow:

detect intent from the latest user message
estimate complexity from prompt length
score every reachable (provider, model) pair globally — not per-provider — from openclaw.json
in local-first, operate as local-strict: reject centralized Internet API providers and only allow local/LAN/Tailnet endpoints plus approved decentralized providers such as Darkbloom, with Ollama :cloud models excluded
for GENERAL, blend static heuristics with persisted empirical latency stats by provider and model
rank candidates by API type, model-name hints, complexity, and measured latency
attempt the top SAGE_ROUTER_MAX_PROVIDER_ATTEMPTS candidates in order
sage-router provider (the router itself, model auto) is scored as a low-priority recursive fallback, never preferred

Intent scoring is generic, for example:

code and analysis strongly favor Anthropic/OpenAI-style reasoning models
general/realtime requests prefer fast direct providers first
general traffic learns from real successful request latency over time, with light exploration for cold providers/models
complex prompts boost larger reasoning models and penalize mini/haiku-class models

Intent is detected by keyword matching on the latest user message. Complexity is estimated by word count.

API

GET /health — JSON with reachable providers, configured providers, and disabled providers
POST /v1/chat/completions — OpenAI-compatible; routes automatically

Notes

openai-codex is kept as an optional bridge, not a required first hop.
Anthropic compatibility is provided through Dario, so anthropic can stay in openclaw.json while routing locally through dario.
The repo systemd unit is template-style and expects local machine values in ~/.config/sage-router/sage-router.env.
Empirical latency memory is persisted at ~/.cache/sage-router/latency-stats.json by default.
When the OpenClaw gateway model-set path is unhealthy, the helper falls back to running without provider/model overrides instead of failing hard.
If any provider starts misbehaving, suppress it with SAGE_ROUTER_DISABLED_PROVIDERS instead of editing the router.
For reliable Umbrel/OpenClaw/Remnic use, point clients at http://sage-router:8788/v1 on umbrel_main_network, set unauthenticated Ollama auto-pull patterns to empty, and keep quota-bound providers disabled until credentials are healthy.
GitHub workflows now include CI syntax checks and CodeQL analysis for Python + JavaScript.
See BRANCH_PROTECTION.md for the exact required-check setup on GitHub.
provider-profiles.json includes a grok-sso template for the OpenClaw xAI auth plugin's local SuperGrok-backed proxy.

Install

Install the user service from the repo copy:

mkdir -p ~/.config/systemd/user ~/.config/sage-router
cp systemd/sage-router.service ~/.config/systemd/user/sage-router.service
cp systemd/sage-router.env.example ~/.config/sage-router/sage-router.env
# edit ~/.config/sage-router/sage-router.env for your machine
systemctl --user daemon-reload
systemctl --user enable --now sage-router.service

Notes:

the repo unit is now env-driven and does not hardcode your home path, Node version, or workspace location
set SAGE_ROUTER_HOME to the actual repo path on your machine
optionally set SAGE_ROUTER_PATH_PREFIX if your Python, Node, or Dario bins are not already on PATH

If an Anthropic provider is detected and Dario is not installed yet, install Dario first:

GitHub: https://github.com/askalf/dario

Service

systemctl --user status sage-router
systemctl --user restart sage-router
journalctl --user -u sage-router -f   # live logs

Docker production notes

Docker image includes Node, Python, Sage Router, and @askalf/dario.
Mount host Dario credentials as ~/.dario:/root/.dario for Anthropic-compatible Claude routing.
Enable llama.cpp classifier sidecar with docker compose --profile classifier up -d and SAGE_ROUTER_INTENT_CLASSIFIER_ENABLED=1.
Production classifier flags: SAGE_ROUTER_INTENT_CLASSIFIER_PROVIDER=llamacpp, SAGE_ROUTER_INTENT_CLASSIFIER_BASE_URL=http://llamacpp-classifier:8080, SAGE_ROUTER_INTENT_CLASSIFIER_MODEL=classifier.

Router profiles

Sage Router supports named routing profiles in router-profiles.json next to router.py.

Request a profile with any of:

model: "sage-router/<profile>"
model: "<profile>"
top-level profile, routerProfile, or sageRouterProfile

Profile fields currently supported:

route: fast, balanced, best, local-first, realtime
thinking: low, medium, high
capability/quality flags: requiresQuality, requiresReasoning, requiresTools, frontierLargeOnly, frontierOrReasoningTools, suppressIntermediateToolText, qualitySensitive, reasoning, tools, preferTools, json, vision, document, longContext
constraints: allowProviders, denyProviders, allowModels, denyModels, minParamsB

Current profiles:

frontier: default high-quality frontier routing profile. Forces best/high, reasoning, quality-sensitive, suppresses tool-call narration, and blocks tiny/free filler models.
frontier-large: strict frontier-large-only routing.
fast-local: low-latency local-first routing.
coding-max: high-thinking code route with weak model exclusions.

Sage Router

Install

Sage Router

Endpoints

Active Providers

Routing Logic

API

Notes

Install

Service

Docker production notes

Router profiles

Related skills