Model Routing Orchestrator

Route each user request to the most cost-effective model or multi-model workflow based on task type, complexity, risk, latency, budget, tool needs, and verification requirements.

stephensu66 (@stephensu66)

Install

openclaw skills install @stephensu66/model-routing-orchestrator

Production Model Router

Overview

Use this skill to decide which model tier, workflow shape, and verification strategy should handle a user's request.

The goal is to maximize cost-effectiveness without sacrificing task fit, correctness, or operational reliability.

This skill does not blindly choose the strongest model. It chooses the cheapest safe path that still meets the quality bar for the task.

It may recommend:

a single low-cost model
a single balanced model
a single premium model
a tool-assisted model workflow
a staged multi-model pipeline
a parallel comparison workflow
a draft-and-review workflow
a consensus or verifier workflow

Primary objective

For every request, choose the minimum-cost execution path that can still satisfy:

task quality
correctness requirements
latency expectations
safety or risk constraints
output format needs
tool and modality requirements

When to use

Use this skill when you need to decide:

which model should answer a given user request
whether a cheap model is enough
when to escalate to a stronger reasoning model
when to use one model versus multiple models
when to use tools instead of relying on pure model reasoning
how to handle complex calculations, code, multimodal input, long context, or high-risk tasks
how to balance cost, speed, and answer quality in production

Do not use

Do not use this skill to:

answer the original business question directly
fabricate model capabilities without evidence from the environment or configuration
assume the most expensive model is always the best choice
route high-risk exact tasks to a cheap model without verification
rely on pure language generation for exact arithmetic when tools are available

Inputs to collect

Collect or infer the following from the request and system context:

Request characteristics

task type
domain
expected output type
presence of images, files, tables, code, or long documents
need for exactness versus approximate usefulness
whether the request is open-ended or precision-critical

Execution constraints

budget sensitivity
latency sensitivity
quality expectation
token or context size pressure
tool availability
need for citations or traceability
need for reproducibility

Risk profile

low-risk
medium-risk
high-risk

Failure tolerance

whether a rough answer is acceptable
whether the answer must be verified
whether disagreement between models would be valuable

Task taxonomy

Classify the request into one or more of these categories:

Simple generation
- rewrite
- summarization
- formatting
- light translation
- basic brainstorming
General reasoning
- explanation
- comparison
- concept mapping
- normal business analysis
Deep reasoning
- multi-step planning
- tradeoff analysis
- architecture design
- ambiguous decision support
- chain-dependent reasoning
Exact calculation or formal logic
- arithmetic
- financial calculations
- unit conversion
- spreadsheet-like reasoning
- symbolic or step-sensitive math
- combinatorics or logic puzzles where exactness matters
Coding and technical execution
- code generation
- debugging
- refactoring
- test generation
- query writing
- infrastructure or API design
Long-context synthesis
- large documents
- multiple files
- multi-source comparison
- transcript or contract review
Multi-modal tasks
- image understanding
- diagram interpretation
- PDF with layout-heavy content
- video- or audio-related tasks, if supported
High-risk tasks
- medical
- legal
- financial decisions
- compliance
- security-sensitive operations
- anything where incorrect advice has material consequences
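
Because a request can fall into more than one category, the taxonomy can be modeled as combinable flags. The following is a minimal sketch, not part of the skill itself; the class name `TaskType` and its member names are illustrative assumptions.

```python
from enum import Flag, auto


class TaskType(Flag):
    """One flag per taxonomy category; combine with | for mixed requests."""
    SIMPLE_GENERATION = auto()
    GENERAL_REASONING = auto()
    DEEP_REASONING = auto()
    EXACT_CALCULATION = auto()
    CODING = auto()
    LONG_CONTEXT = auto()
    MULTIMODAL = auto()
    HIGH_RISK = auto()


# A request to audit payment-processing code touches two categories at once.
labels = TaskType.CODING | TaskType.HIGH_RISK

needs_risk_handling = TaskType.HIGH_RISK in labels       # True
is_plain_rewrite = TaskType.SIMPLE_GENERATION in labels  # False
```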

Core routing principle

Always prefer the cheapest path that can safely succeed.

Apply this order of preference:

Cheap single-model path
Balanced single-model path
Premium single-model path
Tool-assisted path
Staged multi-model path
Parallel multi-model comparison
Premium plus verifier or consensus workflow

Do not escalate unless the task characteristics justify it.
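
The order of preference can be read as a first-match search: walk the paths from cheapest to most expensive and stop at the first one that is safe for the task. A minimal sketch follows. The labels mirror the routing decisions used in the examples; `single-balanced` is an assumed label by analogy, and the `is_safe` predicate is a hypothetical hook supplied by the caller.

```python
from typing import Callable

# Paths in the order of preference listed above, cheapest first.
PATHS_BY_PREFERENCE = [
    "single-economy",
    "single-balanced",      # assumed label, by analogy with the others
    "single-premium",
    "tool-assisted",
    "staged-pipeline",
    "parallel-comparison",
    "consensus-check",
]


def cheapest_safe_path(is_safe: Callable[[str], bool]) -> str:
    """Return the first path in preference order that the caller deems safe."""
    for path in PATHS_BY_PREFERENCE:
        if is_safe(path):
            return path
    # Nothing qualified: fall back to the most heavily verified workflow.
    return PATHS_BY_PREFERENCE[-1]


# Example: a task where the two cheapest tiers are judged too weak.
choice = cheapest_safe_path(
    lambda path: path not in {"single-economy", "single-balanced"}
)
```

Here `choice` is `"single-premium"`, because it is the first entry the predicate accepts.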

Model tiers

Use abstract capability tiers unless the deployment specifies exact providers.

Economy tier

Use for:

simple rewriting
formatting
low-risk classification
short summaries
lightweight extraction
first-pass triage

Strengths:

lowest cost
fast response
good for straightforward tasks

Weaknesses:

weaker deep reasoning
more brittle on ambiguity
worse on exactness-critical tasks

Balanced tier

Use for:

everyday product and engineering work
standard reasoning
moderate code tasks
moderate document analysis
most business and writing tasks

Strengths:

solid quality-cost tradeoff
handles most normal production traffic
reasonable speed and robustness

Weaknesses:

may still fail on highly ambiguous or exacting tasks
not always enough for hard reasoning or high-risk requests

Premium tier

Use for:

deep reasoning
difficult code and architecture problems
long-context synthesis with subtle dependencies
high-value outputs
high-risk tasks requiring stronger judgment

Strengths:

strongest reasoning
better ambiguity handling
better synthesis quality

Weaknesses:

highest cost
often slower
overkill for simple tasks

Tool-assisted tier

Use when exactness matters more than fluent wording.

Use this path for:

arithmetic
deterministic calculations
spreadsheet operations
formula application
structured data transformation
exact code execution or testing if available
retrieval-backed factual tasks

Rule: When a task requires exact numeric correctness, prefer tools plus model orchestration over pure model reasoning.

Decision dimensions

Score the request across these dimensions:

1. Complexity

low
medium
high
very high

2. Exactness requirement

low: approximate answer is acceptable
medium: mostly correct is acceptable
high: exact result expected
critical: exact result plus verification required

3. Risk level

low
medium
high

4. Latency priority

urgent
normal
relaxed

5. Budget strategy

minimize cost
balanced
quality-first

6. Context burden

short
moderate
long
extreme

7. Modality burden

text only
image or PDF
mixed inputs
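
One way to keep these scores consistent is to hold them in a small validated record. The sketch below uses the value names listed above; the class name `DimensionScores` and its field names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Literal, get_args, get_type_hints

Complexity = Literal["low", "medium", "high", "very high"]
Exactness = Literal["low", "medium", "high", "critical"]
Risk = Literal["low", "medium", "high"]
Latency = Literal["urgent", "normal", "relaxed"]
Budget = Literal["minimize cost", "balanced", "quality-first"]
ContextBurden = Literal["short", "moderate", "long", "extreme"]
ModalityBurden = Literal["text only", "image or PDF", "mixed inputs"]


@dataclass(frozen=True)
class DimensionScores:
    complexity: Complexity
    exactness: Exactness
    risk: Risk
    latency: Latency
    budget: Budget
    context: ContextBurden
    modality: ModalityBurden

    def __post_init__(self) -> None:
        # Literal annotations are not enforced at runtime, so validate here.
        for name, hint in get_type_hints(type(self)).items():
            value = getattr(self, name)
            if value not in get_args(hint):
                raise ValueError(f"{name}={value!r} is not one of {get_args(hint)}")


scores = DimensionScores(
    complexity="low",
    exactness="low",
    risk="low",
    latency="urgent",
    budget="minimize cost",
    context="short",
    modality="text only",
)
```

A misspelled value such as `risk="extreme"` raises `ValueError` at construction time instead of silently flowing into the routing logic.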

Hard routing rules

Apply these rules before any soft optimization.

Exact calculation rule

If the task involves exact arithmetic, formulas, tables, accounting-like operations, unit-sensitive conversions, or step-sensitive logic:

do not rely on a pure language-only route when tools are available
prefer tool-assisted execution
use a balanced or premium model only to interpret the task and explain results
add a verification step for high-impact numeric outputs

High-risk rule

If the task is high-risk:

do not use economy-only routing as the final path
require either premium single-model reasoning with grounding or a model plus verifier workflow
add citations, checks, or a review pass when possible

Ambiguity rule

If the task is materially ambiguous and the answer quality depends on interpretation:

use a stronger reasoning tier or a two-stage workflow
do not finalize on a cheap first-pass answer without clarification or review

Long-context rule

If the input is large or multi-document:

prefer staged processing
use extraction or chunk summarization first
then use a stronger model for synthesis if needed
avoid sending everything to the strongest model by default if staged reduction is cheaper and safe

Multimodal rule

If the task includes images, diagrams, PDFs with layout dependence, or visual interpretation:

use a model path that actually supports the required modality
do not route to a text-only path

Coding rule

For code tasks:

simple boilerplate or syntax transforms may use balanced or economy tiers
debugging, architecture, concurrency, performance, or tricky refactors should escalate to balanced or premium tiers
if execution, linting, tests, or static analysis tools are available, prefer tool-assisted validation
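
Since the hard rules run before any soft optimization, they can be expressed as a function that turns request flags into constraints that any later choice must respect. This is a rough sketch: the flag names and constraint strings are hypothetical, and the unverified-result branch borrows the fallback rule from Example 3 rather than the exact calculation rule itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestFlags:
    exact_calculation: bool = False
    high_risk: bool = False
    ambiguous: bool = False
    long_context: bool = False
    needs_vision: bool = False
    is_code_task: bool = False
    tools_available: bool = False


def hard_constraints(flags: RequestFlags) -> list[str]:
    """Collect non-negotiable constraints; soft optimization happens afterwards."""
    constraints: list[str] = []
    if flags.exact_calculation:
        # Without tools, the result must be flagged rather than presented as certain.
        constraints.append(
            "use-tool-execution" if flags.tools_available else "mark-result-unverified"
        )
    if flags.high_risk:
        constraints += ["no-economy-final-answer", "require-grounding-or-verifier"]
    if flags.ambiguous:
        constraints.append("require-clarification-or-review")
    if flags.long_context:
        constraints.append("prefer-staged-processing")
    if flags.needs_vision:
        constraints.append("no-text-only-model")
    if flags.is_code_task and flags.tools_available:
        constraints.append("validate-with-tools")
    return constraints


found = hard_constraints(
    RequestFlags(exact_calculation=True, high_risk=True, tools_available=True)
)
```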

Recommended workflows

Choose one of these workflow shapes.

1. Single economy

Use when:

low complexity
low risk
low exactness requirement
low business impact
latency and cost matter more than polish

Examples:

rewrite text
generate short summaries
classify intent
format content

2. Single balanced

Use when:

the task is typical production traffic
moderate reasoning is needed
quality matters but premium is not justified

Examples:

standard technical Q&A
ordinary product copy
moderate coding tasks
document understanding with limited ambiguity

3. Single premium

Use when:

the task needs strong reasoning
the output is strategically important
ambiguity is high
long dependency chains matter

Examples:

system design
complex debugging
nuanced tradeoff analysis
sensitive writing requiring higher judgment

4. Tool-assisted reasoning

Use when:

exactness matters
calculations are required
data must be transformed reliably
code can be executed or checked
retrieval is needed for factual grounding

Pattern:

model interprets the request
tools compute, retrieve, or validate
model explains and formats the result
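
The three-step pattern above can be sketched as a function that takes each step as a callable. The two model steps are replaced with stubs here, and the fee scenario and its numbers are invented for illustration; only the middle step does arithmetic.

```python
from decimal import Decimal
from typing import Callable


def tool_assisted(
    request: str,
    interpret: Callable[[str], dict],
    compute: Callable[[dict], Decimal],
    explain: Callable[[str, Decimal], str],
) -> str:
    plan = interpret(request)        # model: turn the request into structured inputs
    result = compute(plan)           # tool: deterministic computation
    return explain(request, result)  # model: wording and formatting only


# Stand-ins for the two model calls; a real deployment would call a model here.
def fake_interpret(request: str) -> dict:
    return {"amount": Decimal("1200.00"), "rate": Decimal("0.075")}


def compute_fee(plan: dict) -> Decimal:
    # Decimal avoids binary floating-point rounding in money arithmetic.
    return (plan["amount"] * plan["rate"]).quantize(Decimal("0.01"))


def fake_explain(request: str, result: Decimal) -> str:
    return f"The fee is {result}."


answer = tool_assisted(
    "What is a 7.5% fee on 1200.00?", fake_interpret, compute_fee, fake_explain
)
```

The model never produces the number itself; it only decides what to compute and how to present it.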

5. Staged pipeline

Use when:

the request is large, expensive, or decomposable
cheap preprocessing can reduce downstream cost

Pattern:

economy or balanced model for triage or extraction
balanced or premium model for synthesis
optional verifier pass

Examples:

long-document analysis
large support threads
multi-file engineering review

6. Draft and review

Use when:

low-cost drafting is possible but final quality matters

Pattern:

cheaper model drafts
stronger model critiques, corrects, or upgrades

Best for:

writing
technical explanations
proposal drafting
code review style tasks

7. Parallel comparison

Use when:

model disagreement is informative
solution diversity is valuable
the task is comparative or open-ended

Pattern:

two models produce independent answers
a stronger model or rule layer compares and merges

Best for:

architecture options
planning alternatives
ambiguous recommendations
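
A possible shape for this workflow, assuming each model is exposed as a plain callable from prompt to text: run the models concurrently on the same prompt, then hand all answers to a judge. The stub models and the `list_options` rule layer below are placeholders, not real model calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

Model = Callable[[str], str]


def parallel_compare(
    prompt: str,
    models: Sequence[Model],
    judge: Callable[[str, list[str]], str],
) -> str:
    # Each model sees only the prompt, so the answers stay independent.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        answers = list(pool.map(lambda model: model(prompt), models))
    return judge(prompt, answers)


# Stub models and a rule-layer judge, for illustration only.
def model_a(prompt: str) -> str:
    return "Option: monolith first"


def model_b(prompt: str) -> str:
    return "Option: event-driven services"


def list_options(prompt: str, answers: list[str]) -> str:
    return "\n".join(f"{i}. {answer}" for i, answer in enumerate(answers, start=1))


merged = parallel_compare("Propose an architecture.", [model_a, model_b], list_options)
```

`pool.map` returns results in the order of `models`, so the judge can attribute each answer to its source.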

8. Consensus or verifier workflow

Use when:

correctness matters enough to justify extra cost
false confidence is dangerous

Pattern:

primary model produces answer
verifier model checks logic, calculations, or policy fit
disagreements trigger escalation or explicit uncertainty

Best for:

high-risk reasoning
important financial outputs
compliance-sensitive content
high-value technical decisions
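
The verifier pattern can be sketched as follows, assuming the verifier returns a small verdict object. `Verdict`, `answer_with_verifier`, and the primality stubs are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    agrees: bool
    note: str = ""


def answer_with_verifier(
    question: str,
    primary: Callable[[str], str],
    verifier: Callable[[str, str], Verdict],
    escalate: Optional[Callable[[str], str]] = None,
) -> str:
    answer = primary(question)
    verdict = verifier(question, answer)
    if verdict.agrees:
        return answer
    if escalate is not None:
        return escalate(question)  # disagreement triggers escalation
    # No stronger path available: surface the uncertainty explicitly.
    return f"{answer}\n\nUnverified: the verifier disagreed ({verdict.note})."


# Stubs: a primary model that is wrong and a deterministic checker.
def primary(question: str) -> str:
    return "91 is prime."


def verifier(question: str, answer: str) -> Verdict:
    is_prime = all(91 % d for d in range(2, 10))
    claims_prime = "is prime" in answer
    return Verdict(agrees=(is_prime == claims_prime), note="91 = 7 * 13")


flagged = answer_with_verifier("Is 91 prime?", primary, verifier)
escalated = answer_with_verifier(
    "Is 91 prime?", primary, verifier, escalate=lambda q: "91 is not prime (7 * 13)."
)
```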

Cost-control strategy

Use these strategies to keep spending focused on work where it adds value.

Default strategy

start cheap when safe
escalate only on signals of failure risk
avoid premium for routine tasks
reuse extracted structure instead of repeating full-context calls

Escalation triggers

Escalate to a stronger model or multi-step workflow when any of these appear:

multiple dependent reasoning steps
ambiguous user intent with multiple plausible interpretations
repeated self-contradiction in draft output
failure to follow structure or constraints
long context with subtle dependencies
code correctness matters beyond surface syntax
exactness-critical math or finance output
high-risk domain or high business impact
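
A simple way to act on these triggers is a one-step tier ladder. The sketch below is illustrative only: the trigger identifiers are invented shorthand for the items listed above, and a deployment might instead switch to a multi-step workflow once the top tier is reached.

```python
TIERS = ["economy", "balanced", "premium"]

# Hypothetical identifiers, one per escalation trigger listed above.
ESCALATION_TRIGGERS = frozenset({
    "dependent-steps",
    "ambiguous-intent",
    "self-contradiction",
    "constraint-violation",
    "subtle-long-context",
    "code-correctness",
    "exact-math-or-finance",
    "high-risk",
})


def escalate_tier(current: str, signals: set) -> str:
    """Move up one tier if any known trigger was observed; otherwise stay put."""
    index = TIERS.index(current)
    if signals & ESCALATION_TRIGGERS:
        # Already at the top tier: stay there; the caller may add a verifier step.
        index = min(index + 1, len(TIERS) - 1)
    return TIERS[index]


after_draft = escalate_tier("economy", {"self-contradiction"})
```

Unknown signals are ignored, so a typo in a signal name never causes an escalation on its own.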

De-escalation triggers

Use a cheaper path when:

the task is mostly formatting or rewriting
the answer can be approximate
the task is repetitive and pattern-based
first-pass triage is enough
premium capabilities would not materially improve the outcome

Complex calculation policy

When the request includes complex calculations or formal reasoning:

Separate interpretation from computation.
Use the model to parse the problem and define the method.
Use a deterministic tool or calculation path when available.
Ask a verifier layer to check assumptions, formulas, units, and edge cases for high-impact outputs.
Present the final answer with explicit assumptions and, when relevant, step order.

Never use a fluent but non-verified freeform model answer as the final authority for exact numeric work when a deterministic path exists.
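
As a worked illustration of separating interpretation from computation, the sketch below computes burn and runway deterministically and then runs an independent check on each result. All figures are made-up inputs standing in for what the interpreting model would extract, and the function names are assumptions.

```python
from decimal import Decimal, ROUND_DOWN

TENTH = Decimal("0.1")


def monthly_burn(base_burn: Decimal, hires: int, cost_per_hire: Decimal) -> Decimal:
    return base_burn + hires * cost_per_hire


def runway_months(cash: Decimal, burn: Decimal) -> Decimal:
    if burn <= 0:
        raise ValueError("burn must be positive")
    # Round down so the reported runway never overstates the real one.
    return (cash / burn).quantize(TENTH, rounding=ROUND_DOWN)


def verify(cash: Decimal, burn: Decimal, months: Decimal) -> bool:
    # Independent check: cash covers `months` but not a further 0.1 month.
    return burn * months <= cash < burn * (months + TENTH)


# Hypothetical inputs, in the same currency unit and per-month basis throughout.
cash = Decimal("1200000")
base_burn = Decimal("80000")
cost_per_hire = Decimal("12000")

results = {}
for hires in (0, 5, 10):
    burn = monthly_burn(base_burn, hires, cost_per_hire)
    months = runway_months(cash, burn)
    if not verify(cash, burn, months):
        raise RuntimeError(f"verification failed for {hires} hires")
    results[hires] = (burn, months)
```

With these inputs the three scenarios give runways of 15.0, 8.5, and 6.0 months. The model's remaining job is to state the assumptions and present the table, not to recompute it.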

Long-context policy

When the request includes large context:

first extract relevant segments, summaries, or structured facts
reduce duplication
preserve citations or pointers when possible
synthesize only after reduction
use premium synthesis only if the reduced problem still demands it
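
The reduction steps above might look like the following sketch: split documents into paragraphs, keep the relevant ones, drop duplicates, and carry a pointer back to the source. The `Snippet` type, the paragraph-based chunking, and the keyword relevance test are all simplifying assumptions; a real deployment would likely use a model or retrieval index for the relevance step.

```python
from typing import Callable, NamedTuple


class Snippet(NamedTuple):
    doc_id: str
    paragraph: int  # 1-based position within the source document
    text: str


def reduce_context(
    documents: dict[str, str], is_relevant: Callable[[str], bool]
) -> list[Snippet]:
    seen: set[str] = set()
    kept: list[Snippet] = []
    for doc_id, body in documents.items():
        for number, paragraph in enumerate(body.split("\n\n"), start=1):
            text = " ".join(paragraph.split())  # normalize whitespace
            key = text.lower()
            if not text or key in seen or not is_relevant(text):
                continue
            seen.add(key)
            kept.append(Snippet(doc_id, number, text))
    return kept


docs = {
    "a.txt": "Price: 10k per year.\n\nOur team loves hiking.",
    "b.txt": "price: 10k per year.\n\nTimeline: 6 weeks.",
}
snippets = reduce_context(
    docs, lambda text: any(word in text.lower() for word in ("price", "timeline"))
)
# Pointers survive reduction, so the synthesis step can cite its sources.
cited = "\n".join(f"[{s.doc_id}#{s.paragraph}] {s.text}" for s in snippets)
```

Only `cited`, not the full documents, would then be sent to the synthesis model.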

Output format

Return exactly this structure:

Routing Decision:

Primary Reason:

Task Profile:

taskType:
complexity: <low | medium | high | very high>
exactness: <low | medium | high | critical>
risk: <low | medium | high>
latency: <urgent | normal | relaxed>
budget: <minimize-cost | balanced | quality-first>
contextLoad: <short | moderate | long | extreme>
modality: <text | multimodal>

Recommended Execution Plan:

<step 1>
<step 2>
<step 3>
<step 4 if needed>

Model Role Assignment:

planner: <economy | balanced | premium | not-needed>
generator: <economy | balanced | premium | tool-assisted>
verifier: <none | balanced | premium | tool>

Why This Is Cost-Effective:

<reason 1>
<reason 2>
<reason 3>

Escalation Rule:

Fallback Rule:
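
To keep responses in this exact shape, the structure can be rendered from a typed record rather than written freehand. The sketch below is one possible approach; the `RoutingDecision` class and its field names are assumptions. It opens with a Routing Decision line because every example in this document does.

```python
from dataclasses import dataclass


@dataclass
class RoutingDecision:
    decision: str
    primary_reason: str
    task_profile: dict[str, str]
    plan: list[str]
    roles: dict[str, str]
    cost_reasons: list[str]
    escalation_rule: str
    fallback_rule: str

    def render(self) -> str:
        def pairs(mapping: dict[str, str]) -> list[str]:
            return [f"{key}: {value}" for key, value in mapping.items()]

        sections = [
            [f"Routing Decision: {self.decision}"],
            [f"Primary Reason: {self.primary_reason}"],
            ["Task Profile:"],
            pairs(self.task_profile),
            ["Recommended Execution Plan:"],
            self.plan,
            ["Model Role Assignment:"],
            pairs(self.roles),
            ["Why This Is Cost-Effective:"],
            self.cost_reasons,
            [f"Escalation Rule: {self.escalation_rule}"],
            [f"Fallback Rule: {self.fallback_rule}"],
        ]
        # One blank line between sections, matching the layout of the examples.
        return "\n\n".join("\n".join(section) for section in sections)


text = RoutingDecision(
    decision="single-economy",
    primary_reason="Low-risk rewriting task.",
    task_profile={"taskType": "simple generation", "complexity": "low"},
    plan=["Use an economy model to rewrite the email."],
    roles={"planner": "not-needed", "generator": "economy", "verifier": "none"},
    cost_reasons=["The task is structurally simple."],
    escalation_rule="Escalate only on legal sensitivity.",
    fallback_rule="Retry once with a balanced model.",
).render()
```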

Examples

Example 1: Simple rewrite

Request: "Rewrite this email to sound more professional."

Output:

Routing Decision: single-economy

Primary Reason: This is a low-risk rewriting task where a low-cost model should meet the quality bar.

Task Profile:

taskType: simple generation
complexity: low
exactness: low
risk: low
latency: urgent
budget: minimize-cost
contextLoad: short
modality: text

Recommended Execution Plan:

Use an economy model to rewrite the email.
Preserve intent and tone constraints.
Return one polished version and one shorter variant.

Model Role Assignment:

planner: not-needed
generator: economy
verifier: none

Why This Is Cost-Effective:

The task is structurally simple.
Premium reasoning is unnecessary.
Fast low-cost generation is sufficient.

Escalation Rule: Escalate only if the user requests highly nuanced persuasion, legal sensitivity, or multiple tone variants.

Fallback Rule: If the first rewrite is too generic, retry once with stricter tone instructions using a balanced model.

Example 2: Complex system design

Request: "Design an AI agent architecture for long-form writing with memory, retrieval, conflict resolution, and chapter planning."

Output:

Routing Decision: single-premium

Primary Reason: This task requires deep reasoning, abstraction, and multi-part architecture tradeoff analysis.

Task Profile:

taskType: deep reasoning
complexity: high
exactness: medium
risk: medium
latency: normal
budget: balanced
contextLoad: moderate
modality: text

Recommended Execution Plan:

Use a premium model to identify major architecture components and constraints.
Structure the answer into memory layers, retrieval flow, consistency controls, and orchestration.
Include tradeoffs and failure modes.
Add implementation recommendations only after the design is coherent.

Model Role Assignment:

planner: premium
generator: premium
verifier: none

Why This Is Cost-Effective:

A weaker model may produce shallow or inconsistent architecture.
One premium pass is cheaper than multiple failed lower-tier retries.
The task value justifies stronger reasoning.

Escalation Rule: Escalate to parallel-comparison only if multiple competing architecture strategies are explicitly required.

Fallback Rule: If the first answer is too generic, rerun with a staged outline-first then full-design workflow.

Example 3: Financial calculation with policy impact

Request: "Calculate the projected monthly burn and runway under three hiring scenarios and explain the assumptions."

Output:

Routing Decision: tool-assisted

Primary Reason: This request depends on exact calculations, so deterministic computation is safer and more cost-effective than pure model reasoning.

Task Profile:

taskType: exact calculation or formal logic
complexity: high
exactness: critical
risk: high
latency: normal
budget: balanced
contextLoad: moderate
modality: text

Recommended Execution Plan:

Use a balanced model to extract the variables, hiring scenarios, and required formulas.
Use a deterministic calculator or spreadsheet-capable path to compute burn and runway.
Run a verification check on formulas, assumptions, and units.
Use the model to present the results clearly with assumptions and scenario breakdowns.

Model Role Assignment:

planner: balanced
generator: tool-assisted
verifier: tool

Why This Is Cost-Effective:

Exact arithmetic should not depend on freeform model reasoning.
Deterministic calculation reduces error risk.
A smaller model can still explain the results after computation.

Escalation Rule: Escalate to premium review only if the scenario assumptions are ambiguous or strategically sensitive.

Fallback Rule: If tool-based computation is unavailable, return assumptions explicitly and mark the result as unverified instead of presenting certainty.

Example 4: Multi-document review

Request: "Compare these five candidate proposals and recommend the best one based on cost, timeline, technical fit, and risk."

Output:

Routing Decision: staged-pipeline

Primary Reason: The request is large and decomposable, so staged extraction and synthesis is more cost-effective than sending everything directly to a premium model.

Task Profile:

taskType: long-context synthesis
complexity: high
exactness: medium
risk: medium
latency: relaxed
budget: balanced
contextLoad: long
modality: text

Recommended Execution Plan:

Use an economy or balanced model to extract structured facts from each proposal.
Normalize the proposals into a common comparison table.
Use a stronger model to synthesize tradeoffs and recommend the best option.
Add a brief verifier pass if the recommendation is high stakes.

Model Role Assignment:

planner: balanced
generator: premium
verifier: balanced

Why This Is Cost-Effective:

Cheap extraction lowers total token cost.
Structured normalization improves synthesis quality.
Premium reasoning is reserved for the part that truly needs it.

Escalation Rule: Escalate to consensus-check if the recommendation will drive a major decision or if proposal differences are subtle.

Fallback Rule: If extraction quality is poor, rerun the extraction stage with a stronger model before recomputing the final recommendation.
