proxy-token-optimizer
Optimize LLM token usage and API costs for the openclaw-manager proxy platform. Provides model-tier routing (route simple prompts to glm-4.7-flashx instead o...
Like a lobster shell, security has layers — review code before you run it.
License
SKILL.md
Proxy Token Optimizer
Reduces LLM API costs for the openclaw-manager multi-tenant proxy platform through four strategies:
- Model-tier routing — Route prompts to the cheapest capable model
- Heartbeat optimization — Cheapest model + longer intervals for heartbeat calls
- Context lazy loading — Load only the context files each prompt actually needs
- Platform usage analytics — Real data from PostgreSQL, not estimates
Why these strategies matter
The openclaw-manager platform proxies LLM requests for multiple OpenClaw instances through providers like zai-proxy, zai-coding-proxy, and kimi-coding-proxy. Each provider offers models at different price points (e.g., glm-4.7 vs glm-4.7-flashx). Without optimization, every request — including simple greetings and heartbeat pings — uses the default (expensive) model, and every session loads the full context regardless of need. These four strategies target the highest-impact cost drivers.
Quick start
All instance-side scripts run locally with no dependencies. Platform-side scripts need DB access.
# Model routing — which model should handle this prompt?
python3 scripts/model_router.py "thanks!"
# → {"tier": "cheap", "recommended_model": "zai-proxy/glm-4.7-flashx"}
# Context optimization — which files does this prompt need?
python3 scripts/context_optimizer.py recommend "hi"
# → {"context_level": "minimal", "recommended_files": ["SOUL.md", "IDENTITY.md"]}
# Heartbeat config — generate openclaw.json patch
python3 scripts/heartbeat_config.py patch
# → {"agents": {"defaults": {"heartbeat": {"every": "55m", "model": "zai-proxy/glm-4.7-flashx"}}}}
# Unified CLI (all commands in one place)
python3 scripts/cli.py --help
Scripts reference
Instance-side (pure local, no network, no DB)
scripts/model_router.py
Routes prompts to the right model tier based on complexity analysis.
Tier logic:
- cheap →
glm-4.7-flashx: Greetings, acknowledgments, heartbeats, cron jobs, log parsing. Cost savings: 5-10x vs standard. - standard →
glm-4.7: Code writing, debugging, explanations. Default for unclear prompts. - premium →
glm-4.7(ork2p5for kimi): Architecture design, deep analysis, strategy planning.
Supports Chinese and English patterns. Provider-aware — works with zai-proxy, zai-coding-proxy, and kimi-coding-proxy.
python3 scripts/model_router.py "<prompt>" [provider]
python3 scripts/model_router.py compare # show all provider models
scripts/context_optimizer.py
Analyzes prompt complexity to recommend which context files to load, reducing unnecessary token consumption.
Context levels:
| Level | When | Files loaded | Token savings |
|---|---|---|---|
| minimal | "hi", "thanks", short msgs | SOUL.md + IDENTITY.md (2) | ~80% |
| standard | "write a function", normal work | + memory/TODAY.md + conditional | ~50% |
| full | "design architecture", complex tasks | + MEMORY.md + all conditional | ~30% |
Also generates an optimized AGENTS.md template with lazy-loading rules baked in:
python3 scripts/context_optimizer.py recommend "<prompt>"
python3 scripts/context_optimizer.py generate-agents # creates AGENTS.md.optimized
scripts/heartbeat_config.py
Generates openclaw.json configuration patches for heartbeat optimization:
- Forces heartbeat model to
glm-4.7-flashx(cheapest available) - Sets interval to 55 minutes (keeps prompt cache warm within 1-hour TTL, avoids cache rebuild cost)
python3 scripts/heartbeat_config.py recommend [cache_ttl_minutes]
python3 scripts/heartbeat_config.py patch # output JSON patch for openclaw.json
Platform-side (requires DB connection)
These scripts query the usage_records PostgreSQL table for real data. Run from the openclaw-manager project root with the virtualenv activated.
scripts/usage_report.py
Generates usage reports from actual database records — not estimates.
python3 scripts/usage_report.py overview [days] # platform-wide summary
python3 scripts/usage_report.py instance <name> [days] # single instance detail
Overview includes: total calls/tokens, per-provider breakdown, per-model breakdown, top 10 instances by consumption, 7-day daily trend.
Instance report includes: per-model distribution, daily trend, lifetime totals.
scripts/quota_advisor.py
Compares actual 24-hour usage against quota plan limits to find mismatches:
- Wasteful: Usage below 20% of plan limit → suggest downgrade
- Throttled: Usage above 80% of plan limit → suggest upgrade
python3 scripts/quota_advisor.py analyze # check all instances
python3 scripts/quota_advisor.py plans # show available quota plans
Unified CLI
scripts/cli.py wraps all the above into a single entry point:
python3 scripts/cli.py route "<prompt>" # model routing
python3 scripts/cli.py context "<prompt>" # context recommendation
python3 scripts/cli.py generate-agents # generate AGENTS.md
python3 scripts/cli.py heartbeat # heartbeat config
python3 scripts/cli.py overview [days] # platform usage (needs DB)
python3 scripts/cli.py report <name> [days] # instance report (needs DB)
python3 scripts/cli.py advisor # quota advice (needs DB)
Project integration points
This skill works with existing openclaw-manager infrastructure:
| Component | File | How this skill uses it |
|---|---|---|
| Provider config | config/model.yaml | Model names/endpoints for routing |
| Proxy routing | config_service.py | Where _inject_proxy_providers() registers models |
| Usage recording | proxy_common/usage_recorder.py | Source of real usage data |
| Quota plans | config/llm_proxy.yaml | Plan definitions for quota advisor |
| Instance model | app/models.py | Instance metadata for reports |
Expected savings
| Strategy | Mechanism | Impact |
|---|---|---|
| Context lazy loading | Fewer tokens per request | 50-80% context reduction |
| Model routing (flashx) | Lower per-token price | 5-10x on simple tasks |
| Heartbeat → flashx | Lower heartbeat cost | Significant per-instance savings |
| Heartbeat interval 55min | Fewer API calls | ~45% fewer heartbeat calls |
Files
8 totalComments
Loading comments…
