OpenClaw Token Optimizer

Reduce OpenClaw token usage and API costs through smart model routing, heartbeat optimization, budget tracking, and native 2026.2.15 features (session pruning, bootstrap size limits, cache retention).

MIT-0 · Free to use, modify, and redistribute. No attribution required.
41 current installs · 45 all-time installs
by Asif (@Asif2BD)
Security Scan
VirusTotal: Benign (view report)
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description (token optimization via context/model/heartbeat/budget) match the included scripts and docs. Optional docs reference multi-provider setups and example API key placeholders, but the executable scripts do not require or use external network calls by default.
Instruction Scope
Runtime instructions and scripts operate on the OpenClaw workspace (reading/writing ~/.openclaw/workspace and memory JSON files) which is expected for this purpose. Minor discrepancy: SKILL.md and SECURITY.md emphasize "no system modifications" and "no subprocess calls," yet the README and SKILL.md show explicit user actions that write files (cp to ~/.openclaw/workspace/HEARTBEAT.md) and the bundle includes a shell wrapper (scripts/optimize.sh) that launches the bundled Python scripts. These are local, user-invoked operations and not networked, but the SKILL.md phrasing is slightly over-broad.
Install Mechanism
No install spec (instruction-only) and code is bundled in the package. No downloads or external installers are run automatically. The scripts use only Python standard library and a small shell wrapper; .clawhubsafe provides SHA256 checksums for integrity.
Credentials
Registry declares no required env vars or credentials. Some scripts (model_router) will optionally detect provider API keys from environment to choose a default provider, but they do not send network requests. Documentation references API key placeholders only for optional multi-provider workflows — expected and proportionate.
Persistence & Privilege
Skill is not always-enabled and uses no elevated privileges. It writes and reads files only in the user's OpenClaw workspace (~/.openclaw/workspace), which is reasonable for configuration/state helpers. It does not modify other skills or system-wide settings autonomously.
Assessment
This skill appears to be what it claims: local Python utilities that read/write files under your OpenClaw workspace to recommend and apply token-saving patterns. Before installing: (1) Inspect the bundled scripts yourself (they're small and use only the Python stdlib); (2) verify the .clawhubsafe checksums after download; (3) be aware that running the optional optimize.sh or following the README's cp command will write files into ~/.openclaw/workspace (this is expected behavior for applying heartbeat templates and storing state); and (4) documentation mentions multi-provider setups and API key placeholders — those are optional and not used automatically. If you need absolute assurance, run the Python scripts directly instead of the shell wrapper and review the workspace files the scripts create.

Like a lobster shell, security has layers — review code before you run it.

Current version: v3.0.0
Tags: cost-savings, latest, lazy-loading, model-routing, productivity, token-optimization

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

SKILL.md

Token Optimizer

Comprehensive toolkit for reducing token usage and API costs in OpenClaw deployments. Combines smart model routing, optimized heartbeat intervals, usage tracking, and multi-provider strategies.

Quick Start

Immediate actions (no config changes needed):

  1. Generate optimized AGENTS.md (BIGGEST WIN!):

    python3 scripts/context_optimizer.py generate-agents
    # Creates AGENTS.md.optimized — review and replace your current AGENTS.md
    
  2. Check what context you ACTUALLY need:

    python3 scripts/context_optimizer.py recommend "hi, how are you?"
    # Shows: Only 2 files needed (not 50+!)
    
  3. Install optimized heartbeat:

    cp assets/HEARTBEAT.template.md ~/.openclaw/workspace/HEARTBEAT.md
    
  4. Enforce cheaper models for casual chat:

    python3 scripts/model_router.py "thanks!"
    # Single-provider Anthropic setup: Use Sonnet, not Opus
    # Multi-provider setup (OpenRouter/Together): Use Haiku for max savings
    
  5. Check current token budget:

    python3 scripts/token_tracker.py check
    

Expected savings: 50-80% reduction in token costs for typical workloads (context optimization is the biggest factor!).

Core Capabilities

0. Lazy Skill Loading (NEW in v3.0 — BIGGEST WIN!)

The single highest-impact optimization available. Most agents burn 3,000–15,000 tokens per session loading skill files they never use. Stop that first.

The pattern:

  1. Create a lightweight SKILLS.md catalog in your workspace (~300 tokens — list of skills + when to load them)
  2. Only load individual SKILL.md files when a task actually needs them
  3. Apply the same logic to memory files — load MEMORY.md at startup, daily logs only on demand

Token savings:

| Library size | Before (eager) | After (lazy) | Savings |
|---|---|---|---|
| 5 skills | ~3,000 tokens | ~600 tokens | 80% |
| 10 skills | ~6,500 tokens | ~750 tokens | 88% |
| 20 skills | ~13,000 tokens | ~900 tokens | 93% |
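
The catalog lookup behind this pattern fits in a few lines of Python. This is a sketch only: the file paths and trigger keywords below are hypothetical, and the real SKILLS.md catalog format may differ.

```python
# Hypothetical catalog: skill file -> trigger keywords. Only matching files load.
CATALOG = {
    "skills/pdf-tools/SKILL.md": ["pdf", "document"],
    "skills/git-helper/SKILL.md": ["git", "commit", "branch"],
    "skills/token-optimizer/SKILL.md": ["token", "cost", "budget"],
}

def skills_to_load(task):
    """Return only the skill files whose trigger keywords appear in the task."""
    task_lower = task.lower()
    return [path for path, keywords in CATALOG.items()
            if any(kw in task_lower for kw in keywords)]

print(skills_to_load("help me rebase this git branch"))
# → ['skills/git-helper/SKILL.md']
```

Loading the ~300-token catalog plus one or two matched skills, instead of every SKILL.md, is where the 80-93% figures in the table come from.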

Quick implementation in AGENTS.md:

## Skills

At session start: Read SKILLS.md (the index only — ~300 tokens).
Load individual skill files ONLY when a task requires them.
Never load all skills upfront.

Full implementation (with catalog template + optimizer script):

clawhub install openclaw-skill-lazy-loader

The companion skill openclaw-skill-lazy-loader includes a SKILLS.md.template, an AGENTS.md.template lazy-loading section, and a context_optimizer.py CLI that recommends exactly which skills to load for any given task.

Lazy loading handles context loading costs. The remaining capabilities below handle runtime costs. Together they cover the full token lifecycle.


1. Context Optimization (NEW!)

Biggest token saver — Only load files you actually need, not everything upfront.

Problem: Default OpenClaw loads ALL context files every session:

  • SOUL.md, AGENTS.md, USER.md, TOOLS.md, MEMORY.md
  • docs/**/*.md (hundreds of files)
  • memory/2026-*.md (daily logs)
  • Total: Often 50K+ tokens before the user even speaks!

Solution: Lazy loading based on prompt complexity.

Usage:

python3 scripts/context_optimizer.py recommend "<user prompt>"

Examples:

# Simple greeting → minimal context (2 files only!)
context_optimizer.py recommend "hi"
→ Load: SOUL.md, IDENTITY.md
→ Skip: Everything else
→ Savings: ~80% of context

# Standard work → selective loading
context_optimizer.py recommend "write a function"
→ Load: SOUL.md, IDENTITY.md, memory/TODAY.md
→ Skip: docs, old memory, knowledge base
→ Savings: ~50% of context

# Complex task → full context
context_optimizer.py recommend "analyze our entire architecture"
→ Load: SOUL.md, IDENTITY.md, MEMORY.md, memory/TODAY+YESTERDAY.md
→ Conditionally load: Relevant docs only
→ Savings: ~30% of context

Output format:

{
  "complexity": "simple",
  "context_level": "minimal",
  "recommended_files": ["SOUL.md", "IDENTITY.md"],
  "file_count": 2,
  "savings_percent": 80,
  "skip_patterns": ["docs/**/*.md", "memory/20*.md"]
}
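
A rough idea of how such a classifier can work, as a hedged sketch (the bundled context_optimizer.py may use different keywords and thresholds):

```python
# Toy complexity heuristic: short greetings -> simple, heavyweight keywords ->
# complex, everything else -> standard. Keyword sets are illustrative only.
GREETINGS = {"hi", "hey", "hello", "thanks", "ok"}
COMPLEX_KEYWORDS = {"architecture", "analyze", "design", "refactor"}

def classify(prompt):
    words = [w.strip(",.!?") for w in prompt.lower().split()]
    if len(words) <= 3 and words and words[0] in GREETINGS:
        return "simple"
    if any(w in COMPLEX_KEYWORDS for w in words):
        return "complex"
    return "standard"

print(classify("hi"))                               # → simple
print(classify("analyze our entire architecture"))  # → complex
```

The complexity level then maps to a context bundle ("minimal", "selective", "full") as shown in the examples above.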

Integration pattern: Before loading context for a new session:

from context_optimizer import recommend_context_bundle

user_prompt = "thanks for your help"
recommendation = recommend_context_bundle(user_prompt)

if recommendation["context_level"] == "minimal":
    # Load only SOUL.md + IDENTITY.md; skip everything else (~80% token savings)
    files_to_load = recommendation["recommended_files"]

Generate optimized AGENTS.md:

context_optimizer.py generate-agents
# Creates AGENTS.md.optimized with lazy loading instructions
# Review and replace your current AGENTS.md

Expected savings: 50-80% reduction in context tokens.

2. Smart Model Routing (ENHANCED!)

Automatically classify tasks and route to appropriate model tiers.

NEW: Communication pattern enforcement — Never waste Opus tokens on "hi" or "thanks"!

Usage:

python3 scripts/model_router.py "<user prompt>" [current_model] [force_tier]

Examples:

# Communication (NEW!) → ALWAYS Haiku
python3 scripts/model_router.py "thanks!"
python3 scripts/model_router.py "hi"
python3 scripts/model_router.py "ok got it"
→ Enforced: Haiku (NEVER Sonnet/Opus for casual chat)

# Simple task → suggests Haiku
python3 scripts/model_router.py "read the log file"

# Medium task → suggests Sonnet
python3 scripts/model_router.py "write a function to parse JSON"

# Complex task → suggests Opus
python3 scripts/model_router.py "design a microservices architecture"

Patterns enforced to Haiku (NEVER Sonnet/Opus):

Communication:

  • Greetings: hi, hey, hello, yo
  • Thanks: thanks, thank you, thx
  • Acknowledgments: ok, sure, got it, understood
  • Short responses: yes, no, yep, nope
  • Single words or very short phrases

Background tasks:

  • Heartbeat checks: "check email", "monitor servers"
  • Cronjobs: "scheduled task", "periodic check", "reminder"
  • Document parsing: "parse CSV", "extract data from log", "read JSON"
  • Log scanning: "scan error logs", "process logs"
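
One way to enforce these patterns is a single anchored regex. This is an illustrative version only; the shipped COMMUNICATION_PATTERNS in model_router.py may cover more phrases.

```python
import re

# Anchored alternation of casual-communication phrases, with trailing
# punctuation/whitespace allowed. Case-insensitive.
COMMUNICATION = re.compile(
    r"^(hi|hey|hello|yo|thanks?|thank you|thx|ok(ay)?|sure|got it|"
    r"understood|yes|no|yep|nope)[\s!.]*$",
    re.IGNORECASE,
)

def force_haiku(prompt):
    """True when the prompt matches an always-Haiku communication pattern."""
    return bool(COMMUNICATION.match(prompt.strip()))

print(force_haiku("thanks!"))  # → True
```

Anything that fails this gate falls through to the normal keyword-based tier routing.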

Integration pattern:

from model_router import route_task

user_prompt = "show me the config"
routing = route_task(user_prompt)

if routing["should_switch"]:
    # Switch to the cheaper model and record the projected savings
    model = routing["recommended_model"]
    savings_percent = routing["cost_savings_percent"]

Customization: Edit ROUTING_RULES or COMMUNICATION_PATTERNS in scripts/model_router.py to adjust patterns and keywords.

3. Heartbeat Optimization

Reduce API calls from heartbeat polling with smart interval tracking:

Setup:

# Copy template to workspace
cp assets/HEARTBEAT.template.md ~/.openclaw/workspace/HEARTBEAT.md

# Plan which checks should run
python3 scripts/heartbeat_optimizer.py plan

Commands:

# Check if specific type should run now
heartbeat_optimizer.py check email
heartbeat_optimizer.py check calendar

# Record that a check was performed
heartbeat_optimizer.py record email

# Update check interval (seconds)
heartbeat_optimizer.py interval email 7200  # 2 hours

# Reset state
heartbeat_optimizer.py reset

How it works:

  • Tracks last check time for each type (email, calendar, weather, etc.)
  • Enforces minimum intervals before re-checking
  • Respects quiet hours (23:00-08:00) — skips all checks
  • Returns HEARTBEAT_OK when nothing needs attention (saves tokens)

Default intervals:

  • Email: 60 minutes
  • Calendar: 2 hours
  • Weather: 4 hours
  • Social: 2 hours
  • Monitoring: 30 minutes
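
The gating logic described above reduces to a timestamp comparison plus a quiet-hours guard. A sketch with the state passed in explicitly (the real script persists last-check times to a JSON state file in the workspace):

```python
# Minimum seconds between checks, mirroring the defaults listed above.
INTERVALS = {"email": 3600, "calendar": 7200, "weather": 14400,
             "social": 7200, "monitoring": 1800}

def should_check(check_type, last_checked, now, hour):
    """Allow a check only outside quiet hours (23:00-08:00) and after the
    minimum interval for its type has elapsed."""
    if hour >= 23 or hour < 8:
        return False
    return now - last_checked >= INTERVALS.get(check_type, 3600)

print(should_check("email", last_checked=0, now=3600, hour=12))  # → True
```

When every type returns False, the heartbeat emits HEARTBEAT_OK and the session stays cheap.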

Integration in HEARTBEAT.md:

## Email Check
Run only if: `heartbeat_optimizer.py check email` → `should_check: true`
After checking: `heartbeat_optimizer.py record email`

Expected savings: 50% reduction in heartbeat API calls.

Model enforcement: Heartbeat should ALWAYS use Haiku — see updated HEARTBEAT.template.md for model override instructions.

4. Cronjob Optimization (NEW!)

Problem: Cronjobs often default to expensive models (Sonnet/Opus) even for routine tasks.

Solution: Always specify Haiku for 90% of scheduled tasks.

See: assets/cronjob-model-guide.md for comprehensive guide with examples.

Quick reference:

| Task Type | Model | Example |
|---|---|---|
| Monitoring/alerts | Haiku | Check server health, disk space |
| Data parsing | Haiku | Extract CSV/JSON/logs |
| Reminders | Haiku | Daily standup, backup reminders |
| Simple reports | Haiku | Status summaries |
| Content generation | Sonnet | Blog summaries (quality matters) |
| Deep analysis | Sonnet | Weekly insights |
| Complex reasoning | (none) | Never use Opus for cronjobs |

Example (good):

# Parse daily logs with Haiku
cron add --schedule "0 2 * * *" \
  --payload '{
    "kind":"agentTurn",
    "message":"Parse yesterday error logs and summarize",
    "model":"anthropic/claude-haiku-4"
  }' \
  --sessionTarget isolated

Example (bad):

# ❌ Using Opus for simple check (60x more expensive!)
cron add --schedule "*/15 * * * *" \
  --payload '{
    "kind":"agentTurn",
    "message":"Check email",
    "model":"anthropic/claude-opus-4"
  }' \
  --sessionTarget isolated

Savings: switching 10 daily cronjobs from Opus to Haiku saves roughly $17.70/month per agent.
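
The $17.70 figure can be reproduced with a worked calculation. The ~4,000 tokens per run is an assumption (not stated in this document); the per-MTok prices match the provider table elsewhere in this skill.

```python
# Assumed workload: 10 cronjobs/day, ~4K tokens each, 30 days/month.
OPUS_PER_MTOK = 15.00
HAIKU_PER_MTOK = 0.25
TOKENS_PER_RUN = 4_000
RUNS_PER_DAY = 10
DAYS_PER_MONTH = 30

delta_per_token = (OPUS_PER_MTOK - HAIKU_PER_MTOK) / 1_000_000
monthly_savings = delta_per_token * TOKENS_PER_RUN * RUNS_PER_DAY * DAYS_PER_MONTH
print(f"${monthly_savings:.2f}/month")  # → $17.70/month
```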

Integration with model_router:

# Test if your cronjob should use Haiku
model_router.py "parse daily error logs"
# → Output: Haiku (background task pattern detected)

5. Token Budget Tracking

Monitor usage and alert when approaching limits:

Setup:

# Check current daily usage
python3 scripts/token_tracker.py check

# Get model suggestions
python3 scripts/token_tracker.py suggest general

# Reset daily tracking
python3 scripts/token_tracker.py reset

Output format:

{
  "date": "2026-02-06",
  "cost": 2.50,
  "tokens": 50000,
  "limit": 5.00,
  "percent_used": 50,
  "status": "ok",
  "alert": null
}

Status levels:

  • ok: Below 80% of daily limit
  • warning: 80-99% of daily limit
  • exceeded: Over daily limit
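
The status bands can be expressed directly. The 80% warn threshold is assumed here to match the "warning" band above (warn_threshold is tunable, per the customization note below).

```python
# Map daily spend to a status level: ok / warning / exceeded.
def budget_status(cost, daily_limit, warn_threshold=0.8):
    if cost > daily_limit:
        return "exceeded"
    if cost >= daily_limit * warn_threshold:
        return "warning"
    return "ok"

print(budget_status(2.50, 5.00))  # → ok
```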

Integration pattern: Before starting expensive operations, check budget:

import json
import subprocess

result = subprocess.run(
    ["python3", "scripts/token_tracker.py", "check"],
    capture_output=True, text=True
)
budget = json.loads(result.stdout)

if budget["status"] == "exceeded":
    # Switch to cheaper model or defer non-urgent work
    use_model = "anthropic/claude-haiku-4"
elif budget["status"] == "warning":
    # Use balanced model
    use_model = "anthropic/claude-sonnet-4-5"
else:
    # Budget is healthy: no override needed
    use_model = None

Customization: Edit daily_limit_usd and warn_threshold parameters in function calls.

6. Multi-Provider Strategy

See references/PROVIDERS.md for comprehensive guide on:

  • Alternative providers (OpenRouter, Together.ai, Google AI Studio)
  • Cost comparison tables
  • Routing strategies by task complexity
  • Fallback chains for rate-limited scenarios
  • API key management

Quick reference:

| Provider | Model | Cost/MTok | Use Case |
|---|---|---|---|
| Anthropic | Haiku 4 | $0.25 | Simple tasks |
| Anthropic | Sonnet 4.5 | $3.00 | Balanced default |
| Anthropic | Opus 4 | $15.00 | Complex reasoning |
| OpenRouter | Gemini 2.5 Flash | $0.075 | Bulk operations |
| Google AI | Gemini 2.0 Flash Exp | FREE | Dev/testing |
| Together | Llama 3.3 70B | $0.18 | Open alternative |

Configuration Patches

See assets/config-patches.json for advanced optimizations:

Implemented by this skill:

  • ✅ Heartbeat optimization (fully functional)
  • ✅ Token budget tracking (fully functional)
  • ✅ Model routing logic (fully functional)

Native OpenClaw 2026.2.15 — apply directly:

  • ✅ Session pruning (contextPruning: cache-ttl) — auto-trims old tool results after Anthropic cache TTL expires
  • ✅ Bootstrap size limits (bootstrapMaxChars / bootstrapTotalMaxChars) — caps workspace file injection size
  • ✅ Cache retention long (cacheRetention: "long" for Opus) — amortizes cache write costs

Requires OpenClaw core support:

  • ⏳ Prompt caching (Anthropic API feature — verify current status)
  • ⏳ Lazy context loading (use context_optimizer.py script today)
  • ⏳ Multi-provider fallback (partially supported)

Apply config patches:

# Example: Enable multi-provider fallback
gateway config.patch --patch '{"providers": [...]}'

Native OpenClaw Diagnostics (2026.2.15+)

OpenClaw 2026.2.15 added built-in commands that complement this skill's Python scripts. Use these first for quick diagnostics before reaching for the scripts.

Context breakdown

/context list    → token count per injected file (shows exactly what's eating your prompt)
/context detail  → full breakdown including tools, skills, and system prompt sections

Use before applying bootstrap_size_limits — see which files are oversized, then set bootstrapMaxChars accordingly.

Per-response usage tracking

/usage tokens    → append token count to every reply
/usage full      → append tokens + cost estimate to every reply
/usage cost      → show cumulative cost summary from session logs
/usage off       → disable usage footer

Combine with token_tracker.py: /usage cost gives session totals, while token_tracker.py tracks the daily budget.

Session status

/status          → model, context %, last response tokens, estimated cost

Cache TTL Heartbeat Alignment (NEW in v1.4.0)

The problem: Anthropic charges ~3.75x more for cache writes than cache reads. If your agent goes idle and the 1h cache TTL expires, the next request re-writes the entire prompt cache — expensive.

The fix: Set heartbeat interval to 55min (just under the 1h TTL). The heartbeat keeps the cache warm, so every subsequent request pays cache-read rates instead.

# Get optimal interval for your cache TTL
python3 scripts/heartbeat_optimizer.py cache-ttl
# → recommended_interval: 55min (3300s)
# → explanation: keeps 1h Anthropic cache warm

# Custom TTL (e.g., if you've configured 2h cache)
python3 scripts/heartbeat_optimizer.py cache-ttl 7200
# → recommended_interval: 115min
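
The underlying math is just TTL minus a safety margin. A sketch assuming the 5-minute margin implied by the 55-minute recommendation for a 1h TTL:

```python
# Heartbeat interval = cache TTL minus a safety margin, in minutes.
# The 5-minute margin is an assumption inferred from the 55min/1h pairing.
def recommended_interval_min(cache_ttl_seconds, margin_min=5):
    return cache_ttl_seconds // 60 - margin_min

print(recommended_interval_min(3600))  # → 55  (1h TTL)
print(recommended_interval_min(7200))  # → 115 (2h TTL)
```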

Apply to your OpenClaw config:

{
  "agents": {
    "defaults": {
      "heartbeat": {
        "every": "55m"
      }
    }
  }
}

Who benefits: Anthropic API key users only. OAuth profiles already default to 1h heartbeat (OpenClaw smart default). API key profiles default to 30min — bumping to 55min is both cheaper (fewer calls) and cache-warm.


Deployment Patterns

For Personal Use

  1. Install optimized HEARTBEAT.md
  2. Run budget checks before expensive operations
  3. Manually route complex tasks to Opus only when needed

Expected savings: 20-30%

For Managed Hosting (xCloud, etc.)

  1. Default all agents to Haiku
  2. Route user interactions to Sonnet
  3. Reserve Opus for explicitly complex requests
  4. Use Gemini Flash for background operations
  5. Implement daily budget caps per customer

Expected savings: 40-60%

For High-Volume Deployments

  1. Use multi-provider fallback (OpenRouter + Together.ai)
  2. Implement aggressive routing (80% Gemini, 15% Haiku, 5% Sonnet)
  3. Deploy local Ollama for offline/cheap operations
  4. Batch heartbeat checks (every 2-4 hours, not 30 min)

Expected savings: 70-90%

Integration Examples

Workflow: Smart Task Handling

# 1. User sends message
user_msg="debug this error in the logs"

# 2. Route to appropriate model
routing=$(python3 scripts/model_router.py "$user_msg")
model=$(echo "$routing" | jq -r .recommended_model)

# 3. Check budget before proceeding
budget=$(python3 scripts/token_tracker.py check)
status=$(echo "$budget" | jq -r .status)

if [ "$status" = "exceeded" ]; then
    # Use cheapest model regardless of routing
    model="anthropic/claude-haiku-4"
fi

# 4. Process with selected model
# (OpenClaw handles this via config or override)

Workflow: Optimized Heartbeat

## HEARTBEAT.md

# Plan what to check
result=$(python3 scripts/heartbeat_optimizer.py plan)
should_run=$(echo "$result" | jq -r .should_run)

if [ "$should_run" = "false" ]; then
    echo "HEARTBEAT_OK"
    exit 0
fi

# Run only planned checks
planned=$(echo "$result" | jq -r '.planned[].type')

for check in $planned; do
    case $check in
        email) check_email ;;
        calendar) check_calendar ;;
    esac
    python3 scripts/heartbeat_optimizer.py record "$check"
done

Troubleshooting

Issue: Scripts fail with "module not found"

  • Fix: Ensure Python 3.7+ is installed. Scripts use only stdlib.

Issue: State files not persisting

  • Fix: Check that ~/.openclaw/workspace/memory/ directory exists and is writable.

Issue: Budget tracking shows $0.00

  • Fix: token_tracker.py needs integration with OpenClaw's session_status tool. Currently tracks manually recorded usage.

Issue: Routing suggests wrong model tier

  • Fix: Customize ROUTING_RULES in model_router.py for your specific patterns.

Maintenance

Daily:

  • Check budget status: token_tracker.py check

Weekly:

  • Review routing accuracy (are suggestions correct?)
  • Adjust heartbeat intervals based on activity

Monthly:

  • Compare costs before/after optimization
  • Review and update PROVIDERS.md with new options

Cost Estimation

Example: 100K tokens/day workload

Without skill:

  • 50K context tokens + 50K conversation tokens = 100K total
  • All Sonnet: 100K × $3/MTok = $0.30/day = $9/month
| Strategy | Context | Model | Daily Cost | Monthly | Savings |
|---|---|---|---|---|---|
| Baseline (no optimization) | 50K | Sonnet | $0.30 | $9.00 | 0% |
| Context opt only | 10K (-80%) | Sonnet | $0.18 | $5.40 | 40% |
| Model routing only | 50K | Mixed | $0.18 | $5.40 | 40% |
| Both (this skill) | 10K | Mixed | $0.09 | $2.70 | 70% |
| Aggressive + Gemini | 10K | Gemini | $0.03 | $0.90 | 90% |

Key insight: Context optimization (50K → 10K tokens) saves MORE than model routing!
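
The Sonnet-only rows of the table check out arithmetically (the Mixed rows depend on an unstated routing split, so only these two are reproduced):

```python
# Daily cost = total tokens x price per MTok. Price matches the provider table.
SONNET_PER_MTOK = 3.00

def daily_cost(context_tokens, conversation_tokens, price_per_mtok):
    return (context_tokens + conversation_tokens) * price_per_mtok / 1_000_000

baseline = daily_cost(50_000, 50_000, SONNET_PER_MTOK)      # full context
context_opt = daily_cost(10_000, 50_000, SONNET_PER_MTOK)   # optimized context
print(f"baseline ${baseline:.2f}/day, context-opt ${context_opt:.2f}/day")
# → baseline $0.30/day, context-opt $0.18/day
```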

xCloud hosting scenario (100 customers, 50K tokens/customer/day):

  • Baseline (all Sonnet, full context): $450/month
  • With token-optimizer: $135/month
  • Savings: $315/month per 100 customers (70%)

Resources

Scripts (4 total)

  • context_optimizer.py — Context loading optimization and lazy loading (NEW!)
  • model_router.py — Task classification, model suggestions, and communication enforcement (ENHANCED!)
  • heartbeat_optimizer.py — Interval management and check scheduling
  • token_tracker.py — Budget monitoring and alerts

References

  • PROVIDERS.md — Alternative AI providers, pricing, and routing strategies

Assets (3 total)

  • HEARTBEAT.template.md — Drop-in optimized heartbeat template with Haiku enforcement (ENHANCED!)
  • cronjob-model-guide.md — Complete guide for choosing models in cronjobs (NEW!)
  • config-patches.json — Advanced configuration examples

Future Enhancements

Ideas for extending this skill:

  1. Auto-routing integration — Hook into OpenClaw message pipeline
  2. Real-time usage tracking — Parse session_status automatically
  3. Cost forecasting — Predict monthly spend based on recent usage
  4. Provider health monitoring — Track API latency and failures
  5. A/B testing — Compare quality across different routing strategies

Files

10 total
