## Install

```
openclaw skills install tsaver
```

Five-phase token audit framework for OpenClaw: Discover → Prioritize (3D matrix) → Optimize (8 technique categories) → Validate → Monitor. Universal; adapt via the appendix. Based on real-world practice (2026-05-04).

Triggers: "省点 token" (save some tokens), "token 优化" (token optimization), "token saver", "token audit", "检查 token 消耗" (check token consumption).
After each full execution, write a report (token-audit-report-YYYY-MM-DD.md)
containing: before/after comparison table, estimated weekly savings per change,
items deferred and why, recommended next step.
Read your cron/scheduled task configuration (e.g. ~/.openclaw/cron/jobs.json).
For each task record:
- name
- model (or "default" if unset)
- message / prompt length in chars
- schedule frequency (daily / weekly / other)
- delivery.mode (announce / none)
- sessionTarget (isolated / main)

Inspect your gateway config (e.g. openclaw.json):
- agents.defaults.heartbeat.* — interval, active hours, isolated session, light context flag
- agents.defaults.compaction.mode — message retention aggressiveness
- agents.list[].tools.profile — full, coding, or custom
- agents.list[].model — per-agent model override

List every file that is injected at session start (typically files in the workspace root directory). Measure each in chars and estimate token cost (~3 chars per token for CJK-heavy text, ~4 for English-heavy).
If LCM (Lossless Context Management) is active, note the number and average size of compacted summary blocks injected per turn.
If tool schemas are accessible, estimate total schema chars: (count of registered tools × average schema size in chars).
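The measurements above can be scripted. A minimal sketch (the workspace path and the "CJK-heavy" cutoff are assumptions; adjust both to your deployment):

```python
import glob
import os

def estimate_tokens(text: str) -> int:
    """Apply the rough heuristic: ~3 chars/token for CJK-heavy text, ~4 otherwise."""
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    chars_per_token = 3 if cjk > len(text) * 0.3 else 4  # 30% CJK = "CJK-heavy" (assumed cutoff)
    return len(text) // chars_per_token

# Assumed workspace location; change to wherever your bootstrap files live.
workspace = os.path.expanduser("~/.openclaw/workspace")
for path in sorted(glob.glob(os.path.join(workspace, "*.md"))):
    with open(path, encoding="utf-8") as f:
        text = f.read()
    print(f"{os.path.basename(path)}: {len(text)} chars ≈ {estimate_tokens(text)} tokens")
```

The per-file char counts feed directly into the Phase 1 inventory table.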
Categorize all available models into three tiers based on capability and cost:

- Economy: cheapest model; routine, low-stakes tasks
- Standard: mid-tier; tasks needing search or moderate reasoning
- Premium: strongest model; complex or high-stakes work
Map each task from 1A to its current model tier.
⚠️ Checkpoint: Before moving to Phase 2, present your Phase 1 findings (task inventory, file sizes, model tier map) to the user. Confirm that the inventory is complete and the measurements are correct. This prevents optimizing the wrong things.
Score each finding from Phase 1 along three independent dimensions:
| Dimension | Scale | Assessment |
|---|---|---|
| Token Impact 🎯 | High / Med / Low | Tokens per occurrence × occurrences per period |
| Risk ⚠️ | Safe / Moderate / High | Can you undo it? Does it affect core function? |
| Effort 🔧 | Easy / Med / Hard | Single config change? Multi-file edit? Needs research? |
Compute a relative priority for each finding by inverting Risk and Effort:
Priority = ImpactWeight × (1 / RiskWeight) × (1 / EffortWeight)
Where each dimension maps to a simple numeric weight (1 = Low/Safe/Easy, 2 = Med/Moderate, 3 = High/Hard):
Focus on items scoring ≥ 1.5 first. Skip items < 1.0 unless they are trivially easy (effort=1) and safe (risk=1).
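In code, the scoring rule reads as follows (the 1-3 weight mapping is an assumption consistent with the "effort=1" / "risk=1" shorthand above):

```python
IMPACT = {"High": 3, "Med": 2, "Low": 1}
RISK = {"Safe": 1, "Moderate": 2, "High": 3}
EFFORT = {"Easy": 1, "Med": 2, "Hard": 3}

def priority(impact: str, risk: str, effort: str) -> float:
    """Priority = ImpactWeight × (1 / RiskWeight) × (1 / EffortWeight)."""
    return IMPACT[impact] * (1 / RISK[risk]) * (1 / EFFORT[effort])

# A high-impact, safe, easy finding clears the 1.5 bar comfortably:
assert priority("High", "Safe", "Easy") == 3.0
# A medium-impact, moderate-risk, easy finding lands at exactly 1.0 — borderline:
assert priority("Med", "Moderate", "Easy") == 1.0
```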
These patterns tend to score high across most deployments:
| Pattern | Typical Impact | Typical Risk | Typical Effort |
|---|---|---|---|
| Overly verbose task prompts | High | Safe | Easy |
| Heavy models on simple tasks | High | Safe | Easy |
| No active hours on heartbeat | Med-High | Safe | Easy |
| Duplicated content across bootstrap files | Med-High | Safe | Easy-Med |
| Full tool profile on task-specific agents | High | Moderate | Easy |
| Idle-time session not configured | Med | Safe | Easy |
| Outdated tool/plugin configs still loaded | Low-Med | Safe | Easy |
⚠️ Checkpoint: Show your top-3 priority items to the user. Confirm direction before starting optimization. If the highest-score items seem wrong, revisit Phase 1 measurements.
⚠️ User confirmation gate: Techniques marked Moderate or High risk involve config changes, profile switches, or task merging. Before applying them, present the proposed change using this template and get explicit approval:
```
## Proposed Change
**Technique**: [category/technique name]
**Target**: [file/config path]
**Before**: [current state, chars/tokens if measurable]
**After**: [proposed state, estimated savings]
**Risk**: [Moderate/High]
**Rollback**: [how to undo]
```

Techniques marked Safe can be applied directly.
Each category below contains a set of techniques. Apply them in priority order from Phase 2 — start with the highest-score items first, regardless of which category they fall into.
If a technique causes a problem, roll the change back (keep pre-change copies in a prompts/backup/ directory).

Match your Phase 2 findings to the best starting category:
| Finding | Start With |
|---|---|
| Verbose task prompts (background context, philosophy) | A Prompt Simplicity |
| Heavy models on simple automation tasks | B Model Tiering |
| Bootstrap files >2K chars each, duplicated content | C Context Slimming |
| Full tool profile, rarely-used tools registered | D Tool Profile Optimization |
| Verbose agent output, too many turns per task | E Output Discipline |
| No active hours, co-located tasks running separately | F Session Lifecycle |
| Repeated system prompts without caching structure | G Provider-Side Caching |
| Agent retries failed approaches instead of switching | H Behavioral Discipline |
| Technique | Description | Risk |
|---|---|---|
| A1 Strip preamble | Remove background/rationale paragraphs from task prompts. Keep only: trigger, action, output format. | Safe |
| | Before: "你是系统监控助手。每天检查服务器状态:CPU使用率>80%告警、内存>90%告警、磁盘>85%告警、SSL证书<30天告警。每个告警按严重程度分别处理:严重→立即通知值班、一般→发运维邮件、提示→记录日志。" (You are a system monitoring assistant. Check server status daily: alert on CPU >80%, memory >90%, disk >85%, SSL cert <30 days. Handle each alert by severity: critical → page on-call, normal → ops email, info → log.) | |
| | After: "系统监控。检查:CPU(>80%) Mem(>90%) Disk(>85%) SSL(<30d)。告警:严重→立即、一般→邮件、提示→日志。" (System monitoring. Check: CPU(>80%) Mem(>90%) Disk(>85%) SSL(<30d). Alerts: critical→immediate, normal→email, info→log.) (360→110 chars, -69%) | |
| A2 Bullet points > prose | Replace multi-sentence descriptions with keyword checklists. | Safe |
| A3 Constrain output | Add "Answer concisely in ≤3 lines" or equivalent to reduce generated tokens. | Safe |
| A4 Remove redundancy | Delete "What NOT to do" sections — proper instructions make negatives implicit. | Safe |
| A5 Reference > inline | Replace full instructions for sub-tasks with file references ("See X.md") when the referenced file is always loaded. | Safe |
| Technique | Description | Risk |
|---|---|---|
| B1 Right-size each task | Map every automated task to the cheapest model that can do it adequately. Test borderline cases. | Safe |
| B2 Define tier boundaries | Document which model(s) belong to each tier so new tasks are assigned correctly. | Safe |
| B3 Batch same-tier runs | Schedule same-tier tasks back-to-back to reuse the same session (single context load). | Moderate |
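In jobs.json terms, B1 is usually a single model field per job. A sketch using this deployment's example model names (the field layout mirrors this document's monitoring-job example; verify against your actual schema):

```json
{
  "name": "version-check",
  "schedule": { "kind": "cron", "expr": "0 9 * * *", "tz": "Asia/Shanghai" },
  "payload": {
    "kind": "agentTurn",
    "model": "minimax-m2.7",
    "message": "Check for new releases. Answer concisely in ≤3 lines.",
    "timeoutSeconds": 120
  },
  "sessionTarget": "isolated",
  "delivery": { "mode": "none" }
}
```

Here the Economy-tier model handles a routine check; only the model value changes when re-tiering a task.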
| Technique | Description | Risk |
|---|---|---|
| C1 Measure every boot file | List all files loaded at session start and identify those > 2K chars for potential trimming. | Safe |
| C2 Cross-reference dedup | When the same content appears in 2+ files (e.g. "Core Principles" in SOUL.md and IDENTITY.md), keep it in one authoritative file and replace the others with a "详见 <file>" ("see <file>") reference. | Safe |
| C3 Archive aged-out content | Move old diary entries, superseded milestones, and historical promoted entries to a dedicated archive directory. | Safe |
| C4 Trim to one-liner | Convert verbose descriptions to single-line summaries. Actionable instructions stay; background context goes. | Safe |
| | Before: "This project's coding conventions were established after three code reviews revealed inconsistent patterns: use 2-space indent for HTML/CSS, 4-space for Python, tabs for Go. Prefix private methods with underscore. No Hungarian notation. Import order: stdlib, third-party, local." | |
| | After: "Coding conventions (see CONTRIBUTING.md) — 6 rules, numbered." | |
| Technique | Description | Risk |
|---|---|---|
| D1 Size your tool schema | Count all registered tools and estimate total schema chars. This is typically the single largest per-turn overhead. | Safe (measure only) |
| D2 Switch profile per agent | Use "coding" profile for sub-agents/cron jobs (excludes browser, canvas, media generation, feishu tools). Use "full" only where those tools are actually needed. | Moderate (test on sub-agents first) |
| D3 Disable unused tools | If you have disabled skills or orphaned plugin tools still registering schemas, disable or remove them from the registry. Check skills.entries and plugins.load.paths. | Safe |
| D4 Create custom profile | If neither "full" nor "coding" fits, define a custom profile with exactly the 15-25 tools your use-case needs. Requires config reload. | High |
| Technique | Description | Risk |
|---|---|---|
| E1 No operation narration | Remove "I'll...", "Let me check..." patterns. Do the action directly. | Safe (behavioral) |
| E2 Lead with conclusion | Put the answer first. Add explanation only when needed. | Safe (behavioral) |
| E3 Batch turns | Read → plan → apply all changes in as few turns as possible, instead of read→think→edit→think→verify per-item. Each extra turn adds LCM context overhead. | Safe (behavioral) |
| E4 Sub-agent conciseness | When spawning sub-agents, specify a concise return format. Their full output is injected into context if returned. | Safe |
| Technique | Description | Risk |
|---|---|---|
| F1 Set active hours | Configure heartbeat.activeHours so no work runs during idle time (overnight, weekends). | Safe |
| F2 Isolated sessions | Set heartbeat.isolatedSession: true so periodic checks don't accumulate in the main session. | Safe |
| F3 Light context | Set heartbeat.lightContext: true to skip loading all bootstrap files — only HEARTBEAT.md is injected. | Safe |
| F4 Merge co-located tasks | If two cron jobs run within minutes of each other (e.g. both at 23:xx), merge them into one session with a combined prompt. Copy both prompts into one job's message field separated by a blank line, then remove the later job. Saves one full startup context per day. | Moderate |
| F5 Merge example | Before: Job A at 23:00 (System health check), Job B at 23:10 (Log cleanup). After: single job at 23:00 with message "Do A then B.\nA: ...\nB: ..." | Moderate |
| F6 Configure queue | If the platform supports message queue settings (debounce, collect), tune them to prevent rapid-turn accumulation during tool execution. | Safe |
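F1-F3 together might look like this in openclaw.json (a sketch; the key names mirror the fields referenced above, but the activeHours shape is an assumption — verify against your gateway's actual schema):

```json
{
  "agents": {
    "defaults": {
      "heartbeat": {
        "activeHours": { "start": "08:00", "end": "23:00" },
        "isolatedSession": true,
        "lightContext": true
      }
    }
  }
}
```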
Impact is 10× any other category. DeepSeek V4 Pro cached price is 0.83% of uncached. Cache hit rates of 91-96% are achievable with proper prompt structure.
| Technique | Description | Risk |
|---|---|---|
| G1 Fixed prefix first | Design all prompts as [static prefix] + [dynamic suffix]. The static prefix holds system instructions, bootstrap summary, and tool schemas; the dynamic suffix holds the runtime instruction. This maximizes KV cache hits on the provider side. | Safe |
| | Wrong: "Analyze this code for memory leaks...你是代码审查助手,审查规则如下:..." (the dynamic request precedes the static "You are a code review assistant; review rules:" block, breaking the cacheable prefix) | |
| | Right: "你是代码审查助手,审查规则如下:...现在分析这段代码的内存泄漏:..." (static assistant/rules block first, then "Now analyze this code for memory leaks:") | |
| G2 Session contiguity | Don't insert unrelated messages between consecutive calls to the same model — this breaks the KV cache prefix. Batch related calls into a single turn instead. | Safe |
| G3 Monitor cache rate | Check provider dashboards for cache hit rate. If <80%, your prefix structure likely has variability. Fix it. | Safe |
| G4 Route to best caching provider | Different providers have wildly different cached prices. DeepSeek V4 Pro: 0.83% of uncached. MiniMax: ~20%. Route routine tasks to the provider with the best cache economics. | Moderate |
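G1 and G2 reduce to one habit: build every prompt from an unchanging prefix plus a trailing task. A sketch (the prefix text is a placeholder, not a real system prompt):

```python
# Static part: system instructions, bootstrap summary, tool schemas — byte-identical every call.
STATIC_PREFIX = (
    "You are a code review assistant. Review rules:\n"
    "1. Flag memory leaks. 2. Flag race conditions.\n"
)

def build_prompt(task: str) -> str:
    # Dynamic, per-call text goes last, so the provider's KV cache covers the prefix.
    return STATIC_PREFIX + "Now: " + task

a = build_prompt("analyze module_a.c for memory leaks")
b = build_prompt("analyze module_b.c for race conditions")
# Both calls share an identical leading byte range — the cacheable portion.
assert a.startswith(STATIC_PREFIX) and b.startswith(STATIC_PREFIX)
```

Any per-call variation inserted before or inside the static block invalidates the cache from that point onward.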
These are zero-config, zero-cost techniques. The savings come from how you use the system, not how it's configured.
| Technique | Description | Risk |
|---|---|---|
| H1 Default to working path | Use known-working tools before alternatives. Don't retry tools known to be broken in the current deployment — each retry is a wasted tool call plus an error response. | Safe |
| | Bad: web_search (broken) → error → web_search again → error → baidu-search → works | |
| | Good: baidu-search → works (first attempt) | |
| H2 Fail once, switch | If a method fails, switch immediately to a known alternative. Don't retry the same approach with slightly different parameters. Each retry costs full tool-call tokens. | Safe |
| H3 Batch > Poll | Gather all data before acting instead of incrementally. One exec or read call that returns 10 results costs less than 5 separate calls returning 2 each. | Safe |
| H4 Fix root cause | If a tool works inconsistently due to a known config issue (API key expired, wrong provider), fix the config. Working around it each time costs more in accumulated failed calls. | Safe |
Before/after comparison of all modified prompts and files. Include total chars and estimated tokens saved.
After editing JSON configuration files, validate:
```shell
python3 -c "import json; json.load(open('<config-path>')); print('OK')"
```
Confirm each modified cron job still fires (check cron action=runs, or wait for the next scheduled trigger).

Write token-audit-report-YYYY-MM-DD.md summarizing:
Log each optimization cycle in results.tsv (see skill directory for
format reference). This creates an audit trail for the quarterly deep audit (5B).
Optionally create a weekly cron (cheapest available model) that checks prompt lengths haven't crept back:
```json
{
  "name": "token-watch-weekly",
  "schedule": { "kind": "cron", "expr": "0 10 * * 1", "tz": "Asia/Shanghai" },
  "payload": {
    "kind": "agentTurn",
    "model": "<cheapest-model>",
    "message": "Check all cron prompt lengths. Flag any that grew >20% since last baseline.",
    "timeoutSeconds": 120
  },
  "sessionTarget": "isolated",
  "delivery": { "mode": "none" }
}
```
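The watcher needs a baseline to compare against. A sketch of the check it could run (the baseline filename is an assumption; a real job would load both files and persist a fresh baseline after each audit):

```python
import json
import os

JOBS = os.path.expanduser("~/.openclaw/cron/jobs.json")            # path from Phase 1
BASELINE = os.path.expanduser("~/.openclaw/prompt-baseline.json")  # assumed filename

def check_growth(jobs: list, baseline: dict, threshold: float = 0.20):
    """Return (name, old_len, new_len) for jobs whose prompt grew more than `threshold`."""
    flagged = []
    for job in jobs:
        name = job["name"]
        length = len(job.get("payload", {}).get("message", ""))
        old = baseline.get(name)
        if old and length > old * (1 + threshold):
            flagged.append((name, old, length))
    return flagged
```

Anything flagged here goes straight into the next audit cycle's Phase 1 inventory.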
Run the full Phase 1-4 cycle every quarter using the cheapest available model. Compare results against previous reports to spot regrowth trends.
Some configuration paths require a gateway restart to take effect:
- agents.defaults.heartbeat.* (edit config file + restart)
- agents.list[].tools.profile
- gateway.*, auth.*
- plugins.* — certain sub-fields

These core mechanisms must be preserved even under an aggressive token budget:
- openclaw gateway config CLI
- agents.list[].tools.profile in openclaw.json
- test-prompts.json in skill directory
- results.tsv in skill directory

This section is populated by the first execution of the Token Saver in a specific deployment. Replace the example values below with real ones.
| Item | Example Path |
|---|---|
| Cron jobs | ~/.openclaw/cron/jobs.json |
| Gateway config | ~/.openclaw/openclaw.json |
| Workspace root | ~/.openclaw/workspace/ |
| Bootstrap files | AGENTS.md, SOUL.md, USER.md, MEMORY.md, HEARTBEAT.md, IDENTITY.md, TOOLS.md, STANDING-ORDERS.md |
| File | Initial Size | After First Pass | Reduction | Techniques Used |
|---|---|---|---|---|
| SOUL.md | 7,034 | 3,521 | -50% | C2 (cross-ref), C4 (one-liner), A2 |
| STANDING-ORDERS.md | 10,960 | 3,816 | -65% | C2 (cross-ref), A4 (remove redundancy) |
| IDENTITY.md | 6,228 | 4,313 | -31% | C2 (dedup with SOUL.md), C4 |
| AGENTS.md | 5,072 | 2,691 | -47% | C2 (ref to STANDING-ORDERS), C4 |
| TOOLS.md | 8,893 | 7,488 | -16% | C4 (remove stale entries) |
| MEMORY.md | 30,224 | 26,420 | -13% | C3 (archive promoted entries) |
| Total | 68,411 | 48,249 | -29% | — |
Per-session token savings from bootstrap compression: ~6,720 tokens.
| File Type | Typical Savings | Best Technique |
|---|---|---|
| Program/Protocol (STANDING-ORDERS.md) | 55-65% | A4 (remove boilerplate sections) |
| Guide/Identity (SOUL.md, IDENTITY.md) | 30-50% | C2 (cross-reference dedup) |
| Instructions (AGENTS.md) | 40-50% | C2 (replace lists with file refs) |
| Knowledge base (MEMORY.md) | 10-20% | C3 (archive old entries only) |
| Config/state table (TOOLS.md) | 10-20% | C4 (remove stale entries only) |
| Task | Model Tier | Model |
|---|---|---|
| Version check | Economy | minimax-m2.7 |
| Demand scanning | Standard | deepseek-v4-pro (needs search) |
| Domain probe | Economy | minimax-m2.7 |
| Dreaming (memory integration) | Economy | minimax-m2.7 |
| Doc maintenance | Economy | minimax-m2.7 |
| WaveCap daily expansion | Standard | deepseek-v4-pro (needs reasoning) |
| Weekly review | Premium | deepseek-v4-pro |
| Friday topic selection | Premium | deepseek-v4-pro |
| Main session | Standard | deepseek-v4-flash |
| Item | Reason | Condition to Revisit |
|---|---|---|
| Tool profile for main agent | High risk (may break unexpected features) | After sub-agent coding profile proven in production for 1 week |
| Cron task merging | Needs user confirmation; may affect reliability | Next token audit cycle |
| Compaction mode change (safeguard→normal) | Needs config reload | When gateway restarted for other reasons |
All example paths above are under ~/.openclaw/.