{"skill":{"slug":"tsaver","displayName":"Token Saver","summary":"Five-phase token audit & optimization framework for OpenClaw: Discover → Prioritize (3D matrix) → Optimize (10 category techniques) → Validate → Monitor. Univ...","description":"---\nname: token-saver\ndescription: >\n  Five-phase token audit & optimization framework for OpenClaw:\n  Discover → Prioritize (3D matrix) → Optimize (10 category techniques)\n  → Validate → Monitor. Universal; adapt via appendix.\n  Trigger: \"省点 token\", \"token 优化\", \"token saver\",\n  \"token audit\", \"检查 token 消耗\"\n  Version history:\n  v1.0 (2026-05-03) — Initial framework, 6 categories\n  v1.5 (2026-05-04) — +G Provider Caching, +H Behavioral Discipline\n  v2.0 (2026-05-12) — +I Context Engineering\n  v2.1 (2026-05-14) — +J Intelligent Model Routing (OpenSquilla)\n                        + Quick Start guide, +Category Decision Tree\n                        + Monitor phase checkpoints\n---\n\n# Token Saver\n\n> Universal token audit & optimization framework for OpenClaw agents.\n> Based on real-world practice (2026-05-04).\n\n## Core Principles\n\n1. **Tier your model usage** — Simple tasks use cheap models; complex reasoning\n   uses expensive ones. Don't mix the two.\n2. **Prompts say *what*, not *why*** — Background rationale and philosophy are\n   noise to an agent. Strip them.\n3. **Batch > Serial** — One call for 10 results costs less than\n   three calls for 3+3+4 results. Combine.\n4. **Context = Cost** — Every file loaded at session start, every tool schema\n   registered, every past message injected — all have a token price.\n5. **Idle = Zero burn** — Nighttime, weekends, and idle periods should run\n   nothing. 
Configure active hours.\n\n## Output\n\nAfter each full execution, write a report (`token-audit-report-YYYY-MM-DD.md`)\ncontaining: before/after comparison table, estimated weekly savings per change,\nitems deferred and why, recommended next step.\n\n---\n\n## Quick Start\n\nNot every audit needs the full Phase 1-5 treatment. Use these shortcuts\nbased on your goal:\n\n### 🚀 Express Audit (15 min)\nTrigger: \"快速 token 审计\"\n1. Run **Phase 1A** (enumerate cron tasks) + **Phase 1D** (model tier map)\n2. Skip directly to **Phase 3** category table and pick the lowest-hanging fruit\n3. Apply **Safe** techniques only, no user confirmation needed\n4. Skip Phase 4 and Phase 5 — just log the changes\n\n### 🎯 Quick Wins (<5 min)\nTrigger: \"快速省 token\"\nGo straight to these high-impact, zero-risk techniques:\n1. **A3** (constrain output — add conciseness instruction)\n2. **B1** (right-size each task — cheapest viable model)\n3. **G1** (fixed prefix first — static prefix + dynamic suffix)\n4. **H1/H2** (default to working path, fail once and switch)\n\nApply directly without audit preamble.\n\n### 🔄 When to run full audit\n| Indicator | Action |\n|-----------|--------|\n| First time using this skill | **Full Phase 1-5** — establish baseline |\n| Cron tasks changed significantly | **Phase 1-3** — re-discover + re-optimize |\n| New provider/API added | **Phase 1B + 3** — check config + optimize |\n| >30 days since last audit | **Phase 1-2** — measure drift, then re-optimize |\n| Just want a quick check | **Quick Wins** or **Express Audit** |\n\n---\n\n## Phase 1: DISCOVER — Map the Full Token Landscape\n\n### 1A Enumerate All Automated Tasks\n\nRead your cron/scheduled task configuration (e.g. 
`~/.openclaw/cron/jobs.json`).\n\nFor each task record:\n- `name`\n- `model` (or \"default\" if unset)\n- `message` / `prompt` length in chars\n- `schedule` frequency (daily / weekly / other)\n- `delivery.mode` (announce / none)\n- `sessionTarget` (isolated / main)\n\n### 1B Analyze Agent Configuration\n\nInspect your gateway config (e.g. `openclaw.json`):\n\n- `agents.defaults.heartbeat.*` — interval, active hours, isolated session,\n  light context flag\n- `agents.defaults.compaction.mode` — message retention aggressiveness\n- `agents.list[].tools.profile` — full, coding, or custom\n- `agents.list[].model` — per-agent model override\n\n### 1C Measure Context Load\n\nList every file that is injected at session start (typically files in the\nworkspace root directory). Measure each in chars and estimate token cost\n(~3 chars per token for CJK-heavy text, ~4 for English-heavy).\n\nIf LCM (Lossless Context Management) is active, note the number and average\nsize of compacted summary blocks injected per turn.\n\nIf tool schemas are accessible, estimate total schema chars:\n(count of registered tools × average schema size in chars).\n\n### 1D Map Models to Tiers\n\nCategorize all available models into three tiers based on capability and cost:\n\n- **🏆 Premium** (strong reasoning, high cost): e.g. deepseek-v4-pro, gpt-5.x\n- **🟡 Standard** (balanced): e.g. deepseek-v4-flash, minimax-m2.7\n- **🟢 Economy** (lightweight): e.g. 
minimax-m2.7-highspeed, ollama local\n\nMap each task from 1A to its current model tier.\n\n> ⚠️ **Checkpoint**: Before moving to Phase 2, present your Phase 1 findings\n> (task inventory, file sizes, model tier map) to the user.\n> Confirm that the inventory is complete and the measurements are correct.\n> This prevents optimizing the wrong things.\n\n---\n\n## Phase 2: PRIORITIZE — Build Your Decision Matrix\n\nScore each finding from Phase 1 along three independent dimensions:\n\n| Dimension | Scale | Assessment |\n|-----------|-------|------------|\n| **Token Impact** 🎯 | High / Med / Low | Tokens per occurrence × occurrences per period |\n| **Risk** ⚠️ | Safe / Moderate / High | Can you undo it? Does it affect core function? |\n| **Effort** 🔧 | Easy / Med / Hard | Single config change? Multi-file edit? Needs research? |\n\n### How to Score\n\nCompute a relative priority for each finding by inverting Risk and Effort:\n\n```\nPriority = ImpactWeight × (1 / RiskWeight) × (1 / EffortWeight)\n```\n\nWhere each dimension maps to a simple numeric weight:\n- Impact: High=3, Med=2, Low=1\n- Risk: Safe=1, Moderate=2, High=3\n- Effort: Easy=1, Med=2, Hard=3\n\nFocus on items scoring ≥ 1.5 first. 
Skip items < 1.0 unless they are\ntrivially easy (effort=1) and safe (risk=1).\n\n### Common High-Impact Patterns\n\nThese patterns tend to score high across most deployments:\n\n| Pattern | Typical Impact | Typical Risk | Typical Effort |\n|---------|---------------|-------------|----------------|\n| Overly verbose task prompts | High | Safe | Easy |\n| Heavy models on simple tasks | High | Safe | Easy |\n| No active hours on heartbeat | Med-High | Safe | Easy |\n| Duplicated content across bootstrap files | Med-High | Safe | Easy-Med |\n| Full tool profile on task-specific agents | High | Moderate | Easy |\n| Idle-time session not configured | Med | Safe | Easy |\n| Outdated tool/plugin configs still loaded | Low-Med | Safe | Easy |\n\n> ⚠️ **Checkpoint**: Show your top-3 priority items to the user.\n> Confirm direction before starting optimization.\n> If the highest-score items seem wrong, revisit Phase 1 measurements.\n\n---\n\n## Phase 3: OPTIMIZE — Apply Categorical Techniques\n\n> ⚠️ **User confirmation gate**: Techniques marked **Moderate** or **High** risk\n> involve config changes, profile switches, or task merging. Before applying them,\n> present the proposed change using this template and get explicit approval:\n>\n> ```\n> ## Proposed Change\n> **Technique**: [category/technique name]\n> **Target**: [file/config path]\n> **Before**: [current state, chars/tokens if measurable]\n> **After**: [proposed state, estimated savings]\n> **Risk**: [Moderate/High]\n> **Rollback**: [how to undo]\n> ```\n>\n> Techniques marked **Safe** can be applied directly.\n\nEach category below contains a set of techniques. 
Apply them in priority\norder from Phase 2 — start with the highest-score items first, regardless\nof which category they fall into.\n\n### Failure Recovery\n\nIf a technique causes a problem:\n- **Config change**: Restore the backed-up config file and reload.\n- **Cron merge broken**: Restore the old separate cron job from version control\n  or re-create it from the original prompt.\n- **Profile switch issue**: Revert to \"full\" profile, report the missing tool.\n- **Prompt compression over-aggressive**: Restore from the diff backup (keep\n  pre-optimization prompt versions in a `prompts/backup/` directory).\n\n### Category Selection Guide\n\nMatch your Phase 2 findings to the best starting category:\n\n| Finding | Start With |\n|---------|-----------|\n| Verbose task prompts (background context, philosophy) | **A** Prompt Simplicity |\n| Heavy models on simple automation tasks | **B** Model Tiering |\n| Bootstrap files >2K chars each, duplicated content | **C** Context Slimming |\n| Full tool profile, rarely-used tools registered | **D** Tool Profile Optimization |\n| Verbose agent output, too many turns per task | **E** Output Discipline |\n| No active hours, co-located tasks running separately | **F** Session Lifecycle |\n| Repeated system prompts without caching structure | **G** Provider-Side Caching |\n| Agent retries failed approaches instead of switching | **H** Behavioral Discipline |\n| Simple/complex tasks both use premium model | **J** Intelligent Model Routing |\n\n### Category Decision Tree\n\nIf you're not sure which category to start with, follow this tree from top\nto bottom — the first match tells you your likely best starting category:\n\n```\n1. Is the main session slow or expensive?\n   → Check B (tiering) and J (routing)\n   → Also check D (too many tools loaded?)\n\n2. Are cron jobs consuming more than expected?\n   → Check A (prompts too wordy?), then B (wrong model?)\n   → Then F (same-tier jobs not batched?)\n\n3. 
Is context getting cut off mid-task?\n   → Check C (bootstrap too large?) → I (progressive disclosure?)\n   → Then J3 (incremental delivery?)\n\n4. Are agent outputs too verbose?\n   → Check E (output discipline) → H (behavioral discipline)\n\n5. Is the same heavy prompt repeated across tasks?\n   → Check G (provider-side caching: fixed prefix first?)\n\n6. Are you seeing the same errors repeatedly?\n   → Check H2 (fail once, switch) → H4 (fix root cause)\n\n7. Default (no obvious symptom):\n   Run Phase 1 from scratch → Phase 2 will tell you where to go\n```\n\n> **Pro tip**: Start with G (Provider-Side Caching) if you use DeepSeek.\n> Cache pricing is 0.83% of uncached — fixing prefix structure alone\n> can cut token costs by 90%+.\n\n### A. Prompt Simplicity\n\n| Technique | Description | Risk |\n|-----------|-------------|------|\n| **A1** Strip preamble | Remove background/rationale paragraphs from task prompts. Keep only: trigger, action, output format.\n  *Before:* \"你是系统监控助手。每天检查服务器状态：CPU使用率>80%告警、内存>90%告警、磁盘>85%告警、SSL证书<30天告警。每个告警按严重程度分别处理：严重→立即通知值班、一般→发运维邮件、提示→记录日志。\"\n  *After:* \"系统监控。检查：CPU(>80%) Mem(>90%) Disk(>85%) SSL(<30d)。告警：严重→立即、一般→邮件、提示→日志。\" (360→110 chars, -69%) | Safe |\n| **A2** Bullet points > prose | Replace multi-sentence descriptions with keyword checklists. | Safe |\n| **A3** Constrain output | Add \"Answer concisely in ≤3 lines\" or equivalent to reduce generated tokens. | Safe |\n| **A4** Remove redundancy | Delete \"What NOT to do\" sections — proper instructions make negatives implicit. | Safe |\n| **A5** Reference > inline | Replace full instructions for sub-tasks with file references (\"See X.md\") when the referenced file is always loaded. | Safe |\n\n### B. Model Tiering\n\n| Technique | Description | Risk |\n|-----------|-------------|------|\n| **B1** Right-size each task | Map every automated task to the cheapest model that can do it adequately. Test borderline cases. 
| Safe |\n| **B2** Define tier boundaries | Document which model(s) belong to each tier so new tasks are assigned correctly. | Safe |\n| **B3** Batch same-tier runs | Schedule same-tier tasks back-to-back to reuse the same session (single context load). | Moderate |\n\n### C. Context Slimming\n\n| Technique | Description | Risk |\n|-----------|-------------|------|\n| **C1** Measure every boot file | List all files loaded at session start and identify those > 2K chars for potential trimming. | Safe |\n| **C2** Cross-reference dedup | When the same content appears in 2+ files (e.g. \"Core Principles\" in SOUL.md and IDENTITY.md), keep it in one authoritative file and replace the others with a `详见 <file>` reference. | Safe |\n| **C3** Archive aged-out content | Move old diary entries, superseded milestones, and historical promoted entries to a dedicated archive directory. | Safe |\n| **C4** Trim to one-liner | Convert verbose descriptions to single-line summaries.\n  *Before:* \"This project's coding conventions were established after three code reviews revealed inconsistent patterns: use 2-space indent for HTML/CSS, 4-space for Python, tabs for Go. Prefix private methods with underscore. No Hungarian notation. Import order: stdlib, third-party, local.\"\n  *After:* \"Coding conventions (see CONTRIBUTING.md) — 6 rules, numbered.\"\n  Actionable instructions stay; background context goes. | Safe |\n\n### D. Tool Profile Optimization\n\n| Technique | Description | Risk |\n|-----------|-------------|------|\n| **D1** Size your tool schema | Count all registered tools and estimate total schema chars. This is typically the single largest per-turn overhead. | Safe (measure only) |\n| **D2** Switch profile per agent | Use \"coding\" profile for sub-agents/cron jobs (excludes browser, canvas, media generation, feishu tools). Use \"full\" only where those tools are actually needed. 
| Moderate (test on sub-agents first) |\n| **D3** Disable unused tools | If you have disabled skills or orphaned plugin tools still registering schemas, disable or remove them from the registry. Check `skills.entries` and `plugins.load.paths`. | Safe |\n| **D4** Create custom profile | If neither \"full\" nor \"coding\" fits, define a custom profile with exactly the 15-25 tools your use-case needs. Requires config reload. | High |\n\n### E. Output Discipline\n\n| Technique | Description | Risk |\n|-----------|-------------|------|\n| **E1** No operation narration | Remove \"I'll...\", \"Let me check...\" patterns. Do the action directly. | Safe (behavioral) |\n| **E2** Lead with conclusion | Put the answer first. Add explanation only when needed. | Safe (behavioral) |\n| **E3** Batch turns | Read → plan → apply all changes in as few turns as possible, instead of read→think→edit→think→verify per-item. Each extra turn adds LCM context overhead. | Safe (behavioral) |\n| **E4** Sub-agent conciseness | When spawning sub-agents, specify a concise return format. Their full output is injected into context if returned. | Safe |\n\n### F. Session Lifecycle\n\n| Technique | Description | Risk |\n|-----------|-------------|------|\n| **F1** Set active hours | Configure `heartbeat.activeHours` so no work runs during idle time (overnight, weekends). | Safe |\n| **F2** Isolated sessions | Set `heartbeat.isolatedSession: true` so periodic checks don't accumulate in the main session. | Safe |\n| **F3** Light context | Set `heartbeat.lightContext: true` to skip loading all bootstrap files — only HEARTBEAT.md is injected. | Safe |\n| **F4** Merge co-located tasks | If two cron jobs run within minutes of each other (e.g. both at 23:xx), merge them into one session with a combined prompt. Copy both prompts into one job's `message` field separated by a blank line, then remove the later job. Saves one full startup context per day. 
| Moderate |\n| **F5** Merge example | Before: Job A at 23:00 (System health check), Job B at 23:10 (Log cleanup). After: Single job at 23:00 with prompt \"Do A then B.~A: ...~B: ...\" | Moderate |\n| **F6** Configure queue | If the platform supports message queue settings (debounce, collect), tune them to prevent rapid-turn accumulation during tool execution. | Safe |\n\n### G. Provider-Side Caching\n\n> **Impact is 10× any other category.** DeepSeek V4 Pro cached price is 0.83% of\n> uncached. Cache hit rates of 91-96% are achievable with proper prompt structure.\n\n| Technique | Description | Risk |\n|-----------|-------------|------|\n| **G1** Fixed prefix first | Design all prompts as `[static prefix] + [dynamic suffix]`. Static prefix includes system instructions, bootstrap summary, and tool schemas. Dynamic suffix includes runtime instruction. This maximizes KV cache hits on the provider side.\n  *Wrong:* \"Analyze this code for memory leaks...你是代码审查助手，审查规则如下：...\"\n  *Right:* \"你是代码审查助手，审查规则如下：...现在分析这段代码的内存泄漏：...\" | Safe |\n| **G2** Session contiguity | Don't insert unrelated messages between consecutive calls to the same model — this breaks the KV cache prefix. Batch related calls into a single turn instead. | Safe |\n| **G3** Monitor cache rate | Check provider dashboards for cache hit rate. If <80%, your prefix structure likely has variability. Fix it. | Safe |\n| **G4** Route to best caching provider | Different providers have wildly different cached prices. DeepSeek V4 Pro: 0.83% of uncached. MiniMax: ~20%. Route routine tasks to the provider with the best cache economics. | Moderate |\n\n### H. Behavioral Discipline\n\n> These are zero-config, zero-cost techniques. The savings come from how you use\n> the system, not how it's configured.\n\n| Technique | Description | Risk |\n|-----------|-------------|------|\n| **H1** Default to working path | Use known-working tools before alternatives. 
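Don't retry tools known to be broken in the current deployment — each retry is a wasted tool call + error response.\n  *Bad:* web_search (broken) → error → web_search again → error → baidu-search → works\n  *Good:* baidu-search → works (first attempt) | Safe |\n| **H2** Fail once, switch | If a method fails, switch immediately to a known alternative. Don't retry the same approach with slightly different parameters. Each retry costs full tool-call tokens. | Safe |\n| **H3** Batch > Poll | Gather all data before acting instead of incrementally. One `exec` or `read` call that returns 10 results costs less than 5 separate calls returning 2 each. | Safe |\n| **H4** Fix root cause | If a tool works inconsistently due to a known config issue (API key expired, wrong provider), fix the config. Working around it each time costs more in accumulated failed calls. | Safe |\n\n### I. Context Engineering (added 2026-05-12)\n\n> Context Engineering is a higher-level methodology that evolved from Prompt Engineering in 2026.\n> Core principle: progressive disclosure, i.e. load a specific module only when the task needs it.\n> The Peking University paper \"Meta Context Engineering via Agentic Skill Evolution\" reports measured results of\n> 60% lower token consumption and a 45% higher task success rate.\n\n| Technique | Description | Risk |\n|-----------|-------------|------|\n| **I1** Progressive disclosure | Load system capability descriptions in tiers by task type. Skill descriptions that are not needed are not loaded. Complements token-saver's C1-C4: the C series trims at the file level, I1 trims at the capability level.\n  Wrong: the agent loads all skill descriptions at startup\n  Right: the agent loads only the skills matching the task type at startup (a search task loads only search-related skill descriptions) | Safe |\n| **I2** Semantic context scoring | Replace the fixed sliding window with semantic importance scoring. Compute the semantic similarity between past interactions and the current query, and keep only the top 3 plus the most recent turn.\n  Recommended implementation: compute cosine similarity with Sentence-BERT, then sort by score and prune to the token budget.\n  Reference open-source project: Agent Skills for Context Engineering | Safe |\n| **I3** Fixed prefix pattern | Design all prompts as \"static prefix + dynamic suffix\". The static prefix includes system instructions, the bootstrap summary, and tool schema descriptions. The dynamic suffix includes the instruction for the current run. This maximizes provider-side KV cache hits.\n  Wrong: write the task instruction first on every run, then append the system description\n  Right: system description first, task instruction last | Safe |\n| **I4** Tool call circuit breaking | Add a pre-check agent before calls to high-cost external tools (SQL queries, image generation, file writes). It validates parameters and estimates resource usage. Over-limit or invalid calls are rejected outright without triggering the actual call.\n  Applies to: overly long URLs for web_fetch, high-risk exec commands, large file writes. | Safe |\n| 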
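**I5** Cost-aware context pruning | Decide context precision dynamically from taskType and budget tokens. Simple tasks use a trimmed prompt; only complex tasks load the full capability descriptions.\n  Implementation: embed a task_type field in the cron payload and select the context template based on it. | Safe |\n\n### J. Intelligent Model Routing (added 2026-05-14)\n\n> Based on the approach of OpenSquilla (an open-source token optimization engine).\n> Core insight: the routing decision itself consumes no tokens. Task complexity is judged locally, and that decides which model handles the call.\n> Difference from the B series (Model Tiering): B says \"each task should be assigned one fixed tier\";\n> J says \"each individual call of the same task is dynamically judged for its tier\".\n\n| Technique | Description | Risk |\n|-----------|-------------|------|\n| **J1** Dynamic inbound grading | Run a lightweight complexity check on every inbound request (prompt length + expected reasoning depth + tool count) and route it automatically to a model in the matching tier.\n  Example grading rules:\n  - 🟢 Economy: simple information lookup (token consumption < 1K, no reasoning required) → sensenova-6.7-flash-lite / minimax-m2.7\n  - 🟡 Standard: general analysis tasks (1K-5K, light reasoning) → deepseek-v4-flash\n  - 🏆 Premium: deep reasoning (>5K, complex reasoning / multiple tools) → deepseek-v4-pro\n  Implementation: embed a `model` field in the cron/sub-agent payload, or use an intermediate routing agent for classification.\n  *Before:* every daily cron uses deepseek-v4-pro, including simple version checks\n  *After:* version check → sensenova-6.7-flash-lite, domain probe → deepseek-v4-flash,\n           deep analysis → deepseek-v4-pro. Tiered by task characteristics, not one-size-fits-all. | Safe |\n| **J2** Local routing decision | The routing decision is made locally (no request is sent to a remote service to decide the route).\n  Approach: preset a `routing_hint` field (economy/standard/premium) in the task payload,\n  set by the orchestration layer from task characteristics (prompt length, task type, expected number of tool calls).\n  *Bad:* before every call, send a lightweight request asking \"which model should this go to?\" (wastes tokens)\n  *Good:* the task description itself carries the routing hint, and the orchestration layer matches it locally | Safe |\n| **J3** Incremental context delivery | One of OpenSquilla's core approaches: send the minimum necessary context first, and add more incrementally when it is not enough.\n  Difference from the C series (Context Slimming): C trims at the file level (reduces startup load),\n  while J3 escalates at the call level (send the backbone first, then details when the model needs them).\n  Implementation: start the prompt with a highly compressed summary, then mark expansion points with a \"for details, see\" marker.\n  Combined with OpenClaw's lossless-claw mechanism: LCM summaries are already a progressive implementation. | Safe |\n| **J4** Cache-aware routing | Track each provider's KV cache hit rate and prefer routing to providers with a high hit rate.\n  Currently known: DeepSeek V4 Pro cache price = 0.83% of uncached.\n  When a cached prompt prefix (bootstrap + skill descriptions) is hit, the cost of those cached tokens drops by 99%+.\n  Implementation: when routing, prefer the provider that last served the same model with the same prefix structure. | Moderate |\n| **J5** Fallback chain | When the routed model is unavailable (API timeout / rate limit / error), fall back automatically to a backup model.\n  Difference from H2 (Fail once, switch): H2 is behavioral discipline (switch methods after a failure),\n  while J5 is a config-level automatic failover chain.\n  Example fallback chain:\n  1️⃣ 
deepseek-v4-pro → 2️⃣ deepseek-v4-flash → 3️⃣ sensenova-6.7-flash-lite\n  Implementation: use the `fallbacks` field in the cron/sub-agent payload to configure the fallback order.\n  *Before:* cron job uses minimax-m2.7; it hangs with no fallback → the task fails permanently\n  *After:* the same cron configures `fallbacks: [\"deepseek-v4-flash\", \"deepseek-v4-pro\"]` →\n           when minimax hangs, execution switches automatically to deepseek-v4-flash | Safe |\n\n> **Trade-off**: the upper bound on savings from intelligent routing depends on the distribution of task complexity. If 80% of tasks are simple lookups,\n> intelligent routing can save 60-80% of tokens. If most tasks are already on the standard/economy tier,\n> the additional gain is limited. Scan your task complexity distribution first, then choose which J techniques to enable.\n\n---\n\n## Phase 4: VALIDATE — Confirm Results\n\n### 4A Prompt Length Delta\n\nBefore/after comparison of all modified prompts and files. Include total\nchars and estimated tokens saved.\n\n### 4B Config Integrity\n\nAfter editing JSON configuration files, validate:\n\n```bash\npython3 -c \"import json; json.load(open('<config-path>')); print('OK')\"\n```\n\n### 4C Functional Test\n\n- Verify cron tasks still start correctly (check `cron action=runs` or next\n  scheduled trigger)\n- Verify heartbeat runs in configured active window\n- Read through compressed cron prompts to ensure key instructions survive\n\n### 4D Generate Report\n\nWrite `token-audit-report-YYYY-MM-DD.md` summarizing:\n- Changes made and per-change token savings\n- Total estimated weekly token reduction\n- Items deferred and why\n- Recommended next optimization\n\nLog each optimization cycle in `results.tsv` (see skill directory for\nformat reference). This creates an audit trail for the quarterly deep audit (5B).\n\n---\n\n## Phase 5: MONITOR — Guard Against Regrowth\n\n### 5A Periodic Token Watch (Optional)\n\nOptionally create a weekly cron (cheapest available model) that checks\nprompt lengths haven't crept back:\n\n```json\n{\n  \"name\": \"token-watch-weekly\",\n  \"schedule\": { \"kind\": \"cron\", \"expr\": \"0 10 * * 1\", \"tz\": \"Asia/Shanghai\" },\n  \"payload\": {\n    \"kind\": \"agentTurn\",\n    \"model\": \"<cheapest-model>\",\n    \"message\": \"Check all cron prompt lengths. 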
Flag any that grew >20% since last baseline.\",\n    \"timeoutSeconds\": 120\n  },\n  \"sessionTarget\": \"isolated\",\n  \"delivery\": { \"mode\": \"none\" }\n}\n```\n\n### 5B Quarterly Deep Audit\n\nRun the full Phase 1-4 cycle every quarter using the cheapest available\nmodel. Compare results against previous reports to spot regrowth trends.\n\nThe quarterly audit MUST include:\n- Compare each cron's prompt length against last audit baseline\n- Check if unused categories crept back (complacency regrowth)\n- Verify all J routing hints still reflect actual task complexity\n- Re-run the Priority Matrix from Phase 2 to catch new high-impact items\n\n### 5C Real-Time Drift Alerts (Optional)\n\nWhen token consumption suddenly spikes, it's usually one of four causes.\nKnow which one:\n\n| Symptom | Likely Cause | Fix |\n|---------|-------------|-----|\n| A specific cron doubled in token count | Prompt crept back (someone added preamble) | A1 Strip preamble |\n| All crons increased proportionally | Model changed (e.g. 
dev switched back to pro from flash) | B1 Right-size, check default model |\n| Session-level spikes, not cron | Tool profile expanded / new plugin registered | D1 Size tool schemas, check profile |\n| Intermittent spikes | Prefix changed → KV cache missed | G1 Fixed prefix, check for variation |\n\nIf you have access to provider dashboards:\n- **Monitor cache hit rate** — DeepSeek dashboard shows prefix cache stats\n- **If hit rate drops below 80%** -> inspect recent prompt changes for prefix variation\n- **Track per-model token consumption weekly** — a 2× week-over-week jump is a red flag\n\n---\n\n## Safety Boundaries\n\n### Configs That Need Gateway Restart\n\nSome configuration paths require a gateway restart to take effect:\n- `agents.defaults.heartbeat.*` (edit config file + restart)\n- `agents.list[].tools.profile`\n- `gateway.*`, `auth.*`\n- `plugins.*` — certain sub-fields\n\n### What NOT to Compress\n\nThese core mechanisms must be preserved even in an aggressive token budget:\n- Error detection logic (consecutive errors, failure alerts)\n- Essential signal handling (high-priority alerts → auto-escalation)\n- Drift detection for recurring tasks\n\n### External References\n\n- OpenClaw Cron Jobs: https://docs.openclaw.ai/automation/cron.md\n- OpenClaw Standing Orders: https://docs.openclaw.ai/automation/standing-orders.md\n- OpenClaw Gateway Config: `openclaw gateway config` CLI\n- OpenClaw Agent Profiles: `agents.list[].tools.profile` in openclaw.json\n- Test Prompts: `test-prompts.json` in skill directory\n- Results Log: `results.tsv` in skill directory\n\n---\n\n## Appendix: Local Deployment Configuration\n\nThis section is populated by the first execution of the Token Saver in a\nspecific deployment. 
Replace the example values below with real ones.\n\n### Configuration Paths\n\n| Item | Example Path |\n|------|-------------|\n| Cron jobs | `~/.openclaw/cron/jobs.json` |\n| Gateway config | `~/.openclaw/openclaw.json` |\n| Workspace root | `~/.openclaw/workspace/` |\n| Bootstrap files | AGENTS.md, SOUL.md, USER.md, MEMORY.md, HEARTBEAT.md, IDENTITY.md, TOOLS.md, STANDING-ORDERS.md |\n\n### Baseline Measurements (example: Wave 2026-05-04)\n\n| File | Initial Size | After First Pass | Reduction | Techniques Used |\n|------|-------------|------------------|-----------|----------------|\n| SOUL.md | 7,034 | 3,521 | -50% | C2 (cross-ref), C4 (one-liner), A2 |\n| STANDING-ORDERS.md | 10,960 | 3,816 | -65% | C2 (cross-ref), A4 (remove redundancy) |\n| IDENTITY.md | 6,228 | 4,313 | -31% | C2 (dedup with SOUL.md), C4 |\n| AGENTS.md | 5,072 | 2,691 | -47% | C2 (ref to STANDING-ORDERS), C4 |\n| TOOLS.md | 8,893 | 7,488 | -16% | C4 (remove stale entries) |\n| MEMORY.md | 30,224 | 26,420 | -13% | C3 (archive promoted entries) |\n| **Total** | **68,411** | **48,249** | **-29%** | — |\n\nPer-session token savings from bootstrap compression: ~6,720 tokens.\n\n### Benchmark: Compression by File Type\n\n| File Type | Typical Savings | Best Technique |\n|-----------|----------------|----------------|\n| Program/Protocol (STANDING-ORDERS.md) | 55-65% | A4 (remove boilerplate sections) |\n| Guide/Identity (SOUL.md, IDENTITY.md) | 30-50% | C2 (cross-reference dedup) |\n| Instructions (AGENTS.md) | 40-50% | C2 (replace lists with file refs) |\n| Knowledge base (MEMORY.md) | 10-20% | C3 (archive old entries only) |\n| Config/state table (TOOLS.md) | 10-20% | C4 (remove stale entries only) |\n\n### Task-to-Model Map\n\n| Task | Model Tier | Model |\n|------|-----------|-------|\n| Version check | Economy | minimax-m2.7 |\n| Demand scanning | Standard | deepseek-v4-pro (needs search) |\n| Domain probe | Economy | minimax-m2.7 |\n| Dreaming (memory integration) | Economy | minimax-m2.7 
|\n| Doc maintenance | Economy | minimax-m2.7 |\n| WaveCap daily expansion | Standard | deepseek-v4-pro (needs reasoning) |\n| Weekly review | Premium | deepseek-v4-pro |\n| Friday topic selection | Premium | deepseek-v4-pro |\n| Main session | Standard | deepseek-v4-flash |\n\n### Deferred Items\n\n| Item | Reason | Condition to Revisit |\n|------|--------|---------------------|\n| Tool profile for main agent | High risk (may break unexpected features) | After sub-agent coding profile proven in production for 1 week |\n| Cron task merging | Needs user confirmation; may affect reliability | Next token audit cycle |\n| Compaction mode change (safeguard→normal) | Needs config reload | When gateway restarted for other reasons |\n\n### Deployment-Specific Constraints\n\n- **Network**: GFW blocks chatgpt.com, api.openai.com. All OpenAI/Codex models unavailable.\n- **Models available**: deepseek-v4-pro (premium), deepseek-v4-flash (standard),\n  minimax-m2.7 (economy).\n- **File paths**: Standard OpenClaw paths under `~/.openclaw/`.\n- **Git**: Workspace is a git repository; all changes version-controlled.\n","tags":{"latest":"2.1.0","openclaw":"2.1.0","token-optimization":"2.1.0"},"stats":{"comments":0,"downloads":425,"installsAllTime":0,"installsCurrent":0,"stars":0,"versions":2},"createdAt":1777873419076,"updatedAt":1778767909279},"latestVersion":{"version":"2.1.0","createdAt":1778767833982,"changelog":"v2.1: Category J Intelligent Model Routing (OpenSquilla), Quick Start 3-tier guide, Category Decision Tree, real-time drift monitoring 5C, test-prompts.json for darwin validation","license":"MIT-0"},"metadata":null,"owner":{"handle":"youxiyin","userId":"s175qrrfe39hw9vjcjresk5fqx862j71","displayName":"youxiyin","image":"https://avatars.githubusercontent.com/u/268643193?v=4"},"moderation":{"isSuspicious":false,"isMalwareBlocked":false,"verdict":"clean","reasonCodes":["review.llm_review"],"summary":"Review: 
review.llm_review","engineVersion":"v2.4.24","updatedAt":1780090805258}}