Install
openclaw skills install token-efficient-agent
Advanced techniques for minimizing token consumption in OpenClaw operations while maintaining or improving response quality. Includes memory optimization, document processing strategies, tool call efficiency, and contextual awareness methods specifically designed for the OpenClaw architecture.
This skill provides advanced, battle-tested techniques for minimizing token consumption in OpenClaw operations. Unlike basic tips, these strategies are specifically tailored to OpenClaw's architecture, tool ecosystem, and memory system. By implementing these methods, you can reduce token usage by 60-80% while maintaining or improving response quality and contextual awareness.
OpenClaw's strength lies in its ability to access personal data, memories, and tools. However, each operation consumes tokens:
Without optimization, simple queries can consume thousands of tokens unnecessarily, leading to:
OpenClaw has distinct memory layers with different access costs:
Strategy: Always start with the cheapest available context that might contain the answer.
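A minimal sketch of the cheapest-first strategy, assuming the context sources can be ordered by cost. The tier names and the stand-in lookups are hypothetical and only illustrate the control flow, not OpenClaw's actual memory layers.

```python
from typing import Callable, Optional

# A lookup takes a query and returns an answer, or None if it has nothing.
Lookup = Callable[[str], Optional[str]]


def answer_from_cheapest(
    query: str, tiers: list[tuple[str, Lookup]]
) -> tuple[Optional[str], list[str]]:
    """Try each tier in order (cheapest first) and stop at the first answer."""
    tried: list[str] = []
    for name, lookup in tiers:
        tried.append(name)
        result = lookup(query)
        if result is not None:
            return result, tried
    return None, tried


# Stand-in tiers for demonstration only (hypothetical names and behavior).
session_context = {"deadline": "Friday"}
tiers = [
    ("session", session_context.get),
    ("memory_search", lambda q: None),  # pretend nothing matched
    ("document_fetch", lambda q: "full document text"),
]

answer, tried = answer_from_cheapest("deadline", tiers)
print(answer, tried)
```

The list of tried tiers makes it easy to see how far down the cost ladder a query had to go.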
Each OpenClaw tool has specific optimization parameters. Understanding these allows precision data retrieval rather than brute-force fetching.
Retrieve information in layers: first get metadata/summaries, then only dive deep when necessary based on initial results.
OpenClaw sessions retain loaded data. Structure your workflow to maximize reuse of already-fetched information.
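One way to make reuse explicit is a small per-session cache keyed by tool name and parameters. This is a sketch under assumptions: `SessionCache` and `fake_fetch` are hypothetical helpers, and parameter values must be hashable.

```python
from typing import Any, Callable


class SessionCache:
    """Memoize tool results for the lifetime of one session (hypothetical helper)."""

    def __init__(self) -> None:
        self._store: dict[tuple, Any] = {}
        self.hits = 0
        self.misses = 0

    def call(self, tool: Callable[..., Any], **params: Any) -> Any:
        # Same tool with the same parameters maps to the same cache key.
        key = (tool.__name__, tuple(sorted(params.items())))
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = tool(**params)
        self._store[key] = result
        return result


def fake_fetch(doc_id: str, limit: int) -> str:
    """Stand-in for a real fetch tool."""
    return f"{doc_id}:{limit}"


cache = SessionCache()
first = cache.call(fake_fetch, doc_id="doc_xxx", limit=1500)
second = cache.call(fake_fetch, doc_id="doc_xxx", limit=1500)  # served from cache
print(cache.hits, cache.misses)
```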
Instead of: Loading entire memory files or searching broadly
Use: A multi-stage search approach that minimizes data transfer
Stage 1: Broad Search with Minimal Results
memory_search(query="project deadline decision", maxResults=1, minScore=0.8)
Stage 2: Targeted Snippet Extraction
# If Stage 1 returns a relevant file:
memory_get(path="memory/2026-03-10.md", from=42, lines=8)
Stage 3: Cross-Reference Validation (Only if Needed)
# Only if Stage 2 is ambiguous:
memory_search(query="project deadline", file_path="memory/2026-03-10.md", maxResults=2)
Token Savings: Typically reduces memory loading by 70-90% compared to loading full daily files.
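The first two stages can be chained in a small helper. This is a sketch only: the search and get functions are injected as callables, and the result shape (a list of dicts with `path`, `line`, and `score`) is an assumption, not the documented output of `memory_search`. Because `from` is a reserved word in Python, it is passed through a dict.

```python
from typing import Any, Callable, Optional


def staged_lookup(
    query: str,
    search: Callable[..., list[dict[str, Any]]],
    get: Callable[..., list[str]],
    min_score: float = 0.8,
    lines: int = 8,
) -> Optional[list[str]]:
    """Stage 1: one high-confidence hit. Stage 2: only the lines around it."""
    hits = search(query=query, maxResults=1, minScore=min_score)
    if not hits:
        return None  # nothing confident enough; caller decides what to do next
    hit = hits[0]
    return get(**{"path": hit["path"], "from": hit["line"], "lines": lines})


# Stand-ins for demonstration only (hypothetical result shapes).
FAKE_FILE = [f"line {i}" for i in range(1, 101)]


def fake_search(query: str, maxResults: int, minScore: float) -> list[dict[str, Any]]:
    hits = [{"path": "memory/2026-03-10.md", "line": 42, "score": 0.91}]
    return [h for h in hits if h["score"] >= minScore][:maxResults]


def fake_get(**kwargs: Any) -> list[str]:
    start = kwargs["from"]
    return FAKE_FILE[start - 1 : start - 1 + kwargs["lines"]]


snippet = staged_lookup("project deadline decision", fake_search, fake_get)
print(snippet)
```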
Instead of: Fetching entire documents then searching
Use: Offset-limited fetching combined with semantic boundary detection
For Long Documents (>5000 chars):
feishu_fetch_doc(doc_id="doc_xxx", limit=1500)
# If conclusions are likely in last 20%:
feishu_fetch_doc(doc_id="doc_xxx", offset=8000, limit=1500)
For Known Section Documents:
Token Savings: Avoids loading irrelevant document portions, often saving 60-80% of document processing tokens.
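The head-and-tail fetch for long documents can be planned ahead of time. The sketch below assumes the document length is known in advance; `plan_windows` is a hypothetical helper, and the 5000-character threshold and 1500-character windows are taken from the example above.

```python
def plan_windows(
    doc_length: int,
    head: int = 1500,
    tail: int = 1500,
    short_threshold: int = 5000,
) -> list[tuple[int, int]]:
    """Return (offset, limit) pairs: the whole doc if short, else head and tail."""
    if doc_length <= short_threshold:
        return [(0, doc_length)]
    # Never let the tail window overlap the head window.
    tail_offset = max(head, doc_length - tail)
    return [(0, head), (tail_offset, doc_length - tail_offset)]


print(plan_windows(9500))
print(plan_windows(3000))
```

For a 9500-character document this yields the same two calls as the example: `offset=0, limit=1500` and `offset=8000, limit=1500`.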
Instead of: Making multiple separate tool calls for related data
Use: Combine operations where tools support it, or sequence calls to minimize context switching
Memory-Tool Fusion Pattern:
# Instead of:
# 1. memory_search() -> get file path
# 2. feishu_fetch_doc() -> load document
# 3. Process document
# Do:
# 1. memory_search() with doc-specific query to get both memory context AND doc hints
# 2. If memory contains sufficient summary, skip document fetch entirely
# 3. Only fetch document if memory search indicates high-value target
Web Search Optimization:
Use count=3 instead of the default 10 for initial searches
Use the freshness parameter when temporal relevance is known
Tool Savings: Reduces tool call overhead and eliminates redundant data processing by 40-60%.
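The "skip the document fetch if memory is sufficient" step of the fusion pattern needs some test for sufficiency. A minimal sketch, assuming each memory hit carries a `score` and a `text` field; both field names and both thresholds are illustrative assumptions to tune against your own data.

```python
from typing import Any


def needs_document_fetch(
    memory_hits: list[dict[str, Any]],
    min_score: float = 0.85,
    min_summary_chars: int = 200,
) -> bool:
    """Return False when some memory hit is both confident and substantial."""
    for hit in memory_hits:
        if hit["score"] >= min_score and len(hit["text"]) >= min_summary_chars:
            return False
    return True


strong_hit = {"score": 0.9, "text": "Summary of the vendor decision. " * 10}
weak_hit = {"score": 0.5, "text": "Summary of the vendor decision. " * 10}

print(needs_document_fetch([strong_hit]))  # memory is enough, skip the fetch
print(needs_document_fetch([weak_hit]))    # low confidence, fetch the document
```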
Instead of: Passing raw data to model for processing
Use: Progressive summarization where each stage reduces data size while preserving decision-relevant information
Three-Level Summary Cascade:
Use feishu_fetch_doc with smart offsets to get key portions
Example Workflow for Document-Based Questions:
# Level 1: Get structurally important parts (headings, conclusions, tables)
section1 = feishu_fetch_doc(doc_id, offset=0, limit=800) # Intro
section2 = feishu_fetch_doc(doc_id, offset=-1000, limit=1000) # Conclusion (approx end)
# Level 2: Ask for thematic summary
summary_prompt = f"Provide a 3-sentence summary of the key points in this text: {section1[:400]}...{section2}"
# `summary` below is the model's reply to summary_prompt
# Level 3: Task-specific reduction
final_prompt = f"Based on this summary: {summary}, answer ONLY: [specific question]"
Token Savings: Reduces document processing tokens by 75-90% while preserving answer quality.
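The cascade can be wrapped in one function with the summarizer injected as a callable. This is a sketch under assumptions: `cascade` and `fake_summarize` are hypothetical, and the stand-in summarizer returns a fixed string where a real setup would call the model.

```python
from typing import Callable


def cascade(
    sections: list[str],
    question: str,
    summarize: Callable[[str], str],
    per_section_chars: int = 400,
) -> str:
    """Shrink fetched sections in three levels and return the final prompt."""
    # Level 1: keep only the leading slice of each fetched section.
    excerpt = "\n...\n".join(s[:per_section_chars] for s in sections)
    # Level 2: thematic summary of the excerpt.
    summary = summarize(
        f"Provide a 3-sentence summary of the key points in this text: {excerpt}"
    )
    # Level 3: task-specific reduction.
    return f"Based on this summary: {summary}, answer ONLY: {question}"


def fake_summarize(prompt: str) -> str:
    """Stand-in for a model call (fixed output for demonstration)."""
    return "The intro states goals. The conclusion picks option B. No open risks remain."


intro = "Goals. " * 200
conclusion = "Decision: option B. " * 50
final_prompt = cascade([intro, conclusion], "Which option was chosen?", fake_summarize)
print(len(intro) + len(conclusion), "->", len(final_prompt))
```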
Instead of: Reactive loading after each user query
Use: Predictive loading based on conversation patterns and time/context cues
Implementation:
Prediction Signals:
Example: If user always asks about project status at 10 AM, preload project-related memory snippets at 9:45 AM.
Efficiency Gain: Converts high-cost reactive operations to near-zero-cost proactive operations.
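The 10 AM example can be derived from a query log. A minimal sketch, assuming a history of `(timestamp, topic)` pairs is available; the `preload_times` helper, the three-occurrence threshold, and the 15-minute lead are assumptions. If a topic recurs at more than one hour, only one preload time is kept.

```python
from collections import Counter
from datetime import datetime, time, timedelta


def preload_times(
    history: list[tuple[datetime, str]],
    min_count: int = 3,
    lead_minutes: int = 15,
) -> dict[str, time]:
    """Map each topic that recurs at the same hour to a preload time before it."""
    counts = Counter((topic, ts.hour) for ts, topic in history)
    plan: dict[str, time] = {}
    for (topic, hour), seen in counts.items():
        if seen >= min_count:
            usual = datetime(2000, 1, 1, hour, 0)  # date part is a placeholder
            plan[topic] = (usual - timedelta(minutes=lead_minutes)).time()
    return plan


history = [
    (datetime(2026, 3, 9, 10, 5), "project status"),
    (datetime(2026, 3, 10, 10, 2), "project status"),
    (datetime(2026, 3, 11, 10, 10), "project status"),
    (datetime(2026, 3, 11, 12, 30), "lunch options"),  # one-off, ignored
]
plan = preload_times(history)
print(plan)
```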
Instead of: Using raw tool outputs
Use: Transform tool results to their minimal essential form before model consumption
Patterns:
Use feishu_doc_comments with is_solved=true/false filters and page_size=1
Request only the summary and start_time fields when possible, not full descriptions
Keep only name and open_id, not full profile data
Implementation: Create wrapper functions that:
Token Savings: Typically 50-80% reduction in tool result processing tokens.
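A wrapper of this kind can be a simple field projection. The sketch assumes the tool returns a list of dicts; `minimized` and `fake_calendar_list` are hypothetical names, and the event fields are made up for the demonstration.

```python
from typing import Any, Callable


def minimized(
    tool: Callable[..., list[dict[str, Any]]], keep: tuple[str, ...]
) -> Callable[..., list[dict[str, Any]]]:
    """Wrap a tool so that only the listed fields reach the model."""

    def wrapper(**params: Any) -> list[dict[str, Any]]:
        rows = tool(**params)
        return [{k: row[k] for k in keep if k in row} for row in rows]

    return wrapper


def fake_calendar_list(**params: Any) -> list[dict[str, Any]]:
    """Stand-in for a calendar tool that returns verbose records."""
    return [
        {
            "summary": "Project review",
            "start_time": "2026-03-17T10:00",
            "description": "long text " * 50,
            "attendees": ["a", "b"],
        }
    ]


lean_calendar = minimized(fake_calendar_list, keep=("summary", "start_time"))
events = lean_calendar()
print(events)
```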
Instead of: Letting session history grow unbounded
Use: Active management of conversational context to maintain optimal token budget
Strategies:
OpenClaw-Specific Implementation:
When faced with any request, follow this decision process:
START
│
├──→ Can answer from current session context?
│ │ Yes → Respond directly (0 additional tokens)
│ │ No → Continue
│
├──→ Is answer likely in recent memory (last 3 days)?
│ │ Yes → Use memory_search with tight constraints (maxResults=1, minScore=0.85)
│ │ → If found, use memory_get for exact lines
│ │ → If not found or ambiguous, continue
│ │ No → Continue
│
├──→ Does answer require document/external data?
│ │ No → Use web_search with count=3, freshness if applicable
│ │ Yes → Continue to document processing
│
├──→ Document Processing Decision:
│ │
│ │→ Is document structured with known sections?
│ │ Yes → Fetch only likely relevant sections using offset/limit
│ │ No →
│ │ │→ Is document < 2000 chars?
│ │ │ Yes → Fetch entire document
│ │ │ No →
│ │ │ │→ Fetch first 1500 chars for structure analysis
│ │ │ │→ Based on analysis, fetch only relevant portions
│ │ │ │→ Apply summarization cascade if still large
│
├──→ Apply result minimization: extract only essential fields
│
├──→ If result still large for model input, apply summarization
│
└──→ Formulate response using minimized context
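The same decision process can be written as a function that returns an ordered list of steps. This is a sketch: the `Request` fields and the step names are hypothetical labels for the branches of the tree, not OpenClaw identifiers.

```python
from dataclasses import dataclass


@dataclass
class Request:
    in_session: bool = False               # answerable from current session context
    likely_in_recent_memory: bool = False  # probably in the last 3 days of memory
    memory_hit: bool = False               # the tight memory_search found a clear hit
    needs_document: bool = False
    known_sections: bool = False
    doc_chars: int = 0


TAIL = ["minimize_fields", "summarize_if_still_large", "respond"]


def plan(req: Request) -> list[str]:
    """Walk the decision tree top to bottom and collect the steps to run."""
    if req.in_session:
        return ["respond_from_session"]
    steps: list[str] = []
    if req.likely_in_recent_memory:
        steps.append("memory_search")
        if req.memory_hit:
            return steps + ["memory_get"] + TAIL
    if not req.needs_document:
        return steps + ["web_search"] + TAIL
    if req.known_sections:
        steps.append("fetch_sections")
    elif req.doc_chars < 2000:
        steps.append("fetch_whole_document")
    else:
        steps += ["fetch_first_1500", "fetch_relevant_portions", "cascade_if_large"]
    return steps + TAIL


print(plan(Request(needs_document=True, doc_chars=12000)))
```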
Use heartbeat cycles for:
Feishu Document Fetching:
Use feishu_doc_comments with filters instead of fetching all comments
Web Operations:
Prefer web_search over web_fetch when possible (returns already-processed snippets)
Use extractMode="text" for non-formatting needs
Set maxChars limits (1000-2000) unless full content is essential
Calendar/Task Queries:
Track your efficiency with these metrics:
Improvement Loop:
Request: "How did our decision on vendor X in January compare to our current leaning toward vendor Y?"
Traditional Approach:
Token-Efficient Approach:
memory_search(query="vendor X decision January", maxResults=1, minScore=0.9, relative_time="last_month")
memory_get(path="memory/2026-01-15.md", from=87, lines=12) # Exact decision snippet
memory_search(query="vendor Y evaluation", maxResults=1, minScore=0.85) # Recent notes
memory_get(path="memory/2026-03-16.md", from=34, lines=8) # Current leaning
Token Usage: ~150 tokens vs ~2000+ for traditional approach
Request: "Give me a briefing on the upcoming project review meeting."
Traditional Approach:
Token-Efficient Approach:
memory_search(query="project review", maxResults=2, relative_time="this_week")
Token Usage: ~300 tokens vs ~3000+ for traditional approach
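As a quick check of what these figures imply, the relative reduction for both examples can be computed directly. The token counts are the approximate values quoted in the two examples; since the traditional figures are lower bounds ("2000+", "3000+"), the real reduction is at least this large.

```python
def reduction_percent(baseline_tokens: int, optimized_tokens: int) -> float:
    """Percentage of tokens saved relative to the baseline."""
    return (baseline_tokens - optimized_tokens) * 100 / baseline_tokens


vendor_comparison = reduction_percent(2000, 150)
meeting_briefing = reduction_percent(3000, 300)
print(vendor_comparison, meeting_briefing)  # 92.5 90.0
```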
These techniques are less effective when:
In these cases, be transparent about the trade-offs and get explicit consent before applying optimization.
Token efficiency in OpenClaw isn't about cutting corners—it's about applying the right amount of computational effort to each task. By leveraging OpenClaw's specific architecture, memory system, and tool capabilities, you can dramatically reduce unnecessary token consumption while maintaining high-quality, contextually appropriate responses.
The key insight: Most user queries don't require comprehensive data review—they need precise, relevant information delivered efficiently. These techniques help you deliver exactly that.
Practice these methods consistently, measure their impact, and adapt them to your specific usage patterns. Over time, token-efficient operation will become second nature, allowing you to handle more complex tasks within the same computational budget.