Inception Token Optimizer

v1.0.0

Install

openclaw skills install inception-token-optimizer

Reduce Inception API token consumption through prompt engineering, context management, and budget enforcement.

Free-Tier Limits (Inception Labs)

Metric               Cap
Requests/min         100
Input tokens/min     100,000
Output tokens/min    10,000

Core Strategies

1. Prompt Compression

  • Remove redundant instructions, filler words, and repeated context.
  • Use short system prompts: "Concise answers. French." beats a 200-word persona block.
  • Avoid re-sending unchanged context — only send deltas.
  • Ask for short replies: "Réponds en < 100 mots." ("Reply in under 100 words.")
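
The compression steps above can be sketched as a small pre-send filter. `FILLER` and `compress_prompt` are illustrative names, and the filler-word list is an assumption to tune per use case:

```python
import re

# Illustrative filler list -- an assumption, tune for your prompts.
# Aggressive stripping can change meaning, so review edge cases.
FILLER = re.compile(r"\b(please|kindly|basically|really|very|just)\b", re.I)

def compress_prompt(prompt: str) -> str:
    """Drop filler words, then collapse runs of whitespace."""
    prompt = FILLER.sub("", prompt)
    return re.sub(r"\s+", " ", prompt).strip()
```

Run this on every outgoing prompt before estimating tokens; the bigger win is still avoiding re-sends of unchanged context.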

2. Context Pruning

  • Before sending, estimate tokens: len(text) // 4 (rough heuristic).
  • If total context > target budget, drop oldest messages and replace with a 1-2 sentence summary.
  • Use references/pruning-strategies.md for detailed patterns.
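
The two pruning steps above can be sketched together. `estimate_tokens` and `prune` are illustrative helpers, and the bracketed placeholder stands in for a real 1-2 sentence summary (which would come from a cheap summarisation call):

```python
def estimate_tokens(text: str) -> int:
    return len(text) // 4          # rough heuristic: ~4 chars per token

def prune(messages: list[str], budget: int) -> list[str]:
    """Drop oldest messages until the estimate fits the budget, then
    replace the dropped ones with a one-line placeholder summary."""
    dropped = 0
    while len(messages) > 1 and sum(map(estimate_tokens, messages)) > budget:
        messages = messages[1:]    # oldest message first
        dropped += 1
    if dropped:
        messages = [f"[Summary of {dropped} earlier message(s).]"] + messages
    return messages
```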

3. Caching

  • Identical prompts → reuse previous response. Do not re-call.
  • Hash the prompt; if seen recently (within session), return cached reply.
  • scripts/lru_cache.py provides a drop-in LRU cache (256 items default).
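
A minimal sketch of the hash-and-reuse idea. scripts/lru_cache.py is the drop-in version; this `PromptCache` only illustrates the same pattern:

```python
import hashlib
from collections import OrderedDict

class PromptCache:
    """LRU cache keyed by a hash of the prompt (illustrative sketch)."""
    def __init__(self, maxsize: int = 256):
        self._store: OrderedDict[str, str] = OrderedDict()
        self.maxsize = maxsize

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode()).hexdigest()

    def get(self, prompt: str):
        key = self._key(prompt)
        if key in self._store:
            self._store.move_to_end(key)     # mark as recently used
            return self._store[key]
        return None                          # miss: call the API

    def put(self, prompt: str, reply: str):
        key = self._key(prompt)
        self._store[key] = reply
        self._store.move_to_end(key)
        if len(self._store) > self.maxsize:
            self._store.popitem(last=False)  # evict least recently used
```

Check `get()` before every call; on a hit, the API call (and its tokens) is skipped entirely.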

4. Model Selection

  • Use cheaper/faster models for simple tasks (summarisation, classification).
  • Reserve Mercury (or flagship) for complex reasoning only.
  • Batch trivial queries into a single prompt instead of multiple calls.
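
A routing-plus-batching sketch of the bullets above. The cheap-model name and the task set are hypothetical placeholders, since model availability depends on your Inception account:

```python
CHEAP_TASKS = {"summarise", "classify", "label"}   # assumption: simple tasks

def pick_model(task: str) -> str:
    # "mercury-small" is a placeholder for whatever cheaper tier exists;
    # reserve the flagship for complex reasoning only.
    return "mercury-small" if task in CHEAP_TASKS else "mercury"

def batch_prompt(questions: list[str]) -> str:
    """Fold N trivial queries into one numbered prompt: one call, one
    set of fixed per-request overhead tokens instead of N."""
    lines = [f"{i}. {q}" for i, q in enumerate(questions, 1)]
    return "Answer each briefly:\n" + "\n".join(lines)
```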

5. Output Budgeting

  • Set max_tokens explicitly — never leave it open-ended.
  • Target 150-200 output tokens for conversational replies.
  • Use a moderate temperature (e.g. 0.7) to curb verbose, wandering output.
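
In request terms, the bullets above reduce to two parameters. The field names below follow common chat-API conventions and may differ in the Inception client:

```python
def reply_params(conversational: bool = True) -> dict:
    """Explicit output budget for every call -- never open-ended.
    Parameter names are assumed OpenAI-style fields; verify against
    the actual Inception API."""
    return {
        "max_tokens": 200 if conversational else 512,  # hard cap on output
        "temperature": 0.7,   # moderate setting curbs rambling
    }
```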

Token Budget Guard

scripts/token_bucket.py enforces per-minute caps using a sliding window:

from scripts.token_bucket import TokenBucket

bucket = TokenBucket(req_per_min=100, in_tok_per_min=100_000, out_tok_per_min=10_000)
bucket.wait_for_slot(in_tokens=500, out_tokens=200)
# proceed with API call

Blocks until a slot is available. Use before every Inception API call.
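
For orientation, the sliding-window idea behind scripts/token_bucket.py can be sketched like this; it is an illustration of the approach, not the script's actual source:

```python
import time
from collections import deque

class SlidingWindowGuard:
    """Tracks the last 60 s of calls and blocks when any cap would be hit."""
    def __init__(self, req_per_min=100, in_tok_per_min=100_000,
                 out_tok_per_min=10_000):
        self.caps = (req_per_min, in_tok_per_min, out_tok_per_min)
        self.events = deque()          # (timestamp, in_tokens, out_tokens)

    def _usage(self, now):
        while self.events and now - self.events[0][0] > 60:
            self.events.popleft()      # forget events older than 60 s
        return (len(self.events),
                sum(e[1] for e in self.events),
                sum(e[2] for e in self.events))

    def wait_for_slot(self, in_tokens, out_tokens):
        """Block until the call fits under all three per-minute caps."""
        while True:
            now = time.monotonic()
            reqs, ins, outs = self._usage(now)
            if (reqs < self.caps[0]
                    and ins + in_tokens <= self.caps[1]
                    and outs + out_tokens <= self.caps[2]):
                self.events.append((now, in_tokens, out_tokens))
                return
            time.sleep(0.25)           # retry as old events expire
```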

When to Use This Skill

  • Before sending a prompt to Inception → compress & prune first.
  • When monitoring costs → check token estimates.
  • When near free-tier limits → activate budget guard.
  • When building automation → integrate caching + bucket guard.
