Token Economy Skill — Unified Reference
This skill is the complete token management layer for your OpenClaw agent. It covers two phases:
- Audit: Find out where tokens are being spent.
- Optimize: Apply the right strategy to reclaim them without losing intelligence or memory.
PHASE 1 — TOKEN AUDIT
1.1 When to Run an Audit
Trigger a token audit when:
- The conversation feels slow, or the model starts hallucinating earlier context.
- You are about to load a large file or document into context.
- The user asks to "check token usage", "find context bloat", or "how full is the context?".
- The `count_tokens.py --estimate` flag returns a value above 80% of the model's limit (a quick programmatic check is sketched below).
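For a quick programmatic version of that last trigger, here is a minimal sketch using the zero-dependency `chars // 4` estimate from Section 1.5; the 128k limit is an illustrative default, not a fixed value:

```python
# Rough audit trigger: estimate tokens as chars // 4, compare against 80% of the window.
CONTEXT_LIMIT = 128_000   # adjust to your model's context window
AUDIT_THRESHOLD = 0.80

def should_audit(prompt_text: str, limit: int = CONTEXT_LIMIT) -> bool:
    estimated_tokens = len(prompt_text) // 4   # zero-dependency fallback estimate
    return estimated_tokens >= AUDIT_THRESHOLD * limit

with open("my_prompt.txt", encoding="utf-8") as f:
    print("Run a token audit:", should_audit(f.read()))
```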
1.2 Auditing Workflow
Run this sequence:
1. **Calculate System Prompt & Skills Cost.** Identify which skills are loaded. Extract their metadata and SKILL.md sizes.
2. **Calculate Conversation History Cost.** Examine how long the conversation has run and estimate recent tool output lengths (grep results, file reads, API responses).
3. **Count precisely** using the helper script:
# Zero-dependency estimate (works everywhere)
python count_tokens.py --input my_prompt.txt --estimate
# Precise count for GPT-4o
python count_tokens.py --input my_prompt.txt --model gpt-4o
# Precise count for Gemma/Ollama
python count_tokens.py --input my_prompt.txt --model gemma
# Precise count for Claude (proxy via cl100k_base, ±5% accuracy)
python count_tokens.py --input my_prompt.txt --model claude
# Compare original vs. compressed file
python count_tokens.py --input original.txt --diff compressed.txt --model gpt-4o
4. **Generate the Token Budget Report** (see the format below).
1.3 Token Budget Report Format
Present findings to the user in this format:
## 📊 Token Budget Report
**Total Estimated Tokens:** `~<NUM>k (of 128k / 200k / 1M limit)`
### 🍰 Breakdown by Layer
| Layer | Estimated Tokens | % of Total | Status |
|----------------------------|------------------|------------|-----------------------------------|
| 🛠️ System + KIs + Skills | X,XXX | XX% | `Healthy` / `Bloated` |
| 💬 Conversation History | X,XXX | XX% | `Healthy` / `Too Long` |
| 📄 Open Documents | X,XXX | XX% | `Normal` |
| 🧠 Tool Outputs | X,XXX | XX% | `Warning: Huge outputs detected!` |
### 🚨 Bloat Warnings
- **[Skill Name]:** XXXX tokens (oversized description metadata).
- **[Conversation]:** Multiple large search/RAG dumps in the chat loop.
### 💡 Optimization Recommendations
1. Use Memory Distillation (Strategy 1) to compress conversation history.
2. Trim oversized skill metadata with `analyze_skills.py`.
3. Lazy-load skills — only inject SKILL.md when explicitly invoked.
1.4 Context Anti-Patterns to Watch For
| Anti-Pattern | Description | Fix |
|---|---|---|
| Verbatim Tool Dumps | Pasting 500-line grep results into context | Return only matching lines |
| Overloaded Skill Metadata | SKILL.md description > 500 chars | Edit to be concise |
| Repetitive Instructions | Same boilerplate instructions repeated every turn | Consolidate to a single reference block |
| Full File Reads | Reading a 2,000-line file when only needing one function | Use grep or targeted read |
| Raw Conversation History | Storing every message un-distilled past 20 turns | Distill to episodic JSON facts |
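To make the first fix concrete, a minimal sketch that keeps only the matching lines (plus a count) instead of pasting the whole dump into context; the `grep_results.txt` file and the `"error"` pattern are hypothetical:

```python
def filter_tool_output(raw_output: str, pattern: str, max_lines: int = 20) -> str:
    """Collapse a large tool output to the lines that matter, plus a summary line."""
    matches = [line for line in raw_output.splitlines() if pattern in line]
    kept = matches[:max_lines]
    header = f"# {len(matches)} matching lines (showing {len(kept)})"
    return "\n".join([header, *kept])

with open("grep_results.txt", encoding="utf-8") as f:
    print(filter_tool_output(f.read(), "error"))
```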
1.5 Model Tokenizer Reference
| Model Family | Tokenizer | Vocab Size | Python Tool |
|---|---|---|---|
| GPT-4o, GPT-4o-mini | o200k_base | ~200k | tiktoken |
| GPT-4, GPT-3.5, Claude (proxy) | cl100k_base | ~100k | tiktoken |
| Gemma 1/2/3, Ollama/Gemma | SentencePiece google/gemma-7b | ~256k | transformers |
| Any model (fallback) | char // 4 estimate | — | Built-in (zero deps) |
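As a sketch of how those rows translate into code (this approximates what `count_tokens.py` does, but the actual script may differ; the Gemma tokenizer is a gated model that may require Hugging Face authentication):

```python
import tiktoken  # pip install tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    if model in ("gpt-4o", "gpt-4o-mini"):
        return len(tiktoken.get_encoding("o200k_base").encode(text))
    if model in ("gpt-4", "gpt-3.5", "claude"):      # Claude via cl100k_base proxy, ±5%
        return len(tiktoken.get_encoding("cl100k_base").encode(text))
    if model == "gemma":
        from transformers import AutoTokenizer      # pip install transformers
        return len(AutoTokenizer.from_pretrained("google/gemma-7b").encode(text))
    return len(text) // 4                            # zero-dependency fallback

print(count_tokens("def add(a, b):\n    return a + b", model="gpt-4o"))
```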
General Rules:
- English prose: ~1.3 tokens per word.
- Code: ~2.5–3.5 tokens per word.
- Tab indentation is more efficient than 4-space indentation (~3 tokens saved per line).
- JSON structure (braces, quotes, commas) is expensive. Prefer CSV or YAML for large repetitive data.
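To illustrate the last rule, a small sketch (the `rows` data is invented) that re-serializes repetitive records as CSV so the keys, braces, and quotes are not repeated for every row:

```python
import csv, io, json

rows = [
    {"file": "app.py", "lines": 120, "status": "ok"},
    {"file": "db.py", "lines": 300, "status": "ok"},
    {"file": "ui.py", "lines": 540, "status": "warn"},
]

as_json = json.dumps(rows)                       # repeats keys and quoting per record
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
writer.writeheader()                             # header once, then bare values
writer.writerows(rows)
as_csv = buf.getvalue()

print(len(as_json), "chars as JSON vs", len(as_csv), "chars as CSV")
```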
PHASE 2 — TOKEN OPTIMIZER
The 5 Strategies
When tasked with reducing context bloat, apply one or more of these strategies in order of impact vs. risk:
Strategy 1 — Memory Distillation 💾
Impact: 40–70% reduction | Risk: Very Low
Use when the conversation history is long (20+ turns or filling significant context).
What it does: Converts verbose chat history into structured JSON facts, preserving only decisions, preferences, constraints, and actions.
How to run:
# Distill a JSON chat history file
python distill_memory.py --input history.json --output facts.json
# Distill a plain text transcript
python distill_memory.py --input conversation.txt --output facts.json
Output schema (v2.0-openclaw):
{
"metadata": {
"distillation_version": "2.0-openclaw",
"original_chars": 45000,
"lines_processed": 312,
"facts_extracted_count": 18
},
"facts": [
{
"id": "a3f2b1c4",
"type": "decision",
"content": "Use React Router for navigation",
"confidence": "high",
"source_turn": "inferred"
},
{
"id": "d8e1f290",
"type": "preference",
"content": "Prefers dark mode primary color palette",
"confidence": "medium",
"source_turn": "inferred"
},
{
"id": "c7a9b034",
"type": "constraint",
"content": "Must not use third-party authentication libraries",
"confidence": "high",
"source_turn": "inferred"
},
{
"id": "f1e2d5c6",
"type": "next_action",
"content": "Implement the dashboard layout component",
"confidence": "low",
"source_turn": "inferred"
}
]
}
Fact types:
- `decision` — A confirmed architectural or design choice.
- `preference` — A user or agent preference about style or tools.
- `constraint` — A hard rule that must not be violated.
- `entity` — An important referenced class, file, or system name.
- `next_action` — A pending or future task.
Workflow:
1. The agent saves the current conversation to `temp_history.json`.
2. Runs `distill_memory.py`.
3. Reads the structured output (50–100 tokens instead of thousands); see the sketch after this list.
4. If `agent-memory-mcp` is available, commits facts via `memory_write`.
5. Flushes working memory — conversation starts fresh.
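A minimal sketch of step 3, assuming `facts.json` follows the v2.0-openclaw schema above (how the compact block is re-injected is up to the agent):

```python
import json

# Re-inject distilled facts as a compact memory block (~50-100 tokens)
# instead of replaying thousands of tokens of raw conversation.
with open("facts.json", encoding="utf-8") as f:
    distilled = json.load(f)

memory_block = "\n".join(
    f"- [{fact['type']}] {fact['content']} (confidence: {fact['confidence']})"
    for fact in distilled["facts"]
)
print("## Distilled session memory\n" + memory_block)
```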
Reference: See the full 3-Tiered Memory Architecture in Section 2.6.
Strategy 2 — Skill Lazy Loading 📦
Impact: 10–30% reduction | Risk: Zero
Use when the system prompt metadata is bloated with dozens of passively-loaded skills.
What it does: Identifies skills whose SKILL.md description fields are too verbose (>500 chars) and flags them for trimming.
How to run:
# Auto-detects your skills directory (checks SKILLS_DIR env var, ./skills, ~/.openclaw/skills, ~/.gemini/antigravity/skills)
python analyze_skills.py
# Or set explicitly
SKILLS_DIR=/path/to/your/skills python analyze_skills.py
Sample output:
Analyzing skills in: /app/skills
--- Skill Context Audit ---
Total Skills Analyzed: 239
Estimated Token Cost of Loaded Skill Database: ~20539 tokens
🚨 Bloated Skills Detected (Metadata > 500 chars):
- planning-with-files: 1049 chars (~262 tokens)
- ui-ux-pro-max: 841 chars (~210 tokens)
💡 Recommendation: Edit the description in these SKILL.md files to be concise.
💡 Use Token Genome routing: only inject heavy SKILL descriptions when requested.
Fix: Edit the description: field in the flagged SKILL.md frontmatter to be one clean, concise sentence.
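For example, a hypothetical flagged SKILL.md might slim its frontmatter like this (the skill and wording are invented for illustration):

```yaml
# BEFORE: a paragraph of always-loaded metadata (~250 tokens)
description: >
  This skill helps you plan complex multi-step projects by creating and updating
  markdown planning files, breaking work into phases, estimating effort,
  tracking blockers, re-planning whenever requirements change, and more.

# AFTER: one clean, concise sentence
description: Plan and track multi-step projects using markdown planning files.
```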
Strategy 3 — Code & Context DNA Compression 🧬
Impact: Up to 80% on UI/boilerplate code | Risk: Low (never removes logic)
Use when working with large frontend or backend code files where the agent only needs to understand the logic, not re-read identical boilerplate.
What it does: Instead of reading an entire 500-line component into context, the agent collapses standard import blocks and boilerplate into single-line comment stubs. The agent still understands what is there — it just doesn't re-tokenize redundant text.
Examples:
// BEFORE — 5 lines, ~40 tokens
import { Button } from "@/components/ui/button"
import { Card, CardHeader, CardContent } from "@/components/ui/card"
import { useState, useEffect } from "react"
import { useRouter } from "next/navigation"
import type { FC } from "react"
// AFTER — 1 line, ~8 tokens
// Standard imports: Button, Card, useState, useEffect, useRouter (FC)
Rule: Only compress sections the agent does NOT need to modify. Never compress the logic function you are about to edit.
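A Python analogue of the JavaScript example above, as a rough sketch of the mechanical part of this strategy (real compression tooling would be more careful about what counts as boilerplate); `component.py` is a hypothetical input file:

```python
def collapse_imports(source: str) -> str:
    """Collapse a leading block of import lines into a single comment stub."""
    lines = source.splitlines()
    modules, body_start = [], 0
    for i, line in enumerate(lines):
        if line.startswith(("import ", "from ")):
            modules.append(line.split()[1])    # module name only
            body_start = i + 1
        elif not line.strip():
            body_start = i + 1                 # tolerate blank lines inside the block
        else:
            break                              # first real statement: stop collapsing
    if not modules:
        return source
    stub = "# Standard imports: " + ", ".join(modules)
    return "\n".join([stub, *lines[body_start:]])

with open("component.py", encoding="utf-8") as f:
    print(collapse_imports(f.read()))
```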
Strategy 4 — Model Dialect Rewriting 🗣️
Impact: 10–20% | Risk: Low for Gemma/Ollama, not needed for GPT/Claude
Use when the backend model is Gemma or a local Ollama model.
What it does: Rewrites flowing natural language prompts into structured XML blocks, which Gemma's SentencePiece tokenizer handles far more efficiently.
Example:
# BEFORE — flowing prose (~22 tokens)
Here is the context of the user's issue which happens in the API.
Please read it and write a Python script to fix it.
# AFTER — XML dialect (~9 tokens)
<context>Issue in API.</context><task>Write Python fix.</task>
Why it works: XML tags like `<context>` are often single tokens in Gemma's 256k vocabulary, and you stop spending tokens on filler such as "The following is the context of...", which carries no information.
Gemma-Specific Tips:
- Gemma's large vocabulary means technical compound words often tokenize as 1 token instead of 2–3.
- Tabs save ~3 tokens per line over 4-space indentation in long code files.
- Favor descriptive parameter names in tool schemas over lengthy natural-language descriptions — Gemma infers intent from the names.
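A tiny sketch of the rewrite as a helper (the tag names follow the example above; the function itself is illustrative, not part of any script in this skill):

```python
def to_gemma_dialect(context: str, task: str) -> str:
    """Wrap terse context/task strings in XML tags that Gemma tokenizes cheaply."""
    return f"<context>{context}</context><task>{task}</task>"

prompt = to_gemma_dialect("Issue in API.", "Write Python fix.")  # ~9 tokens vs ~22 of prose
```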
Strategy 5 — Prompt Compression (Offline Only) 🗜️
Impact: 20–50% | Risk: HIGH on live prompts — use only for offline large docs
🚨 Critical Guardrail: NEVER apply this to:
- Live system prompts
- Tool schemas or function definitions
- JSON configs or structured data
- Code (unless completely isolated and reversible)
Only use for: offline experiments on large pasted documentation blocks, research dumps, or very long user-provided text.
What it does: Uses LLMLingua-2 to calculate the perplexity (entropy) of each word and remove low-information filler tokens without changing the core meaning.
Setup (one-time):
pip install llmlingua
How to run:
# Compress a large research document by 50%
python compress_prompt.py --input large_doc.txt --ratio 0.5
# Preview first 500 chars before committing (dry run safety check)
python compress_prompt.py --input large_doc.txt --ratio 0.5 --dry-run
Always use --dry-run first to visually verify the output is coherent before permanently replacing the source.
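If you want to see what the script presumably wraps, here is a minimal standalone sketch against the public `llmlingua` API; the model name and the `rate` parameter follow the library's documented LLMLingua-2 usage, but verify them against your installed version:

```python
from llmlingua import PromptCompressor  # pip install llmlingua

compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)

with open("large_doc.txt", encoding="utf-8") as f:
    original = f.read()

result = compressor.compress_prompt(original, rate=0.5)   # keep roughly half the tokens
print(result["compressed_prompt"][:500])                  # eyeball coherence before trusting it
```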
2.6 Memory Persistence Architecture
The 3-tiered memory model is the foundation of all quality-preserving token reduction:
┌─────────────────────────────────────┐
│ TIER 1: Working Memory (Expensive) │
│ Active conversation log │
│ → Keep SHORT. Max 20 turns. │
└──────────────┬──────────────────────┘
│ distill_memory.py
▼
┌─────────────────────────────────────┐
│ TIER 2: Episodic Store (Cheap) │
│ JSON facts extracted from history │
│ → 50-100 tokens per session │
└──────────────┬──────────────────────┘
│ memory_write (agent-memory-mcp)
▼
┌─────────────────────────────────────┐
│ TIER 3: Semantic Store (Free) │
│ RAG / Vector / MCP external store │
│ → Zero context cost until queried │
└─────────────────────────────────────┘
Memory Distillation Trigger Rules
🚨 The agent strictly ONLY distills memory under these conditions:
- At explicit task completion — a defined sub-task is marked done.
- After N turns — a dense session has exceeded the configured turn limit.
- At 80% context threshold — `count_tokens.py --estimate` confirms context is near capacity.
Never distill preemptively mid-task — you risk losing the working context needed to finish the current action.
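Those rules reduce to a small guard, sketched here with the 20-turn cap and 80% threshold from this document (the 128k limit is an illustrative default):

```python
CONTEXT_LIMIT = 128_000   # adjust to your model's context window
MAX_TURNS = 20            # Tier 1 working-memory cap
DISTILL_THRESHOLD = 0.80

def should_distill(task_complete: bool, turn_count: int, estimated_tokens: int) -> bool:
    """Distill only at task completion, past the turn cap, or near the context ceiling."""
    return (
        task_complete
        or turn_count >= MAX_TURNS
        or estimated_tokens >= DISTILL_THRESHOLD * CONTEXT_LIMIT
    )
```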
Quick Reference
| Goal | Script | Command |
|---|---|---|
| Count tokens in a file (precise) | count_tokens.py | python count_tokens.py --input file.txt --model gpt-4o |
| Count tokens (no dependencies) | count_tokens.py | python count_tokens.py --input file.txt --estimate |
| Find bloated skill descriptions | analyze_skills.py | python analyze_skills.py |
| Distill chat to JSON facts | distill_memory.py | python distill_memory.py --input chat.json --output facts.json |
| Compress a large document (offline) | compress_prompt.py | python compress_prompt.py --input doc.txt --ratio 0.5 --dry-run |
Phase 2 — OpenClaw Production Hardened. All scripts include graceful dependency fallbacks.