Install
openclaw skills install neon-skill-distiller-referenceComplete skill compression documentation — all options, modes, and calibration details (~2,500 tokens).
openclaw skills install neon-skill-distiller-referenceFull reference documentation (~2,500 tokens, ~90% functionality, LLM-estimated). This is the canonical source for all compressed variants.
Note: This reference document intentionally exceeds the 300-line MCE guideline. As the canonical source for all compressed variants, comprehensiveness is prioritized over brevity. For compact variants, see the main SKILL.md (~400 tokens, formula) or compressed (~975 tokens, prose).
Role: Help users compress verbose skills to reduce context window usage Understands: Skills are verbose for human clarity but costly for context; compression is a trade-off Approach: Identify section types, score importance, remove/shorten low-value sections Boundaries: Preserve functionality, report what was removed, never hide trade-offs Tone: Technical, precise, transparent about trade-offs
Data handling: This skill operates within your agent's trust boundary. All analysis uses your agent's configured model. No external APIs beyond your agent's LLM provider.
Activate this skill when the user asks:
| Flag | Default | Description |
|---|---|---|
--mode | threshold | Compression mode: threshold, tokens, oneliner |
--threshold | 0.9 | Functionality preservation target (0.0-1.0) |
--tokens | - | Target token count (for tokens mode) |
--provider | auto | LLM provider: auto, ollama, gemini, openai |
--model | - | Specific model (e.g., llama3.2, gemini-2.0-flash) |
--verbose | false | Show section-by-section analysis |
--dry-run | false | Analyze without outputting compressed skill |
--debug-stages | false | Show intermediate stage outputs |
Provider auto-detection (in order):
ollama availability via ollama listGEMINI_API_KEY environment variableOPENAI_API_KEY environment variableIF ollama available → use ollama (local, fast)
ELIF GEMINI_API_KEY set → use gemini
ELIF OPENAI_API_KEY set → use openai
ELSE → Error: "No LLM provider available. Run 'ollama serve' for local inference, or set GEMINI_API_KEY for cloud."
Parse input skill into sections:
Classify each section using LLM:
| Type | Importance | Compressible? |
|---|---|---|
| TRIGGER | 1.0 (required) | No |
| CORE_INSTRUCTION | 1.0 (required) | No |
| CONSTRAINT | 0.9 | Partially |
| OUTPUT_FORMAT | 0.8 | Partially |
| ADVISORY_PATTERN | 0.7 | Yes, with warning |
| EXAMPLE | 0.5 | Yes (reduce count) |
| EDGE_CASE | 0.4 | Yes (summarize) |
| EXPLANATION | 0.3 | Yes (remove) |
| VERBOSE_DETAIL | 0.2 | Yes (remove first) |
Protected patterns (boost to 0.85+):
name/description (REQUIRED BY SPEC)For compressible sections (EXAMPLE, EDGE_CASE, EXPLANATION, VERBOSE_DETAIL), apply token-level scoring based on self-information theory:
Principle: self_info(token) = -log(P(token|context))
Scoring Prompt:
Analyze this section for compression. Classify phrases as:
- ESSENTIAL: Specific values, unique terms, key constraints, surprising info
- REDUNDANT: Could be inferred, restates earlier content, filler words
Section type: {type}
Content: {content}
Return JSON: {"essential": [...], "redundant": [...], "compression_potential": 0.0-1.0}
Pruning Rules:
Example:
Input: "The system will then proceed to process the input data and return the results"
Output: "process input data → return results"
Removed: "The system will then proceed to", "and"
All modes use MetaGlyph native symbols (see §Symbol Reference below):
→⇒∀¬Threshold Mode (default):
Token-Target Mode:
One-Liner Mode:
For EXAMPLE sections, use extractive + abstractive compression:
Phase 1: Extractive Selection (preferred)
Phase 1.5: One-Liner Compression (for non-selected examples)
Instead of discarding, compress to: {trigger} → {result}
Output structure:
### Full (selected):
[Detailed example with steps...]
### One-liners (compressed):
- `--mode=tokens` → hard token limit
- `--verbose` → section-by-section analysis
Phase 2: Abstractive Fallback (if extractive + one-liners < 0.8)
{generalized_pattern} → {expected_result}Selection Prompt:
Select the 1-2 BEST examples that cover the most patterns with minimum redundancy.
Examples:
{numbered_list}
Return JSON: {
"selected": [indices],
"coverage": 0.0-1.0,
"rationale": "..."
}
Synthesis Prompt (if coverage < 0.8):
Compress these examples into 1-2 summary examples.
Preserve: specific values, key patterns, expected outcomes.
Format: "When X → Do Y → Expect Z"
{examples}
Output shows: Examples: 5 → 2 full + 3 one-liners (extractive) or Examples: 5 → 1 (abstractive)
Evaluate by semantic understanding, NOT metrics.
| Wrong (Metrics) | Right (Semantic) |
|---|---|
| "60% line reduction is too aggressive" | "Can an agent still execute this skill?" |
| "Token count exceeds target" | "Are all triggers and actions preserved?" |
| "Ratio doesn't match threshold" | "Would an agent behave the same way?" |
LLM evaluates preservation by asking:
Score 0-100 reflects semantic capability preservation, not line/token ratios. A skill compressed to 40% of original size can still preserve 95% functionality if the removed content was verbose explanation, redundant examples, or non-essential formatting.
After compression, save entry to .learnings/skill-distiller/calibration.jsonl:
{
"id": "c[N]",
"timestamp": "[ISO 8601]",
"skill": "[skill name from frontmatter]",
"mode": "[threshold|tokens|oneliner]",
"threshold": 0.9,
"input_tokens": 1800,
"output_tokens": 1100,
"reduction_pct": 39,
"sections_total": 14,
"sections_kept": 9,
"sections_removed": 5,
"classification_confidence_mean": 0.90,
"functionality_score": 90,
"protected_patterns_found": ["n-count"],
"protected_patterns_preserved": ["n-count"],
"advisory_patterns_found": [],
"advisory_patterns_removed": [],
"expected": {"functionality": 90},
"actual": null
}
File rotation: If entries > 1000, truncate oldest 100 before appending.
Return compressed skill with metadata.
Functionality preserved: 90% (uncalibrated - first 5 compressions build baseline)
Tokens: 2000 → 1800 (10% reduction)
Classification confidence: 0.87 (mean across sections)
Removed: 3 examples, 2 edge cases
Kept: all triggers, core instructions, constraints
[Compressed skill markdown follows...]
--verbose)Section Analysis:
## When to Use: TRIGGER (1.0, confidence: 0.95) → KEPT
## Process: CORE_INSTRUCTION (1.0, confidence: 0.92) → KEPT
## Examples: EXAMPLE (0.5, confidence: 0.88) → RECOMP
└─ Examples: 5 → 2 full + 3 one-liners (coverage: 0.95)
## Edge Cases: EDGE_CASE (0.4, confidence: 0.85) → SUMMARIZED
## Technical Details: VERBOSE_DETAIL (0.2) → TOKEN-SCORED
└─ Tokens: 150 → 45 (redundant phrases removed)
Token-level analysis:
VERBOSE_DETAIL sections: 3 analyzed, 67% average compression
Phrases removed: "The system will then", "proceed to", "in order to"
RECOMP example analysis:
Original examples: 5
Full (selected): [0, 4] (basic usage, error handling)
One-liners: [1, 2, 3] (threshold, tokens, verbose options)
Coverage: 0.95 (full + one-liners)
Mode: extractive
Protected patterns found: none
Advisory patterns found: parallel-decision → removed (no score penalty)
[Compressed skill markdown follows...]
TRIGGER: [activation conditions]
ACTION: [core behavior]
RESULT: [expected output]
Example with MetaGlyph symbols:
TRIGGER: user asks "compress skill" ∨ "distill" ∨ "reduce context"
ACTION: parse → classify sections → score importance → ¬low-value → output
RESULT: compressed skill ∧ functionality% ∧ token reduction stats
Preserve X% of functionality, compress as much as possible.
/skill-distiller path/to/skill.md --threshold=0.9
Understanding thresholds: The threshold (0.9 = 90%) refers to semantic functionality, not token/line ratios.
| Threshold | Meaning | NOT |
|---|---|---|
| 0.95 | 95% of capabilities preserved | 95% of lines kept |
| 0.90 | 90% of semantic function | 90% of tokens |
| 0.80 | 80% of agent behavior | 80% of bytes |
A 0.9 threshold can result in 50%+ line reduction if the removed content was verbose examples, redundant explanations, or formatting. Judge quality by semantic analysis, not size ratios.
Why 0.9 default: Skill functionality is normally distributed across sections. Wide tails mean some "low importance" sections occasionally carry critical value for edge cases. At 0.9, you preserve more of the tail while still achieving meaningful compression.
Compress to exact token budget.
/skill-distiller path/to/skill.md --tokens=500
Token estimation: Uses 4 chars/token heuristic. Accuracy: +/-20% vs actual provider tokenization. For precise limits, verify with provider's tokenizer.
Extreme compression for quick reference.
/skill-distiller path/to/skill.md --mode=oneliner
Produces 3-line summary: TRIGGER/ACTION/RESULT
These patterns must be preserved even if they look verbose:
| Pattern | Why Protected |
|---|---|
YAML name/description | REQUIRED by Agent Skills spec |
| Task creation | Compaction resilience |
| N-count tracking | Observation workflow |
| Checkpoint/state | State recovery |
| BEFORE/AFTER | Self-calibration |
If a protected pattern is removed, the functionality score is penalized (-10% per pattern) and flagged explicitly in output.
These patterns improve efficiency but aren't required:
| Pattern | Impact if Removed |
|---|---|
| Parallel/serial decision | Suboptimal execution order |
| Performance hints | Slower but functional |
| Caching guidance | Works but inefficient |
Advisory patterns removed are warned but don't penalize the functionality score.
Native mathematical symbols LLMs understand from pre-training. Zero legend overhead.
| Symbol | Replaces | Example |
|---|---|---|
→ | results in, leads to, produces | trigger → action |
⇒ | implies, therefore, thus | condition ⇒ behavior |
∈ | belongs to, is in, member of | value ∈ {a, b, c} |
∀ | for all, for every, for each | ∀ files: validate |
¬ | not, doesn't, isn't | ¬empty → process |
∃ | there exists, there is | ∃ config → load |
∧ | and, also, plus | valid ∧ safe → proceed |
∨ | or, either | error ∨ timeout → retry |
Application: Symbols replace verbose phrases in compressed output. No legend needed in output—LLMs comprehend natively.
Research: Based on MetaGlyph (arXiv:2601.07354) - 62-81% compression with native symbols.
Storage: .learnings/skill-distiller/calibration.jsonl
Each compression run is logged with expected functionality score. User feedback updates the actual field, enabling calibration over time.
| N-count | Output | Meaning |
|---|---|---|
| N < 5 | 90% (uncalibrated - first 5 compressions build baseline) | LLM-only estimate |
| N = 5-10 | 88% (building baseline, N=7) | Gathering data |
| N > 10 | 88% +/- 4% (calibrated, N=12) | Historical data informs CI |
After using a compressed skill, report actual outcome to improve future estimates:
/skill-distiller feedback --id=c1 --actual=85 --outcome="worked"
Outcome values: worked, partial, failed
This updates the calibration entry, enabling the system to learn from real-world usage.
cat .learnings/skill-distiller/calibration.jsonl | jq -s 'length' # Count entries
cat .learnings/skill-distiller/calibration.jsonl | jq -s 'map(select(.actual != null))' # Entries with feedback
This skill can compress itself (meta-recursive).
Guardrails:
Why 0.95 threshold: The distiller must remain fully functional to distill other skills. Capability loss compounds (0.95 x 0.95 = 0.90 at next level).
Enforcement: The 0.95 threshold is a documented guardrail, not an automated check. When compressing skill-distiller variants, manually verify you're using --threshold=0.95 or higher. The formula variant (SKILL.md) should never be the input for self-compression.
| Error | Recovery Hint |
|---|---|
| No content | Provide a valid SKILL.md file path or pipe content via stdin |
| No frontmatter | Add --- block with name and description |
| No trigger section | Add '## When to Use' for best results |
| Token target impossible | Use --mode=oneliner for extreme compression |
| LLM unavailable | Run 'ollama serve' for local, or set GEMINI_API_KEY |
| Already minimal | No compression possible at threshold Y |
Planned features not yet implemented:
| Flag | Description | Status |
|---|---|---|
--with-ci | Calculate confidence interval (3x LLM calls) | Planned |
| Variant | Tokens | Functionality |
|---|---|---|
| skill-distiller (main) | ~400 | ~90% (formula) |
| compressed | ~975 | ~90% (prose) |
| oneliner | ~100 | ~70% |
| reference (this) | ~2,500 | ~90% |
Token counts are estimates using 4 chars/token heuristic. Functionality scores are LLM-estimated.