Install

```bash
openclaw skills install context-compactor
```

Token-based context compaction for local models (MLX, llama.cpp, Ollama) that don't report context limits.

Automatic context compaction for OpenClaw when using local models that don't properly report token limits or context overflow errors.
Cloud APIs (Anthropic, OpenAI) report context overflow errors, which lets OpenClaw's built-in compaction trigger. Local models (MLX, llama.cpp, Ollama) often report neither their context limit nor overflow errors, so built-in compaction never triggers.
This leaves you with broken conversations when context gets too long.
Context Compactor estimates tokens client-side and proactively summarizes older messages before hitting the model's limit.
```
┌────────────────────────────────────────────────┐
│ 1. Message arrives                             │
│ 2. before_agent_start hook fires               │
│ 3. Plugin estimates total context tokens       │
│ 4. If over maxTokens:                          │
│    a. Split into "old" and "recent" messages   │
│    b. Summarize old messages (LLM or fallback) │
│    c. Inject summary as compacted context      │
│ 5. Agent sees: summary + recent + new message  │
└────────────────────────────────────────────────┘
```
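The flow above can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the plugin's source: the Message shape and the function names (estimateTokens, totalTokens, needsCompaction, splitForCompaction) are hypothetical, and the estimate is the plain "characters divided by charsPerToken" heuristic that the configuration implies.

```typescript
// Hypothetical message shape; the real plugin's types may differ.
interface Message {
  role: "user" | "assistant" | "system";
  content: string;
}

// Client-side estimate: characters divided by charsPerToken, rounded up.
function estimateTokens(text: string, charsPerToken: number): number {
  return Math.ceil(text.length / charsPerToken);
}

function totalTokens(messages: Message[], charsPerToken: number): number {
  return messages.reduce((sum, m) => sum + estimateTokens(m.content, charsPerToken), 0);
}

// Step 4: compaction triggers only when the estimate exceeds maxTokens.
function needsCompaction(messages: Message[], maxTokens: number, charsPerToken: number): boolean {
  return totalTokens(messages, charsPerToken) > maxTokens;
}

// Step 4a: walk backwards from the newest message, keeping messages while they
// fit within keepRecentTokens; everything older becomes "old" and is summarized.
// If even the newest message exceeds keepRecentTokens, every message lands in "old".
function splitForCompaction(
  messages: Message[],
  keepRecentTokens: number,
  charsPerToken: number,
): { old: Message[]; recent: Message[] } {
  let kept = 0;
  let cut = messages.length; // index of the first "recent" message
  for (let i = messages.length - 1; i >= 0; i--) {
    const tokens = estimateTokens(messages[i].content, charsPerToken);
    if (kept + tokens > keepRecentTokens) break;
    kept += tokens;
    cut = i;
  }
  return { old: messages.slice(0, cut), recent: messages.slice(cut) };
}
```

For example, ten 400-character messages at charsPerToken 4 estimate to 1,000 tokens. With maxTokens 800 that triggers compaction, and keepRecentTokens 250 keeps the last two messages (200 tokens) while the other eight are marked for summarization.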
```bash
# One command setup (recommended)
npx jasper-context-compactor setup

# Restart gateway
openclaw gateway restart
```
The setup command automatically installs the plugin into ~/.openclaw/extensions/context-compactor/ and adds it to openclaw.json with sensible defaults.

To configure it manually instead, add this to openclaw.json:
```json
{
  "plugins": {
    "entries": {
      "context-compactor": {
        "enabled": true,
        "config": {
          "maxTokens": 8000,
          "keepRecentTokens": 2000,
          "summaryMaxTokens": 1000,
          "charsPerToken": 4
        }
      }
    }
  }
}
```
| Option | Default | Description |
|---|---|---|
| enabled | true | Enable or disable the plugin (set on the plugin entry, next to config) |
| maxTokens | 8000 | Estimated context size, in tokens, above which compaction triggers |
| keepRecentTokens | 2000 | Token budget for recent messages kept verbatim |
| summaryMaxTokens | 1000 | Maximum length of the summary, in tokens |
| charsPerToken | 4 | Characters per token, used to estimate token counts |
| summaryModel | (session model) | Model used for summarization |
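One way to picture how these defaults combine with whatever you put under config is a shallow merge: any key you set overrides the default, and the rest fall back to the table values. The CompactorConfig interface and the resolveConfig and summarizationModel functions are hypothetical names for illustration; enabled is left out because it sits on the plugin entry rather than inside config.

```typescript
// Hypothetical typed view of the "config" object, using the defaults from the table.
interface CompactorConfig {
  maxTokens: number;
  keepRecentTokens: number;
  summaryMaxTokens: number;
  charsPerToken: number;
  summaryModel?: string; // unset means "use the session model"
}

const DEFAULTS: CompactorConfig = {
  maxTokens: 8000,
  keepRecentTokens: 2000,
  summaryMaxTokens: 1000,
  charsPerToken: 4,
};

// Shallow merge: keys present in the user's config override the defaults.
function resolveConfig(user: Partial<CompactorConfig>): CompactorConfig {
  return { ...DEFAULTS, ...user };
}

// The model used for summarization falls back to the session model.
function summarizationModel(config: CompactorConfig, sessionModel: string): string {
  return config.summaryModel ?? sessionModel;
}
```

For instance, resolveConfig({ maxTokens: 6000 }) raises nothing but the threshold; keepRecentTokens, summaryMaxTokens and charsPerToken keep their defaults.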
MLX (8K context models):

```json
{
  "maxTokens": 6000,
  "keepRecentTokens": 1500,
  "charsPerToken": 4
}
```

Larger context (32K models):

```json
{
  "maxTokens": 28000,
  "keepRecentTokens": 4000,
  "charsPerToken": 4
}
```

Small context (4K models):

```json
{
  "maxTokens": 3000,
  "keepRecentTokens": 800,
  "charsPerToken": 4
}
```
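The presets share a pattern that a quick check makes explicit: maxTokens stays below the model's real context window, leaving headroom for the reply and for estimation error, and summaryMaxTokens + keepRecentTokens stays well below maxTokens, so a freshly compacted context does not immediately re-trigger compaction. The sketch assumes the nominal windows are 4096, 8192 and 32768 tokens and that summaryMaxTokens keeps its default of 1000 in each preset; checkPreset is a hypothetical helper, not part of the plugin.

```typescript
interface Preset {
  name: string;
  contextWindow: number;    // assumed real model limit, e.g. 8192 for "8K"
  maxTokens: number;
  keepRecentTokens: number;
  summaryMaxTokens: number; // default 1000 when the preset does not set it
}

interface PresetReport {
  headroom: number;        // tokens between maxTokens and the real window
  afterCompaction: number; // rough size right after compacting: summary + recent
  problems: string[];
}

// Hypothetical sanity check for a settings preset.
function checkPreset(p: Preset): PresetReport {
  const headroom = p.contextWindow - p.maxTokens;
  const afterCompaction = p.summaryMaxTokens + p.keepRecentTokens;
  const problems: string[] = [];
  if (headroom <= 0) {
    problems.push(`${p.name}: maxTokens must be below the context window`);
  }
  if (afterCompaction >= p.maxTokens) {
    problems.push(`${p.name}: summary + recent (${afterCompaction}) would not drop below maxTokens`);
  }
  return { headroom, afterCompaction, problems };
}

const presets: Preset[] = [
  { name: "4K", contextWindow: 4096, maxTokens: 3000, keepRecentTokens: 800, summaryMaxTokens: 1000 },
  { name: "8K (MLX)", contextWindow: 8192, maxTokens: 6000, keepRecentTokens: 1500, summaryMaxTokens: 1000 },
  { name: "32K", contextWindow: 32768, maxTokens: 28000, keepRecentTokens: 4000, summaryMaxTokens: 1000 },
];

for (const p of presets) {
  const r = checkPreset(p);
  console.log(p.name, "headroom:", r.headroom, "after compaction:", r.afterCompaction, r.problems);
}
```

Under those assumptions the 4K, 8K and 32K presets leave 1,096, 2,192 and 4,768 tokens of headroom respectively, and none of them re-triggers right after compacting.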
/compact-now

Force-clear the summary cache and trigger a fresh compaction on the next message.

```
/compact-now
```
/context-stats

Show current context token usage and whether compaction would trigger.

```
/context-stats
```

Output:

```
📊 Context Stats

Messages: 47 total
  - User: 23
  - Assistant: 24
  - System: 0

Estimated Tokens: ~6,234
Limit: 8,000
Usage: 77.9%

✅ Within limits
```
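The figures in this output follow from simple arithmetic: usage is the estimated token count divided by the limit (maxTokens), so ~6,234 of 8,000 tokens is 77.9%, and the session is within limits because the estimate does not exceed maxTokens. A minimal sketch of how such a report could be computed; the contextStats function and the Message shape are hypothetical, not the plugin's own code.

```typescript
interface Message {
  role: "user" | "assistant" | "system";
  content: string;
}

interface ContextStats {
  total: number;
  byRole: Record<Message["role"], number>;
  estimatedTokens: number;
  usagePercent: number;  // rounded to one decimal place
  withinLimits: boolean; // false means compaction would trigger
}

function contextStats(messages: Message[], maxTokens: number, charsPerToken: number): ContextStats {
  const byRole: Record<Message["role"], number> = { user: 0, assistant: 0, system: 0 };
  let estimatedTokens = 0;
  for (const m of messages) {
    byRole[m.role] += 1;
    // Assumed heuristic: per-message chars / charsPerToken, rounded up.
    estimatedTokens += Math.ceil(m.content.length / charsPerToken);
  }
  const usagePercent = Math.round((estimatedTokens / maxTokens) * 1000) / 10;
  return {
    total: messages.length,
    byRole,
    estimatedTokens,
    usagePercent,
    withinLimits: estimatedTokens <= maxTokens,
  };
}
```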
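When compaction triggers, the older messages (everything outside the most recent keepRecentTokens) are summarized with an LLM call, using summaryModel if set and the session model otherwise, with the summary capped at summaryMaxTokens. The summary is then injected as compacted context ahead of the recent messages. If the LLM runtime isn't available (e.g., during startup), a fallback truncation-based summary is used.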
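The text only says the fallback is truncation-based. One plausible shape for it, shown purely as an assumption, keeps a short prefix of each old message and stops once the text reaches the summaryMaxTokens budget, converted to characters with charsPerToken. The function name fallbackSummary, the 200-character snippet length and the header format are all hypothetical.

```typescript
interface Message {
  role: "user" | "assistant" | "system";
  content: string;
}

// Hypothetical truncation-based fallback used when no LLM is available.
function fallbackSummary(
  old: Message[],
  summaryMaxTokens: number,
  charsPerToken: number,
  snippetChars = 200, // assumed per-message prefix length
): string {
  const budget = summaryMaxTokens * charsPerToken; // character budget for the summary
  const header = `[Summary of ${old.length} earlier message(s)]`;
  const lines: string[] = [header];
  let used = header.length;
  for (const m of old) {
    // Keep only the start of each message; mark truncation with an ellipsis.
    const snippet = m.content.length > snippetChars ? m.content.slice(0, snippetChars) + "…" : m.content;
    const line = `${m.role}: ${snippet}`;
    if (used + line.length + 1 > budget) break; // +1 for the joining newline
    lines.push(line);
    used += line.length + 1;
  }
  return lines.join("\n");
}
```

Stopping at the first message that would overflow keeps the oldest context in order; a different design could sample across the whole history instead.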
| Feature | Built-in | Context Compactor |
|---|---|---|
| Trigger | Model reports overflow | Token estimate threshold |
| Works with local models | ❌ (need overflow error) | ✅ |
| Persists to transcript | ✅ | ❌ (session-only) |
| Summarization | Pi runtime | Plugin LLM call |
Context Compactor complements the built-in compaction rather than replacing it: it catches cases before they hit the model's hard limit.
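Summary quality is poor:
- Try a different (stronger) summaryModel.
- Increase summaryMaxTokens.

Compaction triggers too often:
- Increase maxTokens.
- Or reduce keepRecentTokens (keeps less, summarizes earlier).

Not compacting when expected:
- Run /context-stats to see current usage.
- Check that enabled: true is set in the config.
- Check the logs for [context-compactor] messages.

Characters per token wrong:
- Adjust charsPerToken to match your model's tokenizer. A lower value produces higher token estimates, so compaction triggers earlier.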
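If you can obtain the true token count for a sample text (for example from your local runtime's tokenizer; check its documentation), you can derive charsPerToken instead of guessing. A minimal sketch with a hypothetical calibrateCharsPerToken helper: it rounds down to one decimal so estimates err on the high side, and it assumes the plugin accepts a fractional charsPerToken (if it only accepts integers, round down to a whole number instead).

```typescript
interface Sample {
  text: string;
  tokens: number; // true token count reported by your model's tokenizer
}

// Hypothetical helper: total characters / total tokens across samples,
// rounded down to one decimal so token estimates err on the high side.
function calibrateCharsPerToken(samples: Sample[]): number {
  let chars = 0;
  let tokens = 0;
  for (const s of samples) {
    chars += s.text.length;
    tokens += s.tokens;
  }
  if (tokens <= 0) {
    throw new Error("need at least one sample with a known, positive token count");
  }
  return Math.max(1, Math.floor((chars / tokens) * 10) / 10);
}

// Lower charsPerToken means a higher estimate, so compaction triggers earlier.
function estimateTokens(text: string, charsPerToken: number): number {
  return Math.ceil(text.length / charsPerToken);
}
```

For a 1,000-character text, the estimate rises from 250 tokens at charsPerToken 4 to 304 at 3.3.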
Enable debug logging:
```json
{
  "plugins": {
    "entries": {
      "context-compactor": {
        "config": {
          "logLevel": "debug"
        }
      }
    }
  }
}
```
Look for:
```
[context-compactor] Current context: ~XXXX tokens
[context-compactor] Compacted X messages → summary
```