3-Layer Token Compressor — Cut AI API Costs 40-60%

v1.1.0

Pre-process prompts through 3 compression layers before sending to paid APIs. Uses a local Ollama model to intelligently compress messages and summarize hist...

License: MIT-0
by Shadow Rose (@theshadowrose)

Install

openclaw skills install token-compressor

Pre-process prompts through 3 compression layers before sending to paid APIs. Uses a free local Ollama model to do the compression work — your paid API only sees the condensed result.

Runtime Requirements

| Requirement | Details |
| --- | --- |
| Ollama | Must be running locally (default: localhost:11434) |
| Local model | A small model for compression (e.g. llama3.1:8b). Configurable via the compressionModel option. |
| Node.js | 14+ |

Ollama is required at runtime. The compressor sends prompts to your local model — not to any external API.
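
Because nothing works unless Ollama is reachable, a startup check can be useful. The following is a minimal sketch, not part of this skill: the helper name isOllamaReachable and the timeout value are assumptions, and it probes Ollama's /api/tags endpoint (which lists local models) using only Node's built-in http module.

```javascript
const http = require('http');

// Hypothetical helper (not part of the skill): resolves true if an Ollama
// server answers on host:port, false on any error or timeout.
function isOllamaReachable(host = 'localhost', port = 11434, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const req = http.get(
      { host, port, path: '/api/tags', timeout: timeoutMs },
      (res) => {
        res.resume(); // drain the body; only the status code matters here
        resolve(res.statusCode === 200);
      }
    );
    req.on('timeout', () => {
      resolve(false);
      req.destroy(); // abort the stalled request
    });
    req.on('error', () => resolve(false));
  });
}

isOllamaReachable().then((ok) => {
  console.log(ok ? 'Ollama is reachable' : 'Ollama is not reachable');
});
```

Resolving to a boolean instead of rejecting keeps the caller's branching simple; a second resolve call after a timeout is ignored by the Promise.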

What This Skill Sends to the Local Model

This skill sends the following to your local Ollama model:

| Operation | System prompt | User prompt |
| --- | --- | --- |
| Message compression | You are a text compression tool. Output only what is asked, nothing else. | Your message + instruction to compress |
| History summarization | Same | Old conversation turns + instruction to summarize |

No data is sent to external APIs. All compression happens locally.
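
To make the message-compression request concrete, here is a rough sketch of a request body for Ollama's /api/generate endpoint. The system prompt is quoted from the section above; the helper name buildCompressionRequest and the wording of the compression instruction are assumptions, since the skill's actual instruction text is not shown here.

```javascript
// System prompt as documented in this section.
const SYSTEM_PROMPT =
  'You are a text compression tool. Output only what is asked, nothing else.';

// Hypothetical helper: builds the JSON body for POST /api/generate.
// The instruction text in `prompt` is an assumed placeholder.
function buildCompressionRequest(message, model = 'llama3.1:8b') {
  return {
    model,
    system: SYSTEM_PROMPT,
    prompt: `Compress the following message, keeping all facts and intent:\n\n${message}`,
    stream: false, // ask for a single JSON response instead of a token stream
  };
}

const exampleBody = buildCompressionRequest(
  'Could you please explain how TTL caches work?'
);
console.log(JSON.stringify(exampleBody, null, 2));
```

A history-summarization request would presumably differ only in the prompt field, carrying the old conversation turns and a summarize instruction.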

Side Effects

| Type | Description |
| --- | --- |
| NETWORK | HTTP to localhost:11434 only (your local Ollama instance) |
| MEMORY | Response cache stored in-memory (Map, configurable size/TTL) |
| DISK | None; the cache is not persisted to disk |
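
For readers wondering what a Map-backed cache with a size cap and TTL can look like, here is one possible sketch. It is not the skill's actual implementation: the class name TtlCache, the oldest-first eviction policy, and the injectable clock are all assumptions made for illustration.

```javascript
// Hypothetical in-memory cache: a Map with a size cap and a per-entry TTL.
class TtlCache {
  constructor(maxSize = 100, ttlMs = 3600000, now = Date.now) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for tests
    this.entries = new Map();
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt > this.ttlMs) {
      this.entries.delete(key); // entry has expired
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    // Deleting first means a re-inserted key moves to the end of the
    // Map's insertion order.
    this.entries.delete(key);
    if (this.entries.size >= this.maxSize) {
      // Evict the oldest inserted key (first in iteration order).
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
    this.entries.set(key, { value, storedAt: this.now() });
  }
}

const cache = new TtlCache(100, 3600000);
cache.set('prompt-hash', 'compressed text');
console.log(cache.get('prompt-hash')); // compressed text
```

Nothing here touches the filesystem, so the cache contents disappear when the process exits.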

Setup

const TokenCompressor = require('./src/token-compressor');

const compressor = new TokenCompressor({
  ollamaHost: 'localhost',      // default
  ollamaPort: 11434,            // default
  compressionModel: 'llama3.1:8b',  // default — any Ollama model works
  maxUncompressedTurns: 10,     // keep last N turns verbatim
  cacheMaxSize: 100,
  cacheTTL: 3600000             // 1 hour
});
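
The maxUncompressedTurns option implies a split between recent turns kept verbatim and older turns that get summarized. A minimal sketch of that split follows; the helper name splitHistory and the shape of a turn object are assumptions for illustration, not the skill's API.

```javascript
// Hypothetical helper: separates turns to summarize from turns kept verbatim.
function splitHistory(turns, maxUncompressedTurns = 10) {
  const cut = Math.max(0, turns.length - maxUncompressedTurns);
  return {
    toSummarize: turns.slice(0, cut), // older turns, candidates for summarization
    verbatim: turns.slice(cut),       // most recent turns, passed through unchanged
  };
}

// Twelve alternating user/assistant turns (assumed shape).
const sampleTurns = Array.from({ length: 12 }, (_, i) => ({
  role: i % 2 ? 'assistant' : 'user',
  content: `turn ${i + 1}`,
}));

const { toSummarize, verbatim } = splitHistory(sampleTurns, 10);
console.log(toSummarize.length, verbatim.length); // 2 10
```

With fewer turns than the limit, toSummarize is empty and the whole history passes through unchanged.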

See README.md for full API documentation and usage examples.

Version tags

api-costs, compression, cost-reduction, latest, ollama, token-optimization