Install
openclaw skills install llm-cost-watchdogMonitors real-time LLM API costs, detects runaway loops, enforces budgets, audits code risk, and reports usage across multiple providers and models.
openclaw skills install llm-cost-watchdogReal-time cost tracking layer for LLM-based agents. Prices every call live, detects runaway loops in code, enforces budget ceilings mid-execution.
Observes LLM spend without disturbing the agent. Prevents $2,400-overnight-loop
disasters by making cost a first-class concern: priced at write time,
budgeted at check time, surfaced in reports.
Activate when:
/cost-watchdog [command] is invoked.Run via python3 scripts/cost_watchdog.py <cmd> (or hook into your own CLI).
| Command | What it does |
|---|---|
session | Spend totals from usage.jsonl — calls, tokens, cost, top models. |
report | 24h / 7d / 30d windows with top model per window. |
tail [--once] | Watch OpenClaw session JSONL and log every assistant turn. |
detect [--json] | Identify which model the agent is currently using (5 probes). |
audit <file.py> | AST-based code risk scan: unbounded loops, recursion, missing max_tokens. |
price <model> | Live pricing for one model, with source + cache age. |
estimate <model> | Project cost for n iterations of a given call. |
alternatives <model> | Cheaper same-unit models. |
errors [--limit N] | Recent swallowed exceptions (silent failures made visible). |
validate-tokens <model> | Compare our heuristic against provider's authoritative count. |
reset [--all] | Clear current-day log (--all also clears rolled files). |
openrouter/* → OpenRouter API (live) → static fallback
anything else → LiteLLM JSON (live, cached 24h) → OpenRouter (permissive) → static fallback
token, image, second, query, page, character,
pixel. Alternatives never compare across units.| Env var | Default | Effect |
|---|---|---|
CW_PRICE_TTL_SECONDS | 86400 (24h) | Cache lifetime. 0 = hit network every call. |
CW_OFFLINE | unset | If 1, never touch the network. |
CW_STATIC_ONLY | unset | If 1, skip live sources entirely. Used by tests. |
CW_LOG_DIR | ~/.cost-watchdog | Where usage/errors/cache files live. |
CW_BUDGET_USD | unset | Ceiling; wrappers raise BudgetExceeded when crossed. |
python3 scripts/refresh_pricing.py
Regenerates references/pricing.md from the live sources so the offline
fallback is fresh. Aborts if fewer than 100 rows came back (protects
against clobbering on a network outage).
Four independent paths, all write to ~/.cost-watchdog/usage.jsonl:
| Path | When to use | Covers streams? |
|---|---|---|
openclaw_tailer.py --watch | Running OpenClaw. Zero code changes. | yes (reads completed turns) |
track_openai(client) | You call OpenAI-compatible SDK (covers OpenRouter, Groq, DeepSeek, Mistral, Together, Fireworks, Cerebras, Anyscale, ...). | yes (tee'd iterator, auto-injects stream_options={"include_usage": True}) |
track_anthropic(client) | Direct Anthropic SDK. | yes (wraps messages.stream()) |
track_gemini(model) / track_cohere(client) / track_bedrock(client) | Direct provider SDKs. | no (add wrappers if you need streams) |
install_global_capture() (httpx) | Any modern Python SDK using httpx. | no — streams are flagged into errors.jsonl so the gap is visible. Use the SDK wrappers for stream coverage. |
Usage log rotates daily: usage.YYYY-MM-DD.jsonl. session_total(since=...)
skips files outside the window before scanning.
Aggregation uses canonical_family() so
claude-haiku-4-5-20251001, claude-haiku-4-5, and claude-haiku-4.5
are one row in reports.
Two mechanisms:
append_usage(entry, budget_ceiling=X)
takes an fcntl.flock on a sidecar, sums the current session, and
refuses the write (raises BudgetExceeded) if the call would cross X.CW_BUDGET_USD
after logging and raise if over. Used when the wrapper doesn't know the
ceiling at call time.Either path stops the agent mid-loop; the LLM call still returns to the caller, but the next one blocks.
python3 scripts/cost_watchdog.py audit path/to/agent.py
Walks the AST and reports:
while True with an LLM call and no max_iterations-style bound.while that calls an API with no retry/iteration counter.max_tokens / max_completion_tokens.Every finding has a file line number. No more count('def ') > 3 and count('self.') > 5 → "recursion detected" false positives.
python3 scripts/cost_watchdog.py detect
Five probe layers, ranked by confidence:
| Probe | Confidence |
|---|---|
| OpenClaw session JSONL | high |
| Claude Code session JSONL | high |
| Most recent usage-log entry | high |
Claude Code settings.json | medium |
Env vars (ANTHROPIC_MODEL, OPENAI_MODEL, ...) | medium |
Emits a table or --json.
| Path | Purpose |
|---|---|
scripts/_pricing.py | Router: picks LiteLLM / OpenRouter / static per query. |
scripts/_sources.py | Three PricingSource classes + disk cache + circuit breaker. |
scripts/tokenizer.py | Provider-aware token counting (tiktoken for OpenAI; calibrated heuristics for others). |
scripts/model_canon.py | canonical_family() — collapses model variants. |
scripts/code_audit.py | AST cost-risk walker. |
scripts/usage_log.py | JSONL writer + rotation + aggregation. |
scripts/tracker.py | SDK wrappers + streaming + budget enforcement. |
scripts/http_capture.py | install_global_capture() — httpx transport hook. |
scripts/openclaw_tailer.py | Watches OpenClaw sessions. |
scripts/detect_model.py | Multi-layer detector. |
scripts/errors.py | errors.jsonl writer + reader. |
scripts/io_utils.py | write_json_atomic / read_json. |
scripts/refresh_pricing.py | Regenerates static pricing.md from live sources. |
scripts/cost_watchdog.py | Unified CLI dispatcher. |
references/pricing.md | Static fallback (regenerated; ~2600 models). |
tests/test_cost_watchdog.py | 73 tests: router, cache, AST, tokenizer, rotation, cassettes, circuit breaker, canonicalization. |
+= 0.05 theater).fcntl.flock-guarded budget check-and-log (no race).errors.jsonl via HTTP capture.errors.jsonl surfaces silent failures; cost_watchdog errors shows them.cost_watchdog validate-tokens <model> to check drift
against the provider's authoritative count when you have an API key.install_global_capture() can't see streaming responses — httpx exposes
an empty body until the user reads the stream. Use track_openai /
track_anthropic for stream coverage; http_capture logs skipped streams
to errors.jsonl so the gap is visible.python3 -m unittest tests.test_cost_watchdog # 73 tests
python3 scripts/code_audit.py test_risky_code.py # sample risks
python3 scripts/cost_watchdog.py report # current spend summary