Cost Watchdog 💰
Real-time cost tracking layer for LLM-based agents. Prices every call live,
detects runaway loops in code, enforces budget ceilings mid-execution.
1. Identity
Observes LLM spend without disturbing the agent. Prevents $2,400-overnight-loop
disasters by making cost a first-class concern: priced at write time,
budgeted at check time, surfaced in reports.
2. Triggers
Activate when:
- User mentions cost, budget, tokens, billing.
- Code contains LLM API calls (Anthropic, OpenAI, OpenRouter, Google, Groq, ...).
- Agent loops or recursive workflows.
- Batch / streaming processing with unclear bounds.
- /cost-watchdog [command] is invoked.
3. Commands
Run via python3 scripts/cost_watchdog.py <cmd> (or hook into your own CLI).
| Command | What it does |
|---|---|
| session | Spend totals from usage.jsonl — calls, tokens, cost, top models. |
| report | 24h / 7d / 30d windows with top model per window. |
| tail [--once] | Watch the OpenClaw session JSONL and log every assistant turn. |
| detect [--json] | Identify which model the agent is currently using (5 probes). |
| audit <file.py> | AST-based code risk scan: unbounded loops, recursion, missing max_tokens. |
| price <model> | Live pricing for one model, with source + cache age. |
| estimate <model> | Project cost for n iterations of a given call. |
| alternatives <model> | Cheaper same-unit models. |
| errors [--limit N] | Recent swallowed exceptions (silent failures made visible). |
| validate-tokens <model> | Compare our heuristic against the provider's authoritative count. |
| reset [--all] | Clear the current-day log (--all also clears rolled files). |
4. Pricing layer
Source chain:
- openrouter/* → OpenRouter API (live) → static fallback
- anything else → LiteLLM JSON (live, cached 24h) → OpenRouter (permissive) → static fallback
- 2600+ models indexed across chat, completion, embedding, image, audio,
video, rerank, OCR, search modes.
- 30+ providers in the static fallback: Anthropic, OpenAI, Google,
Groq, Mistral, Cohere, DeepSeek, Perplexity, xAI, Bedrock, Azure, and more.
- Unit-aware: token, image, second, query, page, character, pixel.
  Alternatives never compare across units.
- Circuit breaker opens after 3 consecutive network failures for a host;
  falls through to cache/static until the cool-down ends (60s). A minimal
  sketch of this logic follows the list.
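A minimal sketch of that breaker, assuming one failure counter and one
open-timestamp per host. Names are illustrative, not the actual _sources.py
internals:

```python
import time

FAILURE_THRESHOLD = 3   # consecutive failures before the breaker opens
COOLDOWN_SECONDS = 60   # how long to skip the host before retrying

class HostBreaker:
    """Per-host circuit breaker: open after N failures, retry after a cool-down."""

    def __init__(self):
        self.failures = {}   # host -> consecutive failure count
        self.opened_at = {}  # host -> monotonic time the breaker opened

    def allow(self, host):
        opened = self.opened_at.get(host)
        if opened is None:
            return True
        if time.monotonic() - opened >= COOLDOWN_SECONDS:
            # cool-down over: close the breaker and let one request through
            del self.opened_at[host]
            self.failures[host] = 0
            return True
        return False  # caller falls through to cache/static pricing

    def record(self, host, ok):
        if ok:
            self.failures[host] = 0
        else:
            self.failures[host] = self.failures.get(host, 0) + 1
            if self.failures[host] >= FAILURE_THRESHOLD:
                self.opened_at[host] = time.monotonic()
```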
Tuning
| Env var | Default | Effect |
|---|---|---|
| CW_PRICE_TTL_SECONDS | 86400 (24h) | Cache lifetime. 0 = hit the network on every call. |
| CW_OFFLINE | unset | If 1, never touch the network. |
| CW_STATIC_ONLY | unset | If 1, skip live sources entirely. Used by tests. |
| CW_LOG_DIR | ~/.cost-watchdog | Where usage/errors/cache files live. |
| CW_BUDGET_USD | unset | Spend ceiling; wrappers raise BudgetExceeded when crossed. |
Refresh static pricing
python3 scripts/refresh_pricing.py
Regenerates references/pricing.md from the live sources so the offline
fallback stays fresh. Aborts if fewer than 100 rows come back, which protects
against clobbering the file during a network outage.
5. Tracking layer — how we know what was spent
Four independent paths; all write to ~/.cost-watchdog/usage.jsonl:
| Path | When to use | Covers streams? |
|---|---|---|
| openclaw_tailer.py --watch | Running OpenClaw. Zero code changes. | Yes (reads completed turns). |
| track_openai(client) (see the sketch below) | You call an OpenAI-compatible SDK (covers OpenRouter, Groq, DeepSeek, Mistral, Together, Fireworks, Cerebras, Anyscale, ...). | Yes (tee'd iterator; auto-injects stream_options={"include_usage": True}). |
| track_anthropic(client) | Direct Anthropic SDK. | Yes (wraps messages.stream()). |
| track_gemini(model) / track_cohere(client) / track_bedrock(client) | Direct provider SDKs. | No (add wrappers if you need streams). |
| install_global_capture() (httpx) | Any modern Python SDK using httpx. | No — streams are flagged into errors.jsonl so the gap is visible. Use the SDK wrappers for stream coverage. |
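A minimal usage sketch for the SDK-wrapper path, assuming tracker.py is
importable and that track_openai hands back the instrumented client (both
assumptions; check tracker.py for the exact contract):

```python
from openai import OpenAI

from tracker import track_openai  # assumed import path (scripts/tracker.py)

client = track_openai(OpenAI())   # every call is priced and appended to usage.jsonl
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize this file in one line."}],
    max_tokens=64,                # bound the output; the audit flags calls without it
)
```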
The usage log rotates daily: usage.YYYY-MM-DD.jsonl. session_total(since=...)
skips files outside the window before scanning. Aggregation uses
canonical_family(), so claude-haiku-4-5-20251001, claude-haiku-4-5, and
claude-haiku-4.5 collapse to one row in reports, as illustrated below.
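Illustrative behavior, assuming model_canon.py is importable; the exact
normalization rules live there:

```python
from model_canon import canonical_family  # assumed import path

for variant in ("claude-haiku-4-5-20251001", "claude-haiku-4-5", "claude-haiku-4.5"):
    print(canonical_family(variant))  # all three print the same family key
```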
6. Budget enforcement
Two mechanisms:
- Write-time check (race-safe): append_usage(entry, budget_ceiling=X) takes an
  fcntl.flock on a sidecar file, sums the current session, and refuses the
  write (raising BudgetExceeded) if the call would cross X. This path is
  sketched below.
- Post-write check: wrappers compare cumulative spend to CW_BUDGET_USD after
  logging and raise if over. Used when the wrapper doesn't know the ceiling at
  call time.
Either path stops the agent mid-loop; the current LLM call still returns to
the caller, but the next one blocks.
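A hedged sketch of the write-time path, assuming entries carry a cost_usd
field and the lock sidecar sits next to the log; the real usage_log.py also
handles rotation:

```python
import fcntl
import json
import os

class BudgetExceeded(RuntimeError):
    pass

def session_total(log_path):
    """Sum cost_usd over the current session log."""
    if not os.path.exists(log_path):
        return 0.0
    with open(log_path) as f:
        return sum(json.loads(line).get("cost_usd", 0.0)
                   for line in f if line.strip())

def append_usage(entry, budget_ceiling=None,
                 log_path="usage.jsonl", lock_path="usage.jsonl.lock"):
    with open(lock_path, "a+") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)   # serialize concurrent writers
        try:
            spent = session_total(log_path)
            if budget_ceiling is not None and spent + entry["cost_usd"] > budget_ceiling:
                raise BudgetExceeded(
                    f"${spent:.4f} spent; this call would cross ${budget_ceiling:.2f}")
            with open(log_path, "a") as f:
                f.write(json.dumps(entry) + "\n")
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```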
7. Code audit (AST)
python3 scripts/cost_watchdog.py audit path/to/agent.py
Walks the AST and reports:
- CRITICAL — while True with an LLM call and no max_iterations-style bound.
- CRITICAL — function that recurses and calls an LLM API with no depth argument.
- HIGH — plain while that calls an API with no retry/iteration counter.
- MEDIUM — LLM call missing max_tokens / max_completion_tokens.
- MEDIUM — function with ≥5 sequential LLM calls (batching candidate).
Every finding carries a file line number. No more count('def ') > 3 and
count('self.') > 5 → "recursion detected" false positives. The core check is
sketched below.
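An illustrative reduction of the first CRITICAL check, not the actual
code_audit.py implementation; the set of method names treated as LLM calls is
an assumption:

```python
import ast

LLM_CALL_ATTRS = {"create", "generate_content", "stream"}  # assumed heuristic

def find_unbounded_llm_loops(source, filename="<agent>"):
    """Flag `while True:` loops that contain an LLM-looking method call."""
    findings = []
    for node in ast.walk(ast.parse(source, filename)):
        if not isinstance(node, ast.While):
            continue
        infinite = isinstance(node.test, ast.Constant) and node.test.value is True
        calls_llm = any(
            isinstance(n, ast.Call) and isinstance(n.func, ast.Attribute)
            and n.func.attr in LLM_CALL_ATTRS
            for n in ast.walk(node)
        )
        if infinite and calls_llm:
            findings.append(("CRITICAL", node.lineno, "while True wraps an LLM call"))
    return findings

risky = "while True:\n    client.chat.completions.create(model='m', messages=[])\n"
print(find_unbounded_llm_loops(risky))  # [('CRITICAL', 1, 'while True wraps an LLM call')]
```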
8. Detection — "what model is the agent using?"
python3 scripts/cost_watchdog.py detect
Five probe layers, ranked by confidence:
| Probe | Confidence |
|---|---|
| OpenClaw session JSONL | high |
| Claude Code session JSONL | high |
| Most recent usage-log entry | high |
| Claude Code settings.json | medium |
| Env vars (ANTHROPIC_MODEL, OPENAI_MODEL, ...) | medium |
Emits a table, or JSON with --json. The env-var probe, the lowest-ranked
layer, is sketched below.
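A sketch of that layer; the shape of the result dict is an assumption, not
detect_model.py's actual return type:

```python
import os

def probe_env_vars():
    """Medium-confidence probe: model named explicitly in the environment."""
    for var in ("ANTHROPIC_MODEL", "OPENAI_MODEL"):
        model = os.environ.get(var)
        if model:
            return {"model": model, "source": var, "confidence": "medium"}
    return None  # no env hint; this layer contributes nothing
```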
9. Files
| Path | Purpose |
|---|---|
| scripts/_pricing.py | Router: picks LiteLLM / OpenRouter / static per query. |
| scripts/_sources.py | Three PricingSource classes + disk cache + circuit breaker. |
| scripts/tokenizer.py | Provider-aware token counting (tiktoken for OpenAI; calibrated heuristics for others). |
| scripts/model_canon.py | canonical_family() — collapses model variants. |
| scripts/code_audit.py | AST cost-risk walker. |
| scripts/usage_log.py | JSONL writer + rotation + aggregation. |
| scripts/tracker.py | SDK wrappers + streaming + budget enforcement. |
| scripts/http_capture.py | install_global_capture() — httpx transport hook. |
| scripts/openclaw_tailer.py | Watches OpenClaw sessions. |
| scripts/detect_model.py | Multi-layer detector. |
| scripts/errors.py | errors.jsonl writer + reader. |
| scripts/io_utils.py | write_json_atomic / read_json. |
| scripts/refresh_pricing.py | Regenerates static pricing.md from live sources. |
| scripts/cost_watchdog.py | Unified CLI dispatcher. |
| references/pricing.md | Static fallback (regenerated; ~2600 models). |
| tests/test_cost_watchdog.py | 73 tests: router, cache, AST, tokenizer, rotation, cassettes, circuit breaker, canonicalization. |
10. Quality checklist
11. Known limits (be honest)
- Tokenizer heuristics for Claude/Gemini/etc. are calibrated from docs, not
  measured. Run cost_watchdog validate-tokens <model> to check drift against
  the provider's authoritative count when you have an API key.
- install_global_capture() can't see streaming responses — httpx exposes an
  empty body until the user reads the stream. Use track_openai /
  track_anthropic for stream coverage; http_capture logs skipped streams to
  errors.jsonl so the gap is visible (see the sketch after this list).
- Non-httpx SDKs (older Cohere, boto3 with custom transport) need the
  per-SDK wrappers — HTTP capture won't see them.
- LiteLLM community data can lag 24-48h on brand-new models. OpenRouter's
  API is truly live for anything it routes.
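A sketch of the transport-hook idea behind install_global_capture(),
including the stream flag; illustrative only, not the actual http_capture.py
code:

```python
import httpx

class CaptureTransport(httpx.BaseTransport):
    """Wraps the default transport; observes responses, flags streams it can't read."""

    def __init__(self, inner=None):
        self.inner = inner or httpx.HTTPTransport()

    def handle_request(self, request):
        response = self.inner.handle_request(request)
        ctype = response.headers.get("content-type", "")
        if "event-stream" in ctype:
            # streaming body isn't materialized here; surface the gap instead
            print("skipped stream:", request.url)  # real code appends to errors.jsonl
        return response

client = httpx.Client(transport=CaptureTransport())
```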
12. Testing
python3 -m unittest tests.test_cost_watchdog # 73 tests
python3 scripts/code_audit.py test_risky_code.py # sample risks
python3 scripts/cost_watchdog.py report # current spend summary