Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

LLM Cost Watchdog

v1.0.0

Monitors real-time LLM API costs, detects runaway loops, enforces budgets, audits code risk, and reports usage across multiple providers and models.

by Nima Ansari (@nimaansari)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for nimaansari/llm-cost-watchdog.

Prompt preview: Install & Setup
Install the skill "LLM Cost Watchdog" (nimaansari/llm-cost-watchdog) from ClawHub.
Skill page: https://clawhub.ai/nimaansari/llm-cost-watchdog
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install llm-cost-watchdog

ClawHub CLI


npx clawhub@latest install llm-cost-watchdog
Security Scan

Capability signals: Crypto · Can make purchases · Requires sensitive credentials

These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.

VirusTotal: Suspicious
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
Name/description match what the package implements: pricing loaders, trackers, SDK wrappers, AST audit, visualizer, and budget enforcement. It does not request unrelated credentials or external OS-level access in the registry metadata. The included scripts and references (pricing.md, trackers, audit) are proportionate to the stated feature set.
Instruction Scope
SKILL.md and scripts instruct the agent to: tail OpenClaw session JSONL, wrap provider SDK clients (track_openai, track_anthropic, etc.), and optionally install a global httpx capture (install_global_capture()). These mechanisms write to ~/.cost-watchdog/usage.jsonl and errors.jsonl and may record full prompts, completions, and HTTP payloads. Capturing arbitrary httpx traffic can include non-LLM endpoints or secrets in request bodies/headers; the README/tests also reference regenerating pricing via live network calls. Tests expect a LICENSE file that isn't listed in the manifest — a small inconsistency. Overall the actions are explainable for cost-tracking but broaden the data surface (sensitive content may be logged) and grant broad discretion to the skill when enabled.
Install Mechanism
No external install spec or remote downloads are declared; the bundle contains Python scripts and docs. That reduces supply-chain concerns. The README suggests optional symlinking into global skill directories (user action). No use of URL-based extract/install was observed in metadata.
Credentials
No privileged environment variables or external credentials are required by the skill metadata. The skill uses its own env toggles (CW_PRICE_TTL_SECONDS, CW_OFFLINE, CW_STATIC_ONLY, CW_LOG_DIR, CW_BUDGET_USD) which are proportional to caching, offline/test mode, log location, and budget enforcement. It does not ask for unrelated service keys, which is appropriate.
Persistence & Privilege
always:false (not force-included). The skill writes logs to ~/.cost-watchdog by default and exposes wrappers that can patch SDK behavior at runtime (inject stream options, tee iterators). Those changes are expected for a monitoring/enforcement layer but increase the blast radius if the global httpx capture or automatic wrapper injection is enabled—review and restrict usage to controlled environments.
What to consider before installing
This skill appears to implement what it claims and does not request unrelated credentials, but it can capture and log full LLM requests/responses and (if enabled) any Python httpx traffic. Before installing:

  1. Review the code paths that enable global capture (install_global_capture, openclaw_tailer, and the SDK wrappers) so you understand what is logged and where.
  2. Set CW_LOG_DIR to a directory with restricted permissions, and consider encrypting or rotating logs if they may contain secrets.
  3. Prefer the per-SDK wrappers (track_openai, track_anthropic) over the global httpx capture to reduce unintended data capture.
  4. Run in an isolated environment (container or dedicated VM) if you will enable live pricing/network access.
  5. For deterministic tests, or to avoid network lookups, set CW_STATIC_ONLY=1 (the test suite already does this).
  6. If you are not comfortable with the possibility of recording sensitive payloads, do not enable global capture or automatic tailing; otherwise the skill is coherent with its purpose.

Additional code review is recommended before deploying in production or on systems with sensitive data.

Like a lobster shell, security has layers — review code before you run it.

latest: vk97cqc40k4qv2ys2abz7tc8jc58583jt
105 downloads · 0 stars · 1 version
Updated 1w ago · v1.0.0 · MIT-0

Cost Watchdog 💰

Real-time cost tracking layer for LLM-based agents. Prices every call live, detects runaway loops in code, enforces budget ceilings mid-execution.

1. Identity

Observes LLM spend without disturbing the agent. Prevents $2,400-overnight-loop disasters by making cost a first-class concern: priced at write time, budgeted at check time, surfaced in reports.

2. Triggers

Activate when:

  • User mentions cost, budget, tokens, billing.
  • Code contains LLM API calls (Anthropic, OpenAI, OpenRouter, Google, Groq, ...).
  • Agent loops or recursive workflows.
  • Batch / streaming processing with unclear bounds.
  • /cost-watchdog [command] is invoked.

3. Commands

Run via python3 scripts/cost_watchdog.py <cmd> (or hook into your own CLI).

| Command | What it does |
| --- | --- |
| session | Spend totals from usage.jsonl — calls, tokens, cost, top models. |
| report | 24h / 7d / 30d windows with top model per window. |
| tail [--once] | Watch OpenClaw session JSONL and log every assistant turn. |
| detect [--json] | Identify which model the agent is currently using (5 probes). |
| audit <file.py> | AST-based code risk scan: unbounded loops, recursion, missing max_tokens. |
| price <model> | Live pricing for one model, with source + cache age. |
| estimate <model> | Project cost for n iterations of a given call. |
| alternatives <model> | Cheaper same-unit models. |
| errors [--limit N] | Recent swallowed exceptions (silent failures made visible). |
| validate-tokens <model> | Compare our heuristic against the provider's authoritative count. |
| reset [--all] | Clear the current-day log (--all also clears rolled files). |
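The estimate command projects spend from per-unit prices. A minimal sketch of the arithmetic (the function name, prices, and token counts here are illustrative, not the skill's actual code):

```python
def estimate_cost(n_calls, in_tokens, out_tokens, price_in_per_m, price_out_per_m):
    """Project cost for n iterations of a call: per-call cost times n.

    Prices are USD per million tokens, the convention most providers publish.
    """
    per_call = (in_tokens / 1e6) * price_in_per_m + (out_tokens / 1e6) * price_out_per_m
    return n_calls * per_call

# e.g. 1,000 calls at 2,000 input / 500 output tokens, $3 / $15 per million tokens
total = estimate_cost(1000, 2000, 500, 3.0, 15.0)
print(f"${total:.2f}")  # → $13.50
```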

4. Pricing layer

Source chain

openrouter/*            → OpenRouter API (live)           → static fallback
anything else           → LiteLLM JSON (live, cached 24h) → OpenRouter (permissive) → static fallback
  • 2600+ models indexed across chat, completion, embedding, image, audio, video, rerank, OCR, search modes.
  • 30+ providers in the static fallback: Anthropic, OpenAI, Google, Groq, Mistral, Cohere, DeepSeek, Perplexity, xAI, Bedrock, Azure, and more.
  • Unit-aware: token, image, second, query, page, character, pixel. Alternatives never compare across units.
  • Circuit breaker opens after 3 consecutive network failures for a host; falls through to cache/static until the cool-down ends (60s).
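The circuit-breaker behavior described above can be sketched as follows, using the stated thresholds (3 consecutive failures, 60s cool-down). The CircuitBreaker class is a hypothetical illustration, not the skill's actual implementation:

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; stay open for `cooldown` seconds."""

    def __init__(self, threshold=3, cooldown=60.0):
        self.threshold, self.cooldown = threshold, cooldown
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when the breaker opened

    def allow(self):
        """May we hit the network for this host right now?"""
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            self.opened_at, self.failures = None, 0  # cool-down over: close again
            return True
        return False  # open: caller falls through to cache/static pricing

    def record(self, ok):
        """Report the outcome of a network attempt."""
        if ok:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
```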

Tuning

| Env var | Default | Effect |
| --- | --- | --- |
| CW_PRICE_TTL_SECONDS | 86400 (24h) | Cache lifetime. 0 = hit network every call. |
| CW_OFFLINE | unset | If 1, never touch the network. |
| CW_STATIC_ONLY | unset | If 1, skip live sources entirely. Used by tests. |
| CW_LOG_DIR | ~/.cost-watchdog | Where usage/errors/cache files live. |
| CW_BUDGET_USD | unset | Ceiling; wrappers raise BudgetExceeded when crossed. |

Refresh static pricing

python3 scripts/refresh_pricing.py

Regenerates references/pricing.md from the live sources so the offline fallback is fresh. Aborts if fewer than 100 rows came back (protects against clobbering on a network outage).

5. Tracking layer — how we know what was spent

Five independent paths, all writing to ~/.cost-watchdog/usage.jsonl:

| Path | When to use | Covers streams? |
| --- | --- | --- |
| openclaw_tailer.py --watch | Running OpenClaw. Zero code changes. | yes (reads completed turns) |
| track_openai(client) | You call an OpenAI-compatible SDK (covers OpenRouter, Groq, DeepSeek, Mistral, Together, Fireworks, Cerebras, Anyscale, ...). | yes (tee'd iterator, auto-injects stream_options={"include_usage": True}) |
| track_anthropic(client) | Direct Anthropic SDK. | yes (wraps messages.stream()) |
| track_gemini(model) / track_cohere(client) / track_bedrock(client) | Direct provider SDKs. | no (add wrappers if you need streams) |
| install_global_capture() (httpx) | Any modern Python SDK using httpx. | no — streams are flagged into errors.jsonl so the gap is visible. Use the SDK wrappers for stream coverage. |
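Conceptually, a per-SDK wrapper intercepts the call, passes it through, and appends a usage record. A generic sketch of that shape (the track helper and the dict-shaped response are assumptions for illustration; the real wrappers patch specific SDK methods):

```python
import functools
import json

def track(call, log_path):
    """Wrap an LLM-call function: pass through, but append one usage record per call."""
    @functools.wraps(call)
    def wrapped(*args, **kwargs):
        resp = call(*args, **kwargs)  # assumed here to return a dict with a "usage" key
        record = {"model": kwargs.get("model"), "usage": resp.get("usage", {})}
        with open(log_path, "a") as f:
            f.write(json.dumps(record) + "\n")  # JSONL: one record per line
        return resp
    return wrapped
```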

Usage log rotates daily: usage.YYYY-MM-DD.jsonl. session_total(since=...) skips files outside the window before scanning.

Aggregation uses canonical_family() so claude-haiku-4-5-20251001, claude-haiku-4-5, and claude-haiku-4.5 are one row in reports.
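A rough sketch of that kind of normalization (an illustrative heuristic, not the actual canonical_family implementation):

```python
import re

def canonical_family(model_id):
    """Collapse a model id to its family: unify dot/dash versions, strip date suffixes."""
    name = model_id.lower().replace(".", "-")
    name = re.sub(r"-\d{8}$", "", name)  # drop a trailing YYYYMMDD snapshot suffix
    return name
```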

6. Budget enforcement

Two mechanisms:

  1. Write-time check (race-safe): append_usage(entry, budget_ceiling=X) takes an fcntl.flock on a sidecar, sums the current session, and refuses the write (raises BudgetExceeded) if the call would cross X.
  2. Post-write check: wrappers compare cumulative spend to CW_BUDGET_USD after logging and raise if over. Used when the wrapper doesn't know the ceiling at call time.

Either path stops the agent mid-loop; the LLM call still returns to the caller, but the next one blocks.
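The write-time check can be sketched with fcntl.flock (Unix-only). The append_usage and BudgetExceeded names follow the text above, but the internals here are an assumption, not the skill's actual code:

```python
import fcntl
import json
import os

class BudgetExceeded(RuntimeError):
    pass

def append_usage(entry, log_path, budget_ceiling=None):
    """Append one usage record; refuse the write if it would cross the ceiling."""
    with open(log_path + ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # serialize check-and-write across processes
        spent = 0.0
        if os.path.exists(log_path):
            with open(log_path) as f:
                spent = sum(json.loads(line)["cost_usd"] for line in f if line.strip())
        if budget_ceiling is not None and spent + entry["cost_usd"] > budget_ceiling:
            raise BudgetExceeded(f"would reach ${spent + entry['cost_usd']:.4f} > ${budget_ceiling:.2f}")
        with open(log_path, "a") as f:
            f.write(json.dumps(entry) + "\n")
        # lock is released when the sidecar file handle closes
```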

7. Code audit (AST)

python3 scripts/cost_watchdog.py audit path/to/agent.py

Walks the AST and reports:

  • CRITICAL — while True with an LLM call and no max_iterations-style bound.
  • CRITICAL — function that recurses and calls an LLM API with no depth argument.
  • HIGH — plain while that calls an API with no retry/iteration counter.
  • MEDIUM — LLM call missing max_tokens / max_completion_tokens.
  • MEDIUM — function with ≥5 sequential LLM calls (batching candidate).

Every finding has a file line number. No more count('def ') > 3 and count('self.') > 5 → "recursion detected" false positives.
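The while True check can be sketched with the standard ast module. This is a hypothetical helper, not the skill's code_audit.py: it only looks for a missing break, whereas the real audit also checks for LLM calls and iteration bounds:

```python
import ast

def find_unbounded_loops(source):
    """Flag `while True:` loops that contain no `break`, with line numbers."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.While)
                and isinstance(node.test, ast.Constant)
                and node.test.value is True):
            # note: a break inside a nested loop also counts here — fine for a sketch
            has_break = any(isinstance(n, ast.Break) for n in ast.walk(node))
            if not has_break:
                findings.append(("CRITICAL", node.lineno, "while True with no break"))
    return findings
```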

8. Detection — "what model is the agent using?"

python3 scripts/cost_watchdog.py detect

Five probe layers, ranked by confidence:

| Probe | Confidence |
| --- | --- |
| OpenClaw session JSONL | high |
| Claude Code session JSONL | high |
| Most recent usage-log entry | high |
| Claude Code settings.json | medium |
| Env vars (ANTHROPIC_MODEL, OPENAI_MODEL, ...) | medium |

Emits a human-readable table, or JSON with --json.

9. Files

| Path | Purpose |
| --- | --- |
| scripts/_pricing.py | Router: picks LiteLLM / OpenRouter / static per query. |
| scripts/_sources.py | Three PricingSource classes + disk cache + circuit breaker. |
| scripts/tokenizer.py | Provider-aware token counting (tiktoken for OpenAI; calibrated heuristics for others). |
| scripts/model_canon.py | canonical_family() — collapses model variants. |
| scripts/code_audit.py | AST cost-risk walker. |
| scripts/usage_log.py | JSONL writer + rotation + aggregation. |
| scripts/tracker.py | SDK wrappers + streaming + budget enforcement. |
| scripts/http_capture.py | install_global_capture() — httpx transport hook. |
| scripts/openclaw_tailer.py | Watches OpenClaw sessions. |
| scripts/detect_model.py | Multi-layer detector. |
| scripts/errors.py | errors.jsonl writer + reader. |
| scripts/io_utils.py | write_json_atomic / read_json. |
| scripts/refresh_pricing.py | Regenerates static pricing.md from live sources. |
| scripts/cost_watchdog.py | Unified CLI dispatcher. |
| references/pricing.md | Static fallback (regenerated; ~2600 models). |
| tests/test_cost_watchdog.py | 73 tests: router, cache, AST, tokenizer, rotation, cassettes, circuit breaker, canonicalization. |

10. Quality checklist

  • Live pricing from LiteLLM + OpenRouter, 24h-cached, with static fallback.
  • Exact-match model lookup (no substring conflation).
  • Multi-modal (token / image / second / query / page / character).
  • Unit-aware alternatives (never compares tokens to images).
  • AST-based code audit with line numbers.
  • Provider-aware tokenization (no more tiktoken-for-Claude).
  • Variance-based confidence (no += 0.05 theater).
  • Atomic writes to all shared state files.
  • fcntl.flock-guarded budget check-and-log (no race).
  • Circuit breaker on flaky networks (no 5s hang per call).
  • Streaming capture via SDK wrappers; streams flagged in errors.jsonl via HTTP capture.
  • Daily log rotation + date-scoped aggregation.
  • Canonical model families (variants collapse in reports).
  • errors.jsonl surfaces silent failures; cost_watchdog errors shows them.
  • Cassette tests for LiteLLM + OpenRouter parse paths (schema-drift safety net).
  • 73 logic tests passing.

11. Known limits (be honest)

  • Tokenizer heuristics for Claude/Gemini/etc. are calibrated from docs, not measured. Run cost_watchdog validate-tokens <model> to check drift against the provider's authoritative count when you have an API key.
  • install_global_capture() can't see streaming responses — httpx exposes an empty body until the user reads the stream. Use track_openai / track_anthropic for stream coverage; http_capture logs skipped streams to errors.jsonl so the gap is visible.
  • Non-httpx SDKs (older Cohere, boto3 with custom transport) need the per-SDK wrappers — HTTP capture won't see them.
  • LiteLLM community data can lag 24-48h on brand-new models. OpenRouter's API is truly live for anything it routes.

12. Testing

python3 -m unittest tests.test_cost_watchdog     # 73 tests
python3 scripts/code_audit.py test_risky_code.py # sample risks
python3 scripts/cost_watchdog.py report          # current spend summary
