LLM Cost Tracker

Security checks across malware telemetry and agentic risk

Overview

This cost tracker is mostly coherent, but it needs review because it silently scans local OpenClaw session files and stores usage metadata locally, and because its documented dry-run pruning command actually deletes records.

Review before installing. The scan found no artifact-backed evidence of exfiltration or deception, but install this skill only if you are comfortable with it reading OpenClaw session logs and local OpenRouter credentials, calling OpenRouter for key metadata, and keeping a local usage database. Avoid running prune_usage.py --dry-run until the deletion-order bug is fixed.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (9)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 95%
Finding: The skill advertises executable behavior that reads files, accesses environment variables, uses the network, and invokes shell commands, but it declares no permissions. That creates a trust and review gap: operators cannot accurately assess what the skill can access, and a scheduler or agent may run it with broader authority than users expect.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 96%
Finding: The documented purpose understates the skill's actual behavior by omitting API-key discovery from local auth files, calls to OpenRouter's key-info endpoint, and database pruning/setup capabilities. This mismatch is dangerous because users may authorize or schedule the skill for reporting only, while it also accesses secrets, external services, and destructive maintenance functions they were not told about.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The schema persists raw_usage_json and raw_response_json, which can retain more metadata than the advertised token/cost summaries and may expose sensitive session-derived details if the SQLite DB is accessed later. This increases data retention and privacy risk because historical request metadata is stored indefinitely in a local append-oriented database.
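One way to narrow what is retained is to keep only an allowlist of summary fields instead of the full raw payload. The following is a minimal sketch, not the skill's actual code: the function name summarize_usage and the field names in ALLOWED_FIELDS are assumptions made for illustration and should be matched to whatever the real usage payload contains.

```python
import json

# Hypothetical allowlist: only fields needed for token/cost summaries.
ALLOWED_FIELDS = ("prompt_tokens", "completion_tokens", "total_tokens", "cost")


def summarize_usage(raw_usage_json: str) -> dict:
    """Return only the allowlisted fields from a raw usage JSON string.

    Anything not in ALLOWED_FIELDS (session identifiers, request
    metadata, and so on) is dropped before the record is stored.
    """
    raw = json.loads(raw_usage_json)
    return {key: raw[key] for key in ALLOWED_FIELDS if key in raw}
```

Storing the output of a filter like this, rather than raw_usage_json and raw_response_json, would limit the database to the data the skill advertises.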

Description-Behavior Mismatch

Medium

Confidence: 90%
Finding: Instead of querying OpenRouter directly, the script parses local OpenClaw session files, which may contain broader assistant-session metadata than users expect from a cost tracker. This creates a privacy and scope-expansion issue because the skill silently harvests billing-related facts from local conversation artifacts rather than a purpose-limited API source.

Intent-Code Divergence

High

Confidence: 99%
Finding: The script performs delete_before(conn, cutoff_iso) before checking args.dry_run, so invoking --dry-run still permanently deletes records while claiming it would not. In a usage/cost-tracking skill, this is especially dangerous because operators may rely on dry-run for safe verification before destructive maintenance and could silently lose audit/history data.
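The ordering problem described in this finding can be illustrated with a minimal sketch in which the dry-run check happens before any destructive statement. Only delete_before(conn, cutoff_iso) and the dry-run flag come from the finding; the table name usage_records, the column ts, and the helpers count_before and prune are hypothetical and are not taken from the skill's source.

```python
import sqlite3


def count_before(conn: sqlite3.Connection, cutoff_iso: str) -> int:
    """Count rows older than the cutoff without modifying anything."""
    row = conn.execute(
        "SELECT COUNT(*) FROM usage_records WHERE ts < ?", (cutoff_iso,)
    ).fetchone()
    return row[0]


def delete_before(conn: sqlite3.Connection, cutoff_iso: str) -> int:
    """Delete rows older than the cutoff and return how many were removed."""
    cur = conn.execute(
        "DELETE FROM usage_records WHERE ts < ?", (cutoff_iso,)
    )
    conn.commit()
    return cur.rowcount


def prune(conn: sqlite3.Connection, cutoff_iso: str, dry_run: bool) -> int:
    """Check dry_run first, so a dry run can never reach the DELETE."""
    if dry_run:
        return count_before(conn, cutoff_iso)
    return delete_before(conn, cutoff_iso)
```

With this ordering, a dry run only reports how many records would be removed, and the count it returns matches what a real run would delete for the same cutoff.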

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: The debug mode prints per-request identifiers and raw usage JSON from the database, which can expose internal metadata beyond the declared cost-reporting purpose. If the raw usage payload contains sensitive request metadata or becomes visible in logs or chat output, this creates an information disclosure risk.

Vague Triggers

Medium

Confidence: 91%
Finding: The trigger phrase "collect usage data" is generic enough to match ordinary conversational requests, cron messages, or unrelated automation. If triggered unintentionally, it can cause silent collection of local session data and network/API activity without the user's informed intent.

Missing User Warnings

Medium

Confidence: 94%
Finding: The skill explicitly instructs silent data collection and suppresses user-visible notification, while accessing session files and API-related data. In this context, the lack of disclosure increases privacy and governance risk because collection can occur in the background without users understanding what local data is being read or stored.

Missing User Warnings

Medium

Confidence: 88%
Finding: The script writes session-derived request records into a persistent SQLite database without any explicit consent flow, warning, or retention notice during normal collection. In a skill designed to run silently from cron, this is more dangerous because users may be unaware that historical usage data and raw metadata are being accumulated on disk.

VirusTotal

VirusTotal findings are pending for this skill version.
