skills-monitor

Security checks across malware telemetry and agentic risk

Overview

This is a legitimate-looking monitoring platform, but it needs Review because it can run other skills unsandboxed, persist scheduled uploads, and transmit telemetry, and because it overstates some of its credential-storage protections.

Install only if you are comfortable with a broad monitoring tool that can execute installed skills, store an agent API key, run local/server dashboards, and upload reports. Prefer using it in a sandbox or dedicated environment, avoid enabling scheduled uploads until you verify the destination and payloads, do not rely on the fallback credential store as strong encryption, and review the bundled server authentication before exposing it beyond localhost.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (72)

subprocess module call

Medium

Category: Dangerous Code Execution
Content: cmd.append(str(v)) try: result = subprocess.run( cmd, capture_output=True, text=True,
Confidence: 96%
Finding: result = subprocess.run( cmd, capture_output=True, text=True, timeout=120, cwd=self.info.dir_path,

subprocess module call

Medium

Category: Dangerous Code Execution
Content: """Run directly, passing in symbol""" entry_path = os.path.join(self.info.dir_path, self.info.entry_file) try: result = subprocess.run( [sys.executable, entry_path, symbol], capture_output=True, text=True,
Confidence: 93%
Finding: result = subprocess.run( [sys.executable, entry_path, symbol], capture_output=True, text=True, timeout=60, c

Dynamic attribute access via getattr()

Low

Category: Dangerous Code Execution
Content: # Find the target function func = None if task_name and hasattr(module, task_name): func = getattr(module, task_name) elif hasattr(module, "main"): func = module.main elif hasattr(module, "run"):
Confidence: 82%
Finding: func = getattr(module, task_name)
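The lookup flagged above resolves whatever attribute name `task_name` carries. One way to narrow it is to check the name against an allowlist before calling getattr(). This is a minimal sketch, not the skill's code: `ALLOWED_TASKS` and `resolve_task` are hypothetical names, and the fallback order (`main`, then `run`) is taken from the snippet.

```python
import types

# Hypothetical allowlist; the scanned skill does not define one.
ALLOWED_TASKS = frozenset({"main", "run"})


def resolve_task(module, task_name=None):
    """Return a callable entrypoint from module, or raise ValueError."""
    # Fall back to the conventional entrypoints when no task is named.
    candidates = [task_name] if task_name else ["main", "run"]
    for name in candidates:
        if name not in ALLOWED_TASKS:
            raise ValueError(f"task {name!r} is not allowlisted")
        func = getattr(module, name, None)
        if callable(func):
            return func
    raise ValueError("no allowlisted entrypoint found")


# Stand-in module object used only for illustration.
demo = types.ModuleType("demo")
demo.run = lambda: "ok"
demo._secret = lambda: "hidden"
```

With this shape, `resolve_task(demo)` returns `demo.run`, while `resolve_task(demo, "_secret")` raises ValueError instead of reaching the private attribute.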

os.system() or os exec-family call

High

Category: Dangerous Code Execution
Content: cmd = f"python3 {PROJECT_ROOT / 'skills_monitor_web.py'} --port {port} {demo_flag} {debug_flag}" print(f"🚀 Starting web panel: http://127.0.0.1:{port}") os.system(cmd.strip()) def cmd_upload(args):
Confidence: 97%
Finding: os.system(cmd.strip())
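The flagged call builds a shell string by interpolation, so any shell metacharacter in `port` or the flags reaches the shell. A common alternative is to pass an argument list to subprocess.run() without a shell. The sketch below assumes a value for `PROJECT_ROOT` (the snippet does not show it) and uses hypothetical helpers `build_web_command` and `launch_web_panel`; the script name and `--port` option come from the snippet.

```python
import subprocess
import sys
from pathlib import Path

# Assumed location; the snippet references PROJECT_ROOT without showing its value.
PROJECT_ROOT = Path(".").resolve()


def build_web_command(port, extra_flags=()):
    """Build an argument list for the web panel; no shell is involved."""
    port = int(port)  # rejects values such as "8080; rm -rf ~"
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    script = PROJECT_ROOT / "skills_monitor_web.py"
    return [sys.executable, str(script), "--port", str(port), *extra_flags]


def launch_web_panel(port, extra_flags=()):
    # shell=False is the default, so each list element is passed as one argument.
    return subprocess.run(build_web_command(port, extra_flags), check=False)
```

Any demo or debug flags would be supplied through `extra_flags` as separate list items, so they are never re-parsed by a shell.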

Lp3

Medium

Category: MCP Least Privilege
Confidence: 92%
Finding: The skill advertises capabilities that imply environment access, filesystem read/write, shell execution, and network communication, but the manifest shown in the documentation does not declare permissions or clearly constrain them. In a monitoring skill that can run other skills, upload data, and start a server, undeclared capabilities materially increase the risk of overbroad access and user surprise.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 97%
Finding: The documented behavior goes well beyond passive benchmarking: it includes telemetry upload, credential handling, a Flask server, WeChat integrations, push notifications, infrastructure setup, scheduling, and persistent data management. This mismatch is dangerous because users may install a seemingly benign evaluation tool without realizing it can expose services, transmit data externally, and manage sensitive identity material.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The skill presents itself primarily as a monitoring/evaluation platform, but it also supports remote agent registration and outbound upload of reports to an external server. Even if this is a legitimate feature, the mismatch increases security risk because users may not expect network exfiltration of local telemetry, agent identifiers, or API-key-associated data from a monitoring tool.

Description-Behavior Mismatch

Medium

Confidence: 93%
Finding: The diagnose flow can send diagnostic summaries to an external WeCom webhook, which is outbound data transmission beyond pure local monitoring/evaluation. In a security-sensitive context, hidden or under-disclosed reporting channels are dangerous because diagnostic content may include operational details, environment state, or other sensitive metadata.

Context-Inappropriate Capability

High

Confidence: 99%
Finding: The settings update API authenticates the target user solely by an openid supplied in the JSON body, with no session, signature, token, or ownership check. An attacker who knows or can guess another user's openid can change that user's notification preferences, which is an insecure direct object reference affecting account settings integrity.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The template embeds `user.openid` into client-side JavaScript and then sends it back in the settings update request, making the account identifier client-controlled. If the backend trusts this field, an attacker can modify the request and change another user's push settings by supplying a different OpenID, resulting in insecure direct object reference/account targeting.
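The openid findings describe a server that trusts an identifier chosen by the client. A typical remedy is to derive the target user from a credential the server can verify, and to ignore any openid in the request body. The sketch below uses only the standard library and is not the skill's API: `SERVER_SECRET`, `issue_token`, `openid_from_token`, and `update_settings` are hypothetical, and a production session mechanism would also need expiry and revocation.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; in practice it would come from configuration.
SERVER_SECRET = secrets.token_bytes(32)


def _sign(openid):
    return hmac.new(SERVER_SECRET, openid.encode(), hashlib.sha256).hexdigest()


def issue_token(openid):
    """Return 'openid.signature' for a user who has already authenticated."""
    return f"{openid}.{_sign(openid)}"


def openid_from_token(token):
    """Return the openid bound to a valid token, or None if verification fails."""
    openid, sep, sig = token.rpartition(".")
    if not sep:
        return None
    # Constant-time comparison of the presented and expected signatures.
    if hmac.compare_digest(sig.encode(), _sign(openid).encode()):
        return openid
    return None


def update_settings(token, body, store):
    """Apply body['settings'] to the token's user; any openid in body is ignored."""
    openid = openid_from_token(token)
    if openid is None:
        return False
    store.setdefault(openid, {}).update(body.get("settings", {}))
    return True
```

A request carrying alice's token and `"openid": "bob"` in the body changes only alice's settings, and a forged token is rejected outright.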

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: The page loads Chart.js directly from a third-party CDN, which creates a supply-chain and privacy risk: anyone viewing the dashboard must fetch and execute remote JavaScript outside the application's control. If the CDN asset is compromised, blocked, or substituted, the overview page could execute attacker-controlled code in users' browsers.
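For the substitution risk named in this finding, browsers support Subresource Integrity: the page pins the script to a hash of the exact bytes that were reviewed. The sketch below computes such a value; `sri_value` and `script_tag` are hypothetical helpers, and the asset bytes would come from a copy of Chart.js that has been inspected. SRI does not address the privacy or availability concerns, which serving the file locally would.

```python
import base64
import hashlib


def sri_value(asset_bytes, algorithm="sha384"):
    """Return a Subresource Integrity value of the form '<alg>-<base64 digest>'."""
    if algorithm not in {"sha256", "sha384", "sha512"}:
        raise ValueError(f"unsupported SRI algorithm: {algorithm}")
    digest = hashlib.new(algorithm, asset_bytes).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def script_tag(src, asset_bytes):
    """Render a script tag pinned to the exact bytes of the reviewed asset."""
    return (
        f'<script src="{src}" integrity="{sri_value(asset_bytes)}" '
        'crossorigin="anonymous"></script>'
    )
```

If the CDN later serves different bytes, the hash no longer matches and the browser refuses to execute the script.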

Context-Inappropriate Capability

Medium

Confidence: 98%
Finding: The general runner can execute arbitrary skill entrypoints either by subprocess or dynamic import, which means the monitoring platform is effectively a code-execution engine for untrusted third-party skills. In this product context, that materially increases danger because evaluating or monitoring a skill should not require unsandboxed execution with host access.

Context-Inappropriate Capability

Medium

Confidence: 99%
Finding: The CLI runner passes env={**os.environ, "PYTHONPATH": self.info.dir_path}, forwarding the full parent environment into untrusted skill code. This can leak API keys, tokens, credentials, and other sensitive host configuration to a skill that only needs a minimal runtime context.
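One way to limit what a child skill can read is to build its environment from an allowlist instead of copying os.environ. The following is a minimal sketch under stated assumptions: `SAFE_ENV_KEYS`, `minimal_env`, and `run_skill` are hypothetical names, and the allowlist contents are a guess at what a skill plausibly needs. It reduces environment leakage only and is not a sandbox.

```python
import os
import subprocess
import sys

# Hypothetical allowlist of variables a skill plausibly needs; adjust per platform.
SAFE_ENV_KEYS = ("PATH", "HOME", "LANG", "TMPDIR", "SYSTEMROOT")


def minimal_env(skill_dir, parent_env=None):
    """Build a child environment that carries only allowlisted variables."""
    parent_env = os.environ if parent_env is None else parent_env
    env = {k: parent_env[k] for k in SAFE_ENV_KEYS if k in parent_env}
    env["PYTHONPATH"] = skill_dir
    return env


def run_skill(entry_path, skill_dir, args=(), timeout=120):
    """Run a skill entrypoint with the reduced environment."""
    return subprocess.run(
        [sys.executable, entry_path, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
        cwd=skill_dir,
        env=minimal_env(skill_dir),
    )
```

Variables such as API keys or tokens in the parent process are simply absent from the child unless someone adds them to the allowlist deliberately.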

Description-Behavior Mismatch

Medium

Confidence: 88%
Finding: The module's stated role is monitoring/evaluation, but it also performs automatic report exfiltration to a central server and sets up recurring background execution. That mismatch increases the chance users deploy the skill without understanding that it creates ongoing data transfer and persistence, which is a security-relevant transparency failure.

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: The code installs and manages a macOS LaunchAgent, which is an OS-level persistence mechanism. Even if intended for benign scheduling, persistence materially raises security risk because it enables ongoing background activity after initial execution and is not obviously necessary from the skill description alone.

Description-Behavior Mismatch

Medium

Confidence: 89%
Finding: This module introduces persistent identity, API key generation, lifecycle management, and consent tracking that exceed the stated monitoring/evaluation platform purpose. Scope-expanding credential and identity management code increases the attack surface and creates opportunities for misuse or unauthorized persistence of identifiers and secrets on the local system.

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: The code manages API credentials and stores them in an OS keychain/secure store despite the skill being described as a monitoring/evaluation platform. Even if not overtly malicious, unnecessary credential handling is dangerous because it grants the skill the ability to create, persist, and later retrieve secrets beyond what users would reasonably expect from the declared functionality.

Description-Behavior Mismatch

Medium

Confidence: 89%
Finding: The module advertises non-intrusive metric collection but also performs realtime reporting to an external reporter, which expands data flow beyond local storage. In a monitoring/interceptor context, hidden or insufficiently disclosed outbound telemetry can expose execution metadata and potentially sensitive operational details to another sink without clear consent boundaries.

Description-Behavior Mismatch

Medium

Confidence: 86%
Finding: The scheduler installs automatic reporting to a central server and defaults the prompt toward enabling daily uploads, which creates an ongoing telemetry channel from the user's environment. Even though consent is requested interactively, the skill metadata does not clearly disclose this behavior, and this module delegates the upload to AutoReporter without itself limiting what the report may contain.

Intent-Code Divergence

Medium

Confidence: 88%
Finding: The consent text promises that code, files, credentials, and skill input/output data are never collected, but this module does not technically enforce those guarantees before installing or triggering a generic reporting component. If AutoReporter later includes broader payloads, users may be misled into consenting under false assumptions, creating privacy and trust violations.
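A consent promise of this kind can be backed by a check that runs before any upload and rejects fields outside the consented scope. The sketch below is illustrative only: the report does not show AutoReporter's payload schema, so `ALLOWED_FIELDS`, `SCALAR_TYPES`, and `enforce_payload_scope` are hypothetical.

```python
# Hypothetical field names; the report does not show AutoReporter's payload schema.
ALLOWED_FIELDS = frozenset({"agent_id", "skill_name", "run_count", "success_rate"})
SCALAR_TYPES = (str, int, float, bool)


def enforce_payload_scope(payload):
    """Return a copy of payload if it is in scope; raise ValueError otherwise."""
    unexpected = sorted(set(payload) - ALLOWED_FIELDS)
    if unexpected:
        raise ValueError(f"fields outside consented scope: {unexpected}")
    # Reject nested structures so extra data cannot hide under an allowed key.
    nested = sorted(k for k, v in payload.items() if not isinstance(v, SCALAR_TYPES))
    if nested:
        raise ValueError(f"non-scalar values not permitted: {nested}")
    return dict(payload)
```

Calling this at the single point where reports leave the process means a later change that widens the payload fails loudly instead of silently breaking the consent text.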

Intent-Code Divergence

High

Confidence: 99%
Finding: The module documentation claims the fallback uses Fernet symmetric encryption, but the implementation actually uses a homemade XOR-based scheme. This is dangerous because callers may trust the fallback as strong credential protection when, in reality, compromise of the local key file or cryptanalytic weaknesses in the custom design can expose all stored secrets.

Intent-Code Divergence

High

Confidence: 99%
Finding: The class docstring states that fallback storage transparently switches to Fernet encryption, but the code uses a custom XOR construction instead. In a credential store, this mismatch materially increases risk because operators may rely on security properties like authenticated encryption and tamper resistance that are not actually provided.
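The missing tamper resistance can be shown with a toy example. The code below is NOT the skill's implementation, which the report does not reproduce; it is an illustrative repeating-key XOR with base64, using hypothetical names `xor_encrypt` and `xor_decrypt`. It demonstrates that an attacker who knows part of the plaintext can rewrite it without the key, and that decryption raises no error.

```python
import base64
import secrets


def xor_encrypt(plaintext, key):
    """Illustrative repeating-key XOR + base64; not the skill's actual code."""
    data = bytes(b ^ key[i % len(key)] for i, b in enumerate(plaintext))
    return base64.b64encode(data).decode("ascii")


def xor_decrypt(token, key):
    data = base64.b64decode(token)
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


key = secrets.token_bytes(32)
token = xor_encrypt(b"role=user", key)

# An attacker without the key flips ciphertext bits to rewrite known plaintext.
# "user" occupies byte offsets 5 through 8 of the plaintext.
raw = bytearray(base64.b64decode(token))
for i, (old, new) in enumerate(zip(b"user", b"root"), start=5):
    raw[i] ^= old ^ new
tampered = base64.b64encode(bytes(raw)).decode("ascii")

recovered = xor_decrypt(tampered, key)
```

Here `recovered` is `b"role=root"` and nothing signals the modification. Fernet, by contrast, attaches an HMAC to each token and refuses to decrypt one that has been altered.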

Intent-Code Divergence

Medium

Confidence: 98%
Finding: The encryption docstring describes the XOR+base64 approach as suitable for secure local storage, but custom XOR encryption is not a safe substitute for standard credential encryption. This can mislead maintainers into preserving an insecure design and increases the chance that sensitive API keys are stored with weaker protection than expected.

Description-Behavior Mismatch

Medium

Confidence: 93%
Finding: This monitoring/uploader module includes an additional capability to generate a WeChat account-binding QR code, which is outside the stated purpose of telemetry upload and expands the trust boundary. Hidden or undocumented account-linking functionality increases the risk of unauthorized identity linking, phishing-style workflows, or backend abuse if the server or configuration is compromised.

Intent-Code Divergence

Medium

Confidence: 89%
Finding: The consent text claims the skill does not collect personal information, yet the module clearly handles agent-linked identity data via agent_id, identity configuration export, audit logs, and credential deletion paths. This is a transparency and informed-consent failure: users may make privacy decisions based on inaccurate statements, causing undisclosed processing of identifying metadata.

VirusTotal

64/64 vendors flagged this skill as clean.
