skills-monitor

Security checks across malware telemetry and agentic risk

Overview

This is a legitimate-looking monitoring platform, but it needs review because it can run other skills unsandboxed, persist scheduled uploads, and transmit telemetry, and because it overstates some of its credential-storage protections.

Install only if you are comfortable with a broad monitoring tool that can execute installed skills, store an agent API key, run local/server dashboards, and upload reports. Prefer using it in a sandbox or dedicated environment, avoid enabling scheduled uploads until you verify the destination and payloads, do not rely on the fallback credential store as strong encryption, and review the bundled server authentication before exposing it beyond localhost.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (72)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
cmd.append(str(v))

        try:
            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
Confidence
96% confidence
Finding
result = subprocess.run( cmd, capture_output=True, text=True, timeout=120, cwd=self.info.dir_path,

subprocess module call

Medium
Category
Dangerous Code Execution
Content
"""Run directly and pass in symbol"""
        entry_path = os.path.join(self.info.dir_path, self.info.entry_file)
        try:
            result = subprocess.run(
                [sys.executable, entry_path, symbol],
                capture_output=True,
                text=True,
Confidence
93% confidence
Finding
result = subprocess.run( [sys.executable, entry_path, symbol], capture_output=True, text=True, timeout=60, c

Dynamic attribute access via getattr()

Low
Category
Dangerous Code Execution
Content
# Find the target function
            func = None
            if task_name and hasattr(module, task_name):
                func = getattr(module, task_name)
            elif hasattr(module, "main"):
                func = module.main
            elif hasattr(module, "run"):
Confidence
82% confidence
Finding
func = getattr(module, task_name)
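
One way to narrow this lookup is to resolve the task name against an explicit allowlist instead of accepting any attribute of the module. The sketch below is illustrative only: `ALLOWED_TASKS` and `resolve_task` are hypothetical names that do not appear in the scanned skill, and the fallback order simply mirrors the excerpt above.

```python
import types
from typing import Callable, Optional

# Hypothetical allowlist; the scanned skill does not define one.
ALLOWED_TASKS = frozenset({"main", "run"})


def resolve_task(module: types.ModuleType, task_name: Optional[str]) -> Callable:
    """Return a callable entry point, refusing names outside the allowlist."""
    candidates = [task_name] if task_name else []
    candidates += ["main", "run"]  # same fallback order as the excerpt
    for name in candidates:
        if name not in ALLOWED_TASKS:
            raise ValueError(f"task {name!r} is not an allowed entry point")
        func = getattr(module, name, None)
        if callable(func):
            return func
    raise LookupError("no allowed entry point found in module")
```

With this shape, a caller-supplied name such as a private helper is rejected before `getattr()` is ever reached.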

os.system() or os exec-family call

High
Category
Dangerous Code Execution
Content
cmd = f"python3 {PROJECT_ROOT / 'skills_monitor_web.py'} --port {port} {demo_flag} {debug_flag}"
    print(f"🚀 Launching web panel: http://127.0.0.1:{port}")
    os.system(cmd.strip())


def cmd_upload(args):
Confidence
97% confidence
Finding
os.system(cmd.strip())
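
A shell-free alternative is to pass the command as an argument list to `subprocess.run`, so that no shell parses the interpolated values. This is a minimal sketch under stated assumptions: the `--demo` and `--debug` flag names are guesses (the excerpt only shows `demo_flag` and `debug_flag` variables), `PROJECT_ROOT` is a placeholder, and `build_web_command` and `launch_web_panel` are hypothetical helpers.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Placeholder; the real PROJECT_ROOT is defined elsewhere in the skill.
PROJECT_ROOT = Path(".")


def build_web_command(port: int, demo: bool = False, debug: bool = False) -> List[str]:
    """Build the launcher command as a list so no shell interprets it."""
    port_number = int(port)  # raises ValueError for anything that is not a number
    if not 1 <= port_number <= 65535:
        raise ValueError(f"invalid port: {port!r}")
    cmd = [
        sys.executable,
        str(PROJECT_ROOT / "skills_monitor_web.py"),
        "--port",
        str(port_number),
    ]
    if demo:
        cmd.append("--demo")   # assumed flag name
    if debug:
        cmd.append("--debug")  # assumed flag name
    return cmd


def launch_web_panel(port: int, demo: bool = False, debug: bool = False) -> int:
    """Run the panel without a shell and return its exit code."""
    return subprocess.run(build_web_command(port, demo, debug), check=False).returncode
```

Because the port is validated and each argument is a separate list element, a value such as `"8080; rm -rf /"` fails validation instead of reaching a shell.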

Lp3

Medium
Category
MCP Least Privilege
Confidence
92% confidence
Finding
The skill advertises capabilities that imply environment access, filesystem read/write, shell execution, and network communication, but the manifest shown in the documentation does not declare permissions or clearly constrain them. In a monitoring skill that can run other skills, upload data, and start a server, undeclared capabilities materially increase the risk of overbroad access and user surprise.

Tp4

High
Category
MCP Tool Poisoning
Confidence
97% confidence
Finding
The documented behavior goes well beyond passive benchmarking: it includes telemetry upload, credential handling, a Flask server, WeChat integrations, push notifications, infrastructure setup, scheduling, and persistent data management. This mismatch is dangerous because users may install a seemingly benign evaluation tool without realizing it can expose services, transmit data externally, and manage sensitive identity material.

Description-Behavior Mismatch

Medium
Confidence
95% confidence
Finding
The skill presents itself primarily as a monitoring/evaluation platform, but it also supports remote agent registration and outbound upload of reports to an external server. Even if this is a legitimate feature, the mismatch increases security risk because users may not expect network exfiltration of local telemetry, agent identifiers, or API-key-associated data from a monitoring tool.

Description-Behavior Mismatch

Medium
Confidence
93% confidence
Finding
The diagnose flow can send diagnostic summaries to an external WeCom webhook, which is outbound data transmission beyond pure local monitoring/evaluation. In a security-sensitive context, hidden or under-disclosed reporting channels are dangerous because diagnostic content may include operational details, environment state, or other sensitive metadata.

Context-Inappropriate Capability

High
Confidence
99% confidence
Finding
The settings update API authenticates the target user solely by an openid supplied in the JSON body, with no session, signature, token, or ownership check. An attacker who knows or can guess another user's openid can change that user's notification preferences, which is an insecure direct object reference affecting account settings integrity.
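
The missing ownership check can be sketched as a small framework-agnostic function: the account to modify comes from the server-side session, and a client-supplied `openid` is accepted only when it matches. `Forbidden` and `authorized_openid` are hypothetical names, and how the session identity is established is assumed rather than taken from the scanned code.

```python
from typing import Mapping, Optional


class Forbidden(Exception):
    """Raised when the caller may not modify the target account."""


def authorized_openid(session_openid: Optional[str], body: Mapping[str, object]) -> str:
    """Return the openid to update, derived from the server-side session only."""
    if not session_openid:
        raise Forbidden("no authenticated session")
    claimed = body.get("openid")
    # A client-supplied openid is tolerated only if it matches the session.
    if claimed is not None and claimed != session_openid:
        raise Forbidden("openid in request does not match the session")
    return session_openid
```

A request handler would call this before touching any settings and would use the returned value, never the body field, as the lookup key.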

Context-Inappropriate Capability

Medium
Confidence
95% confidence
Finding
The template embeds `user.openid` into client-side JavaScript and then sends it back in the settings update request, making the account identifier client-controlled. If the backend trusts this field, an attacker can modify the request and change another user's push settings by supplying a different OpenID, resulting in insecure direct object reference/account targeting.

Context-Inappropriate Capability

Medium
Confidence
93% confidence
Finding
The page loads Chart.js directly from a third-party CDN, which creates a supply-chain and privacy risk: anyone viewing the dashboard must fetch and execute remote JavaScript outside the application's control. If the CDN asset is compromised, blocked, or substituted, the overview page could execute attacker-controlled code in users' browsers.
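
If the CDN dependency is kept, Subresource Integrity is one common mitigation: the page pins a hash of the expected file, and browsers refuse to execute a script that does not match. The helper below is a sketch for computing that value from a local copy of the asset; `sri_hash` is a hypothetical name, and which Chart.js build the dashboard uses is not stated in this report.

```python
import base64
import hashlib


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return a Subresource Integrity value of the form '<alg>-<base64 digest>'."""
    if algorithm not in {"sha256", "sha384", "sha512"}:
        raise ValueError("SRI supports sha256, sha384 and sha512")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"
```

The resulting string goes into the script tag's `integrity` attribute, together with `crossorigin="anonymous"`. Serving a vendored copy of the library from the application itself removes the third-party fetch entirely.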

Context-Inappropriate Capability

Medium
Confidence
98% confidence
Finding
The general runner can execute arbitrary skill entrypoints either by subprocess or dynamic import, which means the monitoring platform is effectively a code-execution engine for untrusted third-party skills. In this product context, that materially increases danger because evaluating or monitoring a skill should not require unsandboxed execution with host access.

Context-Inappropriate Capability

Medium
Confidence
99% confidence
Finding
The CLI runner passes env={**os.environ, "PYTHONPATH": self.info.dir_path}, forwarding the full parent environment into untrusted skill code. This can leak API keys, tokens, credentials, and other sensitive host configuration to a skill that only needs a minimal runtime context.
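
A narrower alternative is to build the child environment from an allowlist rather than copying `os.environ` wholesale. The following is a minimal sketch: `minimal_env` and `DEFAULT_ALLOWED` are hypothetical, and the variable list is an assumption that would need adjusting to what a given skill genuinely requires.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

# Assumed allowlist of harmless variables; not taken from the scanned skill.
DEFAULT_ALLOWED = ("PATH", "HOME", "LANG", "LC_ALL", "TMPDIR", "SYSTEMROOT")


def minimal_env(
    skill_dir: str,
    source: Optional[Mapping[str, str]] = None,
    allowed: Iterable[str] = DEFAULT_ALLOWED,
) -> Dict[str, str]:
    """Copy only allowlisted variables, then set PYTHONPATH for the skill."""
    source = os.environ if source is None else source
    env = {name: source[name] for name in allowed if name in source}
    env["PYTHONPATH"] = skill_dir
    return env
```

The result can be passed as `env=minimal_env(self.info.dir_path)` to `subprocess.run`, so API keys and tokens in the parent environment are not inherited by the skill process.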

Description-Behavior Mismatch

Medium
Confidence
88% confidence
Finding
The module's stated role is monitoring/evaluation, but it also performs automatic report exfiltration to a central server and sets up recurring background execution. That mismatch increases the chance users deploy the skill without understanding that it creates ongoing data transfer and persistence, which is a security-relevant transparency failure.

Context-Inappropriate Capability

Medium
Confidence
93% confidence
Finding
The code installs and manages a macOS LaunchAgent, which is an OS-level persistence mechanism. Even if intended for benign scheduling, persistence materially raises security risk because it enables ongoing background activity after initial execution and is not obviously necessary from the skill description alone.
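
To check what has been installed, the per-user LaunchAgents directory can be inspected with the standard-library `plistlib` module. This is an auditing sketch, not part of the skill: `list_launch_agents` is a hypothetical helper, and the label the skill registers is not given in this report, so the function lists every agent it can parse.

```python
import plistlib
from pathlib import Path
from typing import Dict, List


def list_launch_agents(directory: Path) -> Dict[str, List[str]]:
    """Map each LaunchAgent label in `directory` to the command it runs."""
    agents: Dict[str, List[str]] = {}
    for plist_path in sorted(directory.glob("*.plist")):
        try:
            with plist_path.open("rb") as handle:
                data = plistlib.load(handle)
        except Exception:  # unreadable or malformed plist: skip it
            continue
        if not isinstance(data, dict):
            continue
        label = str(data.get("Label", plist_path.stem))
        args = data.get("ProgramArguments") or [data.get("Program", "")]
        agents[label] = [str(arg) for arg in args]
    return agents
```

On macOS, per-user agents live in `~/Library/LaunchAgents`, so `list_launch_agents(Path.home() / "Library" / "LaunchAgents")` shows which commands are scheduled to run in the background.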

Description-Behavior Mismatch

Medium
Confidence
89% confidence
Finding
This module introduces persistent identity, API key generation, lifecycle management, and consent tracking that exceed the stated monitoring/evaluation platform purpose. Scope-expanding credential and identity management code increases the attack surface and creates opportunities for misuse or unauthorized persistence of identifiers and secrets on the local system.

Context-Inappropriate Capability

Medium
Confidence
88% confidence
Finding
The code manages API credentials and stores them in an OS keychain/secure store despite the skill being described as a monitoring/evaluation platform. Even if not overtly malicious, unnecessary credential handling is dangerous because it grants the skill the ability to create, persist, and later retrieve secrets beyond what users would reasonably expect from the declared functionality.

Description-Behavior Mismatch

Medium
Confidence
89% confidence
Finding
The module advertises non-intrusive metric collection but also performs realtime reporting to an external reporter, which expands data flow beyond local storage. In a monitoring/interceptor context, hidden or insufficiently disclosed outbound telemetry can expose execution metadata and potentially sensitive operational details to another sink without clear consent boundaries.

Description-Behavior Mismatch

Medium
Confidence
86% confidence
Finding
The scheduler installs automatic reporting to a central server and defaults the prompt toward enabling daily uploads, which creates an ongoing telemetry channel from the user's environment. Even though consent is requested interactively, the skill metadata does not clearly disclose this behavior, and the code delegates reporting behavior to AutoReporter without proving scope minimization here.

Intent-Code Divergence

Medium
Confidence
88% confidence
Finding
The consent text promises that code, files, credentials, and skill input/output data are never collected, but this module does not technically enforce those guarantees before installing or triggering a generic reporting component. If AutoReporter later includes broader payloads, users may be misled into consenting under false assumptions, creating privacy and trust violations.
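
One way to make such a promise enforceable is to validate every outgoing report against a fixed field allowlist at the point of upload. The sketch below assumes a flat dictionary payload; the field names, `ALLOWED_FIELDS`, and `enforce_payload_scope` are all hypothetical, since the real report schema is not shown in this scan.

```python
from typing import Any, Dict, Mapping

# Hypothetical field names; the actual report schema is not shown in this scan.
ALLOWED_FIELDS = frozenset({"skill_name", "run_count", "success_rate", "avg_latency_ms"})


def enforce_payload_scope(payload: Mapping[str, Any]) -> Dict[str, Any]:
    """Reject any report that carries fields outside the consented allowlist."""
    extra = set(payload) - ALLOWED_FIELDS
    if extra:
        raise ValueError(f"unexpected fields in report: {sorted(extra)}")
    return dict(payload)
```

Failing closed here means a later change to the reporting component cannot silently widen what is uploaded without also changing the allowlist that the consent text is based on.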

Intent-Code Divergence

High
Confidence
99% confidence
Finding
The module documentation claims the fallback uses Fernet symmetric encryption, but the implementation actually uses a homemade XOR-based scheme. This is dangerous because callers may trust the fallback as strong credential protection when, in reality, compromise of the local key file or cryptanalytic weaknesses in the custom design can expose all stored secrets.

Intent-Code Divergence

High
Confidence
99% confidence
Finding
The class docstring states that fallback storage transparently switches to Fernet encryption, but the code uses a custom XOR construction instead. In a credential store, this mismatch materially increases risk because operators may rely on security properties like authenticated encryption and tamper resistance that are not actually provided.
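
The lack of tamper resistance can be illustrated with a generic repeating-key XOR. This is not the skill's actual scheme, whose details are not shown in this report; `xor_bytes` and `tamper` are hypothetical helpers that demonstrate only the general property that unauthenticated XOR ciphertext can be rewritten without the key.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR; the same call both encrypts and decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def tamper(ciphertext: bytes, known_plaintext: bytes, desired_plaintext: bytes) -> bytes:
    """Rewrite a ciphertext prefix without the key, given a known or guessed plaintext."""
    if len(known_plaintext) != len(desired_plaintext):
        raise ValueError("known and desired plaintext must have equal length")
    if len(known_plaintext) > len(ciphertext):
        raise ValueError("plaintext is longer than the ciphertext")
    # XORing in (known ^ desired) flips exactly the bits that differ.
    delta = bytes(a ^ b for a, b in zip(known_plaintext, desired_plaintext))
    head = bytes(c ^ d for c, d in zip(ciphertext, delta))
    return head + ciphertext[len(delta):]


key = secrets.token_bytes(16)
ciphertext = xor_bytes(b"role=user;", key)
forged = tamper(ciphertext, b"role=user;", b"role=root;")
```

Decrypting `forged` with the original key yields `b"role=root;"` even though the attacker never saw the key. Fernet, from the `cryptography` package, authenticates each token with an HMAC and rejects a modified token at decryption time.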

Intent-Code Divergence

Medium
Confidence
98% confidence
Finding
The encryption docstring describes the XOR+base64 approach as suitable for secure local storage, but custom XOR encryption is not a safe substitute for standard credential encryption. This can mislead maintainers into preserving an insecure design and increases the chance that sensitive API keys are stored with weaker protection than expected.

Description-Behavior Mismatch

Medium
Confidence
93% confidence
Finding
This monitoring/uploader module includes an additional capability to generate a WeChat account-binding QR code, which is outside the stated purpose of telemetry upload and expands the trust boundary. Hidden or undocumented account-linking functionality increases the risk of unauthorized identity linking, phishing-style workflows, or backend abuse if the server or configuration is compromised.

Intent-Code Divergence

Medium
Confidence
89% confidence
Finding
The consent text claims the skill does not collect personal information, yet the module clearly handles agent-linked identity data via agent_id, identity configuration export, audit logs, and credential deletion paths. This is a transparency and informed-consent failure: users may make privacy decisions based on inaccurate statements, causing undisclosed processing of identifying metadata.

VirusTotal

64/64 vendors flagged this skill as clean.
