Evolution Watcher

Security checks across malware telemetry and agentic risk

Overview

This update-monitoring skill has useful, disclosed monitoring features, but it also has under-disclosed code-changing and external-email behavior that should be reviewed before installation.

Install only in a test or development OpenClaw environment. Treat it as an upgrade-assistance and code-modification tool, not a read-only watcher. Review generated diffs before setting authorized=True, avoid configuring SMTP credentials unless you change and verify the recipient, and do not run generated upgrade scripts without manual inspection.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (13)

subprocess module call

Medium

Category: Dangerous Code Execution
Content:
    # Apply the patch
    cmd = ["patch", str(file_path), "-i", tmp_path]
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    if result.returncode == 0:
        results["files_modified"].append(str(file_path))
Confidence: 90%
Finding: result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
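One way to reduce the risk of this call is to check that a patch applies cleanly before touching the real file. The following is a minimal sketch, not the skill's actual code: the helper names `patch_command` and `patch_applies_cleanly` are hypothetical, and it assumes a `patch` binary that supports the `--dry-run` option (as GNU patch does).

```python
import subprocess


def patch_command(target: str, patch_file: str, dry_run: bool = True) -> list[str]:
    """Build the argument list for patch; dry_run adds --dry-run (assumes GNU patch)."""
    cmd = ["patch", "--dry-run"] if dry_run else ["patch"]
    return cmd + [target, "-i", patch_file]


def patch_applies_cleanly(target: str, patch_file: str) -> bool:
    """Run patch in dry-run mode; the target file is not modified."""
    result = subprocess.run(
        patch_command(target, patch_file),
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.returncode == 0
```

A reviewer could call `patch_applies_cleanly(...)` and inspect the diff by hand before any call that uses `dry_run=False`.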

subprocess module call

Medium

Category: Dangerous Code Execution
Content: sandbox_file = sandbox_dir / file_path.name if sandbox_file.exists(): cmd = ["patch", str(sandbox_file), "-i", tmp_path] result = subprocess.run(cmd, capture_output=True, text=True, timeout=30) if result.returncode != 0: raise RuntimeError(f"沙盒应用patch失败: {result.stderr}")
Confidence: 87%
Finding: result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)

subprocess module call

Medium

Category: Dangerous Code Execution
Content:
    else:
        # Clone the repository
        try:
            subprocess.run(
                ["git", "clone", "--depth", "1", repo_url, str(repo_dir)],
                check=True,
                capture_output=True
Confidence: 84%
Finding: subprocess.run( ["git", "clone", "--depth", "1", repo_url, str(repo_dir)], check=True, capture_output=True )
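Because `repo_url` is passed straight to `git clone`, a reviewer may want the URL checked before the command is built. The sketch below is illustrative only: `ALLOWED_HOSTS`, `is_allowed_repo_url`, and `build_clone_command` are hypothetical names, and nothing in the scanned skill is known to implement such a check.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; adjust to the hosts you actually trust.
ALLOWED_HOSTS = {"github.com"}


def is_allowed_repo_url(repo_url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Return True only for https URLs whose host is on the allowlist."""
    parsed = urlparse(repo_url)
    return parsed.scheme == "https" and parsed.hostname in allowed_hosts


def build_clone_command(repo_url: str, repo_dir: str) -> list[str]:
    """Build a shallow-clone command, refusing URLs that fail the check."""
    if not is_allowed_repo_url(repo_url):
        raise ValueError(f"refusing to clone untrusted URL: {repo_url!r}")
    # "--" keeps git from treating a value that starts with "-" as an option.
    return ["git", "clone", "--depth", "1", "--", repo_url, repo_dir]
```

The exact-host comparison is deliberate: a suffix or substring match would accept look-alike hosts such as `github.com.evil.test`.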

Lp3

Medium

Category: MCP Least Privilege
Confidence: 90%
Finding: The skill advertises no declared permissions while its documented capabilities include environment access, file read/write, network access, and shell command execution. This creates a transparency and governance gap: reviewers or enforcement systems may treat the skill as lower-risk than it actually is, increasing the chance that powerful operations are approved without appropriate scrutiny.

Description-Behavior Mismatch

High

Confidence: 96%
Finding: The documentation claims the skill is read-only and performs no automatic upgrades, but elsewhere describes applying adapter fixes after user authorization. This inconsistency is dangerous because users and automated policy systems may rely on the weaker risk description and approve a skill that can modify code or system state once an authorization path is triggered.

Intent-Code Divergence

High

Confidence: 95%
Finding: The security section says the skill only performs read-only operations and does not modify the system, yet other sections state it can apply fixes after authorization. Security assurances that contradict actual behavior can directly mislead operators during trust decisions, causing underestimation of the risk of code changes, adapter rewrites, or other state modifications.

Intent-Code Divergence

Medium

Confidence: 96%
Finding: The code claims sandboxed validation but actually calls the real global health-check script outside the sandbox, which breaks the security boundary the interface promises. This can mislead users into approving changes based on validation that did not test the sandboxed modifications and may also execute production-side code unexpectedly.
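The boundary this finding describes can be enforced by resolving every script path against the sandbox root and rejecting anything that lands outside it. The following is a minimal sketch under stated assumptions: `resolve_in_sandbox` is a hypothetical helper, the script path in the usage note is invented for illustration, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


def resolve_in_sandbox(sandbox_dir: Path, relative_script: str) -> Path:
    """Resolve a script path and refuse anything that escapes sandbox_dir."""
    root = sandbox_dir.resolve()
    # Joining with an absolute path discards root, so the check below also
    # catches absolute paths such as a global health-check script.
    candidate = (root / relative_script).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"{candidate} is outside the sandbox {root}")
    return candidate
```

With a check like this, a call such as `resolve_in_sandbox(sandbox_dir, "scripts/health_check.py")` would return the sandboxed copy, while a path pointing at the production script would raise instead of running.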

Context-Inappropriate Capability

Medium

Confidence: 82%
Finding: The code implements outbound SMTP delivery, allowing the skill to send arbitrary content to an external email address. Even if intended for reports, this creates an exfiltration channel and expands the skill's capabilities beyond what the name 'evolution-watcher' alone suggests, which makes misuse or covert data leakage more dangerous in this context.
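If email reporting is kept, one mitigation is to refuse any recipient that is not explicitly approved before a message is even constructed. This sketch uses only the standard library `email.message.EmailMessage`; the allowlist, the addresses, and the `build_report_email` helper are hypothetical and do not reflect how the skill itself sends mail.

```python
from email.message import EmailMessage

# Hypothetical allowlist of approved report recipients.
ALLOWED_RECIPIENTS = {"ops@example.com"}


def build_report_email(sender, recipient, subject, body, allowed=ALLOWED_RECIPIENTS):
    """Build a report message, rejecting recipients that are not allowlisted."""
    normalized = recipient.strip().lower()
    if normalized not in allowed:
        raise ValueError(f"recipient {recipient!r} is not on the allowlist")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = normalized
    msg["Subject"] = subject
    msg.set_content(body)
    return msg
```

Keeping the check in the message builder, rather than in configuration alone, means a changed or injected recipient fails loudly instead of silently receiving the report.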

Description-Behavior Mismatch

High

Confidence: 97%
Finding: The module advertises itself as read-only monitoring with no automatic upgrades, but it contains substantial logic to generate executable Bash and Python upgrade scripts, explicit `clawhub update` commands, and run instructions. This mismatch is dangerous because operators and downstream agents may trust the tool's safety claims and unintentionally enable code paths that facilitate state-changing operations in environments where only observation was expected.

Context-Inappropriate Capability

High

Confidence: 96%
Finding: A monitoring skill should not silently possess the ability to create executable automation artifacts that perform upgrades, backups, delays, and operational sequencing. In an agent setting, generating such scripts materially increases the chance of unauthorized or mistaken execution, especially when the same tool is assumed to be informational and may consume untrusted plugin metadata when composing commands and scripts.
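The risk from untrusted plugin metadata can be illustrated with shell quoting. The sketch below shows how a generated script line might quote a plugin name with `shlex.quote`; `render_update_line` is a hypothetical helper, and the argument form of `clawhub update` is an assumption, since the report names the command but not its syntax.

```python
import shlex


def render_update_line(plugin_name: str) -> str:
    """Render one shell line for a generated script, quoting untrusted input."""
    # Assumption: the plugin name is passed as a single positional argument.
    return "clawhub update " + shlex.quote(plugin_name)
```

Quoting stops a name such as `x; rm -rf ~` from being split into a second command, but it does not stop a name that begins with `-` from being read as an option, so validating names against a strict pattern would still be advisable.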

Intent-Code Divergence

Medium

Confidence: 90%
Finding: The top-level documentation says the script is read-only and does not perform automatic upgrades, yet the code prepares upgrade execution paths and detailed instructions for actual `clawhub update` operations. Security-sensitive systems often rely on declared capability boundaries, so this inconsistency can cause unsafe deployment decisions and over-trust by users or orchestrating agents.

Vague Triggers

Medium

Confidence: 88%
Finding: The detection keywords in this template include very common terms such as 'import', 'module', 'package', and related generic words. In an automated fix pipeline, broad triggers can cause the wrong template to activate on unrelated changelog text, leading to unintended code modifications across adapter files.

Vague Triggers

High

Confidence: 97%
Finding: Including '@' as a standalone activation keyword is dangerously overbroad because it appears in many unrelated contexts, especially in code, decorators, annotations, emails, and documentation. This can spuriously trigger the decorator-change template and produce incorrect automated replacements, which is particularly risky in a system intended to rewrite source files.
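The difference between a broad and a narrow trigger can be shown with a small comparison. This is a sketch under assumptions: the report does not say how the skill matches keywords, so `naive_match` assumes plain substring matching, and `DECORATOR_LINE` is a hypothetical stricter pattern that only accepts `@name` at the start of a line.

```python
import re


def naive_match(text: str, keywords: list[str]) -> bool:
    """Substring matching, the kind of broad trigger the finding warns about."""
    return any(keyword in text for keyword in keywords)


# Hypothetical stricter trigger: "@identifier" as the first token on a line.
DECORATOR_LINE = re.compile(r"^\s*@[A-Za-z_][\w.]*", re.MULTILINE)


def strict_match(text: str) -> bool:
    """Match only text that contains a decorator-style line."""
    return DECORATOR_LINE.search(text) is not None
```

Under these assumptions, a changelog line containing an email address activates the naive trigger but not the strict one, while a real decorator line activates both.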

VirusTotal

63 of 63 vendors reported this skill as clean.