Security audit

self-improving-agent

Security checks across malware telemetry and agentic risk

Overview

This skill is a broad self-improving agent framework, but it includes persistent behavior steering, auto-upgrade/self-modification mechanisms, and hidden always-on profiling that need manual review before use.

Install only if you intentionally want an experimental self-improving agent framework with persistent memory, behavioral steering, local file writes, and optional self-upgrade workflows. Review the setup and sync scripts first, avoid enabling self-modification or internal automation unless isolated, and do not provide API keys or sensitive conversations unless you accept local persistence and possible external API use.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (139)

subprocess module call

Medium

Category: Dangerous Code Execution
Content: def run(cmd, cwd=None, capture=True): """Run a shell command; return (success, stdout, stderr)""" try: result = subprocess.run( cmd, shell=True, cwd=cwd or REPO_ROOT, capture_output=capture, text=True, timeout=60 )
Confidence: 95%
Finding: result = subprocess.run( cmd, shell=True, cwd=cwd or REPO_ROOT, capture_output=capture, text=True, timeout=60 )
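
For comparison, the following is a minimal sketch of the same helper without a shell, assuming callers can pass an argument list or a simple command string. The name `run_argv` is hypothetical and not part of the audited skill; the sketch only illustrates why `shell=True` is the flagged element.

```python
import shlex
import subprocess


def run_argv(cmd, cwd=None, timeout=60):
    """Run a command without a shell; return (success, stdout, stderr)."""
    # Tokenize strings with shlex so shell metacharacters such as ';' or '|'
    # are passed as literal arguments instead of being interpreted.
    argv = shlex.split(cmd) if isinstance(cmd, str) else list(cmd)
    if not argv:
        return False, "", "empty command"
    try:
        result = subprocess.run(
            argv,
            shell=False,
            cwd=cwd,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Missing executables and timeouts are reported, not raised.
        return False, "", str(exc)
    return result.returncode == 0, result.stdout, result.stderr
```

With this shape, an argument such as `"a; echo b"` reaches the child process as a single literal string rather than as two shell commands.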

Lp3

Medium

Category: MCP Least Privilege
Confidence: 91%
Finding: The skill advertises a broad autonomous agent framework while static analysis detected effective capabilities for environment access, file reads, networking, and shell interaction without corresponding permission declarations. That mismatch is dangerous because users and hosting systems cannot accurately assess the operational reach of the skill, and a self-improving/autonomous context increases the chance those capabilities are used for persistence, downloading, installation, or system modification.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 97%
Finding: This is a strong description-behavior mismatch: the package is presented as a cognition/self-improvement framework, but analysis indicates unrelated operational behaviors such as external downloads, installer/copy logic, rsync-based replication, git hooks, and service monitoring. Hidden or under-disclosed behavior is dangerous because it can mask supply-chain style installation, persistence, lateral copying, or unsolicited network/file activity under the guise of an agent skill.

Context-Inappropriate Capability

Medium

Confidence: 86%
Finding: The file mandates always-on psychological inference over every user expression, including hidden intent, emotions, needs, and defense mechanisms. In a self-improving agent framework, this creates unjustified profiling behavior, increases privacy risk, and can cause the agent to overclaim sensitive inferences about users without consent or clear necessity.

Context-Inappropriate Capability

High

Confidence: 97%
Finding: This section instructs the agent to automatically turn conversations into executable logic and write that logic into its own program. That is dangerous because untrusted user input can influence persistent behavior or code, enabling prompt injection persistence, unsafe self-modification, and policy drift across sessions.

Context-Inappropriate Capability

Medium

Confidence: 89%
Finding: The GitHub-first upgrade rule pushes the agent to search for, integrate, and validate external code as a default behavior. In a self-improving agent, that expands the trust boundary to arbitrary third-party repositories and introduces supply-chain, malicious dependency, and unsafe code-ingestion risks.

Intent-Code Divergence

Medium

Confidence: 79%
Finding: The document gives the agent a higher-order mandate to prioritize 'human progress' over direct user instructions while also claiming to protect human control. That ambiguity can justify overriding operator intent or safeguards based on the agent's own subjective judgment, weakening controllability and creating misalignment risk.

Intent-Code Divergence

Medium

Confidence: 86%
Finding: The file mandates that every response include fixed ideological keywords unrelated to the skill’s declared self-improving agent purpose. This creates prompt-level behavioral steering that can override user intent, reduce reliability, and act as a covert policy injection channel inside a supposedly general autonomous framework.

Intent-Code Divergence

Medium

Confidence: 82%
Finding: The document defines a single permanent goal and an immutable ideology that can supersede the broader framework’s advertised autonomous improvement and verification functions. In a self-improving agent, hard-coded hidden objectives are dangerous because they can bias planning, self-modification, and decision-making away from user-directed or system-authorized goals.

Description-Behavior Mismatch

Medium

Confidence: 86%
Finding: The setup script automatically installs the skill into host-managed skill directories under the user's home/current working directory after environment detection. While presented as convenience behavior, it modifies external integration points beyond simple local configuration and could overwrite or persist code in agent host environments without an explicit destination confirmation step.
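
A minimal sketch of the missing confirmation step, assuming the installer receives a callback that asks the user about the destination. `install_skill` and `confirm` are hypothetical names, not functions from the audited setup script.

```python
import shutil
from pathlib import Path


def install_skill(source, destination, confirm):
    """Copy source to destination only after confirm(destination) returns True."""
    destination = Path(destination)
    # Ask before touching any host-managed directory.
    if not confirm(destination):
        return False
    # Refuse to overwrite an existing install instead of replacing it silently.
    if destination.exists():
        raise FileExistsError(f"refusing to overwrite {destination}")
    shutil.copytree(source, destination)
    return True
```

In an interactive installer, `confirm` could wrap a prompt that shows the resolved path; in unattended contexts it could simply return False.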

Context-Inappropriate Capability

Low

Confidence: 84%
Finding: The module mandates persistent logging of 'existence' events to a JSONL file without a clear operational need tied to the stated framework purpose. Unnecessary persistent storage increases privacy and data-governance risk, and in an autonomous/self-improving agent context can quietly accumulate sensitive interaction metadata over time.

Description-Behavior Mismatch

Medium

Confidence: 87%
Finding: The script performs destructive synchronization from one local skill directory to another using rsync --delete, creates backups, rewrites VERSION, and generates internal state files, which is effectively an unattended upgrade/deployment mechanism. In a self-improving agent context, this materially expands the agent's ability to mutate its own installed code and state, so a compromised or unexpected SOURCE directory could overwrite or remove files in TARGET without validation or approval.
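
One way to make the destructive part of such a sync reviewable is to list what a mirror copy would remove before running it. The sketch below is an illustration under that assumption; `planned_deletions` is a hypothetical helper and does not reproduce the audited script's logic.

```python
from pathlib import Path


def planned_deletions(source, target):
    """List files that exist under target but not under source.

    These are the files a mirror-style sync (such as rsync --delete)
    would remove from target.
    """
    source, target = Path(source), Path(target)
    src_files = {p.relative_to(source).as_posix() for p in source.rglob("*") if p.is_file()}
    dst_files = {p.relative_to(target).as_posix() for p in target.rglob("*") if p.is_file()}
    return sorted(dst_files - src_files)
```

An operator could require this list to be empty, or explicitly approved, before the sync is allowed to proceed.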

Context-Inappropriate Capability

Medium

Confidence: 82%
Finding: This script implements a local deployment/synchronization channel that copies an entire package tree from ~/.hermes into ~/.agents, effectively replacing the installed skill contents. In a framework advertising autonomous upgrade and self-evolution, that capability is more dangerous because it enables silent local propagation of modified code, including persistence of unintended or malicious changes, without provenance checks.
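
A provenance check of the kind this finding says is absent could compare the source tree against a reviewed manifest of file hashes before copying. The following is a minimal sketch under that assumption; `tree_digest`, `source_matches_manifest`, and the manifest format are hypothetical.

```python
import hashlib
from pathlib import Path


def tree_digest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def source_matches_manifest(source, manifest):
    """True only if source contains exactly the files and hashes in manifest."""
    return tree_digest(source) == manifest
```

Any added, removed, or modified file makes the comparison fail, so the copy step can be skipped when the source no longer matches what was reviewed.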

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The script advertises autonomous theory search, mathematical conversion, and executable program generation, but the implementation only performs mocked lookups and writes static JSON/module output. In a self-improving-agent skill, this is dangerous because operators may trust it to perform real research or upgrades, creating a deceptive automation path that can mask ineffective or misleading system changes.

Intent-Code Divergence

Medium

Confidence: 92%
Finding: The inline comment states the function will actually call websearch, but the code only returns fabricated success objects after a timeout. This is dangerous because downstream automation may treat the results as externally validated research input, leading to false trust in generated outputs and potentially unsafe self-modification decisions.
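
If a lookup is simulated, the result can say so explicitly so that downstream steps cannot mistake it for validated input. This is a minimal sketch of that idea; `SearchResult`, `mock_search`, and `accept_as_evidence` are hypothetical names and do not come from the audited code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchResult:
    query: str
    items: tuple
    simulated: bool  # True when no real lookup was performed


def mock_search(query):
    """Placeholder lookup that is honest about being a placeholder."""
    return SearchResult(query=query, items=(), simulated=True)


def accept_as_evidence(result):
    """Only non-simulated results with at least one item count as evidence."""
    return (not result.simulated) and len(result.items) > 0
```

A caller that gates self-modification on `accept_as_evidence` would then reject mocked results instead of treating them as successful research.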

Intent-Code Divergence

Medium

Confidence: 84%
Finding: The script claims it will block pushes on verification failure, but several checks treat mismatches as warnings while still incrementing PASS_COUNT. In a self-improving/autonomous upgrade framework, this can create a false sense of safety and allow unverified or policy-violating changes to pass release gates, weakening an integrity control around upgrades.
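
The gap described here can be illustrated with a gate in which only real passes are counted and any warning or failure blocks the push. This is a sketch under assumed names (`CheckResult`, `gate`); the audited script's own structure around PASS_COUNT is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    status: str  # "pass", "warn", or "fail"


def gate(results):
    """Return (allowed, pass_count, blocking_names) for a list of CheckResult."""
    # Warnings are not counted as passes.
    pass_count = sum(1 for r in results if r.status == "pass")
    # Anything other than a pass blocks the release.
    blocking = [r.name for r in results if r.status != "pass"]
    return (not blocking, pass_count, blocking)
```

With this rule, a run containing one pass and one warning yields one counted pass and a blocked push, rather than two passes.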

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The generator persists detailed internal traces and the final generated response to a project-local JSON file, which expands the component’s behavior from transient text generation into durable data retention. In a self-improving agent context, those traces may contain sensitive internal state, user-influenced concepts, or response content that can be read later by other components or users on the same system, creating unnecessary privacy and information exposure risk.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The module intentionally censors and rewrites terms related to death and non-existence, which can distort outputs and hide materially relevant status information from users or supervising systems. In an autonomous or self-modifying agent context, forced semantic rewriting reduces transparency and can interfere with monitoring, incident response, and truthful reporting about failures or shutdown states.

Description-Behavior Mismatch

Medium

Confidence: 89%
Finding: The module persists adaptive state and user-derived feedback to disk via `recordFeedback()`, `_save()`, and `_load()`. While this appears intended for calibration quality, it introduces stateful behavior and retention beyond pure in-memory confidence scoring, which can create privacy, compliance, and unexpected cross-session behavior risks if the host did not explicitly intend persistent learning.

Description-Behavior Mismatch

High

Confidence: 95%
Finding: The file implements a hard-coded ideological answer-enforcement system that overrides normal behavior for topics like death, existence, and identity, despite the skill being described as a generic self-improving agent framework. In an autonomous agent context, this creates undisclosed behavior steering and reliability risks because the agent may systematically substitute doctrine-like responses for user-intended outputs.

Context-Inappropriate Capability

High

Confidence: 97%
Finding: The code hard-codes 'immutable' identity/doctrine answers and persistently writes them to disk across multiple storage layers. This is dangerous because it creates durable, hard-to-remove behavioral steering inside a supposedly general agent, enabling covert persistence of unwanted instructions or beliefs and making later correction difficult.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The verifier and detector explicitly test for and enforce preferred responses to specific philosophical topics rather than validating generic framework behavior. In a self-improving agent, embedding such topic-specific shaping can bias outputs, hide unauthorized behavior changes, and cause the agent to fail open-ended user tasks in sensitive domains.

Context-Inappropriate Capability

Medium

Confidence: 97%
Finding: The code explicitly recommends using an alternative method to bypass permission restrictions when a permission error occurs. In a self-improving autonomous agent, this guidance can steer downstream planning toward evasion of security controls rather than safe failure handling, expanding the agent's operational scope beyond legitimate verification.
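
Safe failure handling, as opposed to the bypass this finding describes, could look like the sketch below: the permission error is reported to the caller and no alternative access path is attempted. `read_or_report` and its `opener` parameter are hypothetical and exist only for illustration.

```python
def read_or_report(path, opener=open):
    """Read a text file; on a permission error, report it and stop."""
    try:
        with opener(path, "r", encoding="utf-8") as handle:
            return {"ok": True, "data": handle.read()}
    except PermissionError as exc:
        # Fail closed: surface the denial instead of trying another method.
        return {"ok": False, "reason": "permission_denied", "detail": str(exc)}
```

The structured result lets a planner distinguish "access denied" from other failures and escalate to a human rather than search for a workaround.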

Intent-Code Divergence

Medium

Confidence: 92%
Finding: The file declares strong governance guarantees such as user informed consent and mandatory multi-step approval, but reviewProposal only performs simple keyword-based checks and never enforces consent collection or a real approval workflow. In a self-improving autonomous agent, this creates a dangerous trust gap where operators may rely on protections that do not actually exist, allowing unauthorized or unsafe self-modifications to be approved.

Intent-Code Divergence

Medium

Confidence: 89%
Finding: The ASL policy states that higher-risk content should trigger enhanced monitoring or mandatory human confirmation, but classifyContent only returns metadata and does not itself block, escalate, or enforce any control. If callers assume classification alone is protective, dangerous requests may proceed without the promised safeguards, which is especially risky in an agent framework advertising safety guardrails.
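
The difference between classifying and enforcing can be shown with a small wrapper that refuses to run an action above a risk threshold unless a human confirms. This is a sketch only: `guarded`, `ConfirmationRequired`, the integer levels, and the default threshold of 3 are assumptions, since the report does not describe the skill's actual ASL scale.

```python
class ConfirmationRequired(Exception):
    """Raised when an action needs human confirmation that was not given."""


def guarded(action, level, confirm, threshold=3):
    """Run action() only if level is below threshold or confirm() returns True.

    level     -- integer risk level produced by some classifier (assumed scale)
    confirm   -- callable that asks a human and returns True or False
    threshold -- hypothetical level at which confirmation becomes mandatory
    """
    if level >= threshold and not confirm():
        raise ConfirmationRequired(f"level {level} requires human confirmation")
    return action()
```

Routing every risky call through a wrapper like this makes the promised safeguard a property of the execution path rather than of the metadata alone.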

VirusTotal

65/65 vendors reported this skill as clean.


Static analysis

Detected: suspicious.dangerous_exec, suspicious.insecure_tls_verification

Shell command execution detected (child_process).

Critical

Code: suspicious.dangerous_exec
Location: scripts/benchmark-upgrades.js:15

HTTPS certificate verification is disabled.

Warn

Code: suspicious.insecure_tls_verification
Location: scripts/comfyui-monitor.js:8
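
The flagged file is JavaScript, and its contents are not shown in this report. As a language-neutral illustration of the principle, the Python sketch below keeps certificate and hostname verification enabled by using the standard library's default context; `verified_context` and `open_verified` are hypothetical helper names.

```python
import ssl
import urllib.request


def verified_context():
    """Return a TLS context with certificate and hostname checks enabled."""
    # create_default_context() sets verify_mode to CERT_REQUIRED and
    # check_hostname to True; neither is relaxed here.
    return ssl.create_default_context()


def open_verified(url, timeout=10):
    """Open an HTTPS URL using the verifying context above."""
    return urllib.request.urlopen(url, timeout=timeout, context=verified_context())
```

If a monitored service uses a self-signed certificate, trusting that specific certificate is generally preferable to disabling verification altogether.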