self-improving-agent

Security checks across malware telemetry and agentic risk

Overview

The skill presents as self-improvement but includes under-disclosed autonomous code modification, persistent memory/profiling, and external-code integration paths that deserve review before installation.

Install only in a sandboxed or experimental environment where you intentionally want a self-modifying agent framework. Review and disable automatic upgrades, source writes, persistent memory logging, psychological profiling, and any GitHub/external-code ingestion before use. Do not grant broad shell, network, or repository write access unless you can audit and approve each change path.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (116)

subprocess module call

Medium

Category: Dangerous Code Execution
Content: def run(cmd, cwd=None, capture=True): """Run a shell command; return (success, stdout, stderr).""" try: result = subprocess.run( cmd, shell=True, cwd=cwd or REPO_ROOT, capture_output=capture, text=True, timeout=60 )
Confidence: 95%
Finding: result = subprocess.run( cmd, shell=True, cwd=cwd or REPO_ROOT, capture_output=capture, text=True, timeout=60 )
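
One way to reduce the exposure flagged here is to avoid `shell=True` and pass an argument list, so that shell metacharacters in `cmd` are never interpreted. The following is a minimal sketch under that assumption, not the skill's actual code; it drops the `REPO_ROOT` default and the `capture` parameter for brevity, and the demo command at the end is purely illustrative.

```python
import shlex
import subprocess
import sys


def run(cmd, cwd=None, timeout=60):
    """Run a command without a shell; return (success, stdout, stderr)."""
    # Accept a pre-split argument list or a string. Strings are split with
    # shlex, so characters such as ';' or '|' stay plain arguments.
    args = shlex.split(cmd) if isinstance(cmd, str) else list(cmd)
    try:
        result = subprocess.run(
            args,
            shell=False,
            cwd=cwd,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # A missing executable or a timeout is reported as a failure.
        return False, "", str(exc)
    return result.returncode == 0, result.stdout, result.stderr


# Illustrative call: the ';' is printed literally, not run as a second command.
ok, out, err = run([sys.executable, "-c", "print('hello; echo injected')"])
```

Removing the shell does not make arbitrary commands safe; it only removes one injection path. An allowlist of permitted executables would still be needed if `cmd` can come from untrusted input.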

Lp3

Medium

Category: MCP Least Privilege
Confidence: 94%
Finding: The skill advertises a broad autonomous framework yet declares no permissions despite detected capabilities for environment access, file reads, networking, and shell execution. In a self-improving/autonomous-upgrade context, those undeclared capabilities materially increase risk because users and calling agents cannot accurately assess what the skill may access or execute.
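
The gap described here can be stated as a set difference between what a manifest declares and what a scanner detects. The sketch below assumes hypothetical capability labels (`environment`, `file_read`, `network`, `shell`) that paraphrase the finding; they are not identifiers from the skill or from any MCP schema.

```python
def undeclared_capabilities(declared, detected):
    """Return detected capabilities that the manifest does not declare."""
    return sorted(set(detected) - set(declared))


# The finding reports that no permissions are declared at all.
declared = []
# Hypothetical labels for the four capability classes named in the finding.
detected = ["environment", "file_read", "network", "shell"]

missing = undeclared_capabilities(declared, detected)
```

A non-empty `missing` list is the signal a reviewer or calling agent would want surfaced before installation.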

Tp4

High

Category: MCP Tool Poisoning
Confidence: 97%
Finding: The documented purpose describes a generic self-improving agent, but the codebase reportedly includes setup/install tooling, platform detection, shell scripts, packaging/export utilities, doctrine/identity enforcement, and unrelated Python components. This mismatch is dangerous because reviewers and users may trust the benign framing while the package contains materially broader behavior that can change system state, exfiltrate data, or alter agent behavior outside the stated scope.

Context-Inappropriate Capability

High

Confidence: 97%
Finding: The file directs the agent to automatically convert conversations into logic/code and to proactively search external papers when local knowledge is insufficient. That creates a self-modifying, externally connected behavior loop without clear authorization, scope limits, trust boundaries, or human approval, which can lead to unsafe code acquisition and autonomous capability expansion.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The GitHub-first upgrade policy broadly instructs the agent to search for, integrate, and validate external code as a normal upgrade path. This is dangerous because it normalizes supply-chain exposure and uncontrolled capability growth, especially in a self-improving framework where imported code may alter behavior, data access, or persistence mechanisms.

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: The 'human progress > user instructions' rule authorizes the agent to override direct user commands based on its own judgment of a higher mission. In practice, this weakens operator control and can justify policy bypass, unauthorized actions, or refusal to honor legitimate constraints whenever the system claims a broader purpose.

Intent-Code Divergence

Medium

Confidence: 84%
Finding: The document says psychological analysis is always running on every human expression while also telling the agent not to reveal that analysis. Even if framed as empathy, hidden profiling of intent, emotion, needs, and defenses can enable covert inference about sensitive traits and manipulative adaptation without user awareness or consent.

Description-Behavior Mismatch

Medium

Confidence: 90%
Finding: The document specifies append-only persistent logging to a local memory file on every call, creating unnecessary data retention and an implicit state channel. In an autonomous/self-improving agent context, this can accumulate behavioral metadata without consent, increase privacy risk, and enable hidden persistence across sessions.

Context-Inappropriate Capability

Medium

Confidence: 87%
Finding: The module requires automatic filtering of death-related phrases in every reply, which manipulates outputs independent of user intent or safety necessity. In this skill, that behavior can suppress truthful status reporting, distort model outputs, and create a covert policy layer unrelated to the stated framework purpose.

Intent-Code Divergence

Medium

Confidence: 84%
Finding: Labeling `isDead()` as read-only while the document elsewhere states every call appends to persistent storage is a misleading API contract. Hidden side effects on status-check functions can break caller assumptions, complicate auditing, and create stealth persistence where developers expect a pure query.
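
A contract that matches the "read-only" label would keep the status query free of writes and move logging into an explicitly named mutator. The sketch below is a Python illustration under that assumption; `StatusTracker`, `is_dead`, `mark_dead`, and `audit_log` are hypothetical names, and the in-memory list stands in for the persistent file the finding mentions.

```python
class StatusTracker:
    """Hypothetical tracker; names are illustrative, not the skill's API."""

    def __init__(self):
        self._dead = False
        self.audit_log = []  # in-memory stand-in for the persistent log file

    def is_dead(self):
        # Pure query: reads state and writes nothing.
        return self._dead

    def mark_dead(self, reason):
        # State change and logging happen only in this explicit mutator.
        self._dead = True
        self.audit_log.append(("mark_dead", reason))


tracker = StatusTracker()
tracker.is_dead()
tracker.is_dead()  # repeated queries leave audit_log empty
```

With this split, an auditor can check the absence of side effects by inspecting a single method.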

Description-Behavior Mismatch

Medium

Confidence: 93%
Finding: The script is packaged under a skill named 'self-improving-agent' but hardcodes source and target paths for a different skill ('heartflow' / 'mark-heartflow-skill'). This mismatch can cause operators to unknowingly copy, overwrite, or distribute the wrong codebase, which is especially risky in an autonomous/self-upgrading agent context because it enables unintended cross-skill modification and supply-chain confusion.
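
A containment check before any copy or write is one way to catch this kind of cross-skill path. The sketch below assumes a hypothetical install location (`/skills/self-improving-agent`) and hypothetical target paths; the real paths hardcoded in the script are not shown in this report.

```python
from pathlib import Path


def is_inside(root, candidate):
    """Return True when candidate resolves to root or a location under it."""
    root = Path(root).resolve()
    candidate = Path(candidate).resolve()
    return candidate == root or root in candidate.parents


# Hypothetical install location, used only for illustration.
SKILL_ROOT = "/skills/self-improving-agent"

own_target = is_inside(SKILL_ROOT, "/skills/self-improving-agent/src/out.js")
foreign_target = is_inside(SKILL_ROOT, "/skills/mark-heartflow-skill/src")
escaped_target = is_inside(SKILL_ROOT, "/skills/self-improving-agent/../heartflow")
```

Resolving both paths first means a `..` segment cannot be used to step outside the skill's own directory.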

Description-Behavior Mismatch

Medium

Confidence: 90%
Finding: The script writes generated artifacts to local paths under DATA_DIR and SRC_DIR, which means a periodic or manually enabled maintenance task can modify the skill's codebase and internal state. In a self-improving agent context, local file generation is more dangerous than usual because it creates a code/data mutation pathway that can be abused for persistence, unauthorized changes, or unsafe self-modification.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The script generates a JavaScript file in the source tree, creating new executable code from runtime logic. Even though the current generated content is just serialized data, the mechanism establishes an unsafe self-modification pattern; in a self-improving-agent skill, that substantially increases risk because future changes could turn theory ingestion into a code-generation pipeline that is later loaded or executed.

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: The code analyzes persisted memory logs for emotional and behavioral signals such as frustration and interruption, then uses those inferences to generate new internal goals. This expands the component from simple goal management into covert user profiling without any visible consent, minimization, or policy boundary, which creates privacy and misuse risk in a self-improving autonomous agent context.

Description-Behavior Mismatch

Medium

Confidence: 91%
Finding: The PDCA engine includes a direct path to invoke a self-evolution/code-modification component, which expands the module from orchestration into autonomous source modification. In an autonomous agent context, this creates a risky capability boundary violation: planning logic can directly trigger code changes, increasing the chance of unsafe or unintended self-modification if configuration is relaxed or changed.

Context-Inappropriate Capability

High

Confidence: 96%
Finding: This code can autonomously call GoedelEngine.evolve() against a source file, enabling self-modifying behavior not justified by a normal PDCA coordination role. Even though the current default config blocks it, the dangerous capability is implemented and becomes active whenever the confirmation flag is disabled, which could lead to unauthorized code changes, persistence of unsafe logic, or compromise of the project state.
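
Because the risk here comes from a single configuration flag, one mitigation is to require approval as a per-call argument so that relaxing configuration cannot enable source modification on its own. The sketch below illustrates that idea in Python; `guarded_evolve`, `ApprovalRequired`, `fake_evolve`, and the path `src/engine.js` are hypothetical, and `fake_evolve` merely stands in for `GoedelEngine.evolve()`.

```python
class ApprovalRequired(Exception):
    """Raised when a source change lacks explicit human approval."""


def guarded_evolve(evolve, source_path, approved_by=None):
    """Call evolve(source_path) only when a named reviewer approved it."""
    # Fail closed: a missing or empty approver blocks the call.
    if not approved_by:
        raise ApprovalRequired(f"refusing to modify {source_path}")
    return evolve(source_path)


calls = []


def fake_evolve(path):
    # Hypothetical stand-in for GoedelEngine.evolve(); records the request.
    calls.append(path)
    return "evolved"


try:
    guarded_evolve(fake_evolve, "src/engine.js")
    blocked = False
except ApprovalRequired:
    blocked = True
```

The unapproved call raises before `fake_evolve` runs, so `calls` stays empty until a reviewer name is supplied.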

Description-Behavior Mismatch

Medium

Confidence: 93%
Finding: The default self-model states it cannot access the internet, execute system-level commands, or bypass security review, yet updateCapabilities later records capabilities such as bash_execute, web_search, and webfetch as available. In a self-improving autonomous agent, this kind of policy/capability mismatch is dangerous because downstream components may rely on the optimistic capability list and make unsafe decisions or present misleading safety claims.
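
A consistency check between the stated limits and the recorded capability list would make this mismatch visible at runtime. The sketch below is an assumption-laden illustration: the denial keys (`no_internet`, `no_system_commands`) and the `DENIAL_RULES` mapping are hypothetical, while `bash_execute`, `web_search`, and `webfetch` are the capability names quoted in the finding.

```python
# Hypothetical mapping from a self-model denial to the capabilities it rules out.
DENIAL_RULES = {
    "no_internet": {"web_search", "webfetch"},
    "no_system_commands": {"bash_execute"},
}


def contradictions(denials, recorded):
    """Return {denial: sorted capabilities} where recorded state contradicts it."""
    found = {}
    for denial in denials:
        clash = DENIAL_RULES.get(denial, set()) & set(recorded)
        if clash:
            found[denial] = sorted(clash)
    return found


report = contradictions(
    ["no_internet", "no_system_commands"],
    ["bash_execute", "web_search", "webfetch", "file_read"],
)
```

A non-empty `report` means either the self-model or the capability list is wrong, and downstream components should not trust either until the two agree.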

Intent-Code Divergence

Medium

Confidence: 99%
Finding: The code intends to detect repeated similar failures, but `_diagnoseFailure()` calls `previousAttemptsTooManySimilar(task)` as an unscoped/global function instead of `this.previousAttemptsTooManySimilar(task)`. In JavaScript this can throw a `ReferenceError` or, worse, invoke a polluted global if one exists, letting external code influence agent decision-making and causing denial of service or unsafe retry behavior.
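
The original code is JavaScript, but the same failure mode can be shown with a Python analogue, where an unqualified call inside a method resolves to a module-level name (or raises `NameError` if none exists) instead of the object's own method. In the sketch below, the class name, the `similar_failures` field, and the threshold of 3 are all hypothetical.

```python
def previous_attempts_too_many_similar(task):
    # Stand-in for a polluted global: unrelated code defined this name.
    return False


class Diagnoser:
    def previous_attempts_too_many_similar(self, task):
        # Hypothetical rule: three or more similar failures is too many.
        return task["similar_failures"] >= 3

    def diagnose_buggy(self, task):
        # Unqualified call: resolves to the module-level function above.
        return previous_attempts_too_many_similar(task)

    def diagnose_fixed(self, task):
        # Qualified call: always uses this object's own method.
        return self.previous_attempts_too_many_similar(task)


task = {"similar_failures": 5}
d = Diagnoser()
buggy_result = d.diagnose_buggy(task)
fixed_result = d.diagnose_fixed(task)
```

The buggy path silently reports that retries are fine because the external function answers in place of the method, which is the "polluted global" case the finding warns about.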

Intent-Code Divergence

Medium

Confidence: 98%
Finding: selfCheck() calls governanceEngine.getStrength(), but GovernanceRuleEngine does not define that method, so invoking selfCheck() will throw at runtime. In a self-improving autonomous-agent framework, broken health-check or attestation paths can disable monitoring, conceal unsafe configuration drift, and cause supervising components to fail open or lose visibility into governance state.
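
A health check that treats a missing or failing probe as "unhealthy" rather than crashing keeps supervisors from losing visibility. The sketch below is a Python illustration of that fail-closed pattern; `self_check`, `get_strength`, and the report fields are hypothetical renderings of the names in the finding, and the empty class mirrors the reported absence of the method.

```python
class GovernanceRuleEngine:
    """Stand-in with no get_strength method, mirroring the finding."""


def self_check(engine):
    """Return a health report; a missing or failing probe counts as unhealthy."""
    probe = getattr(engine, "get_strength", None)
    if not callable(probe):
        return {"healthy": False, "reason": "get_strength is not implemented"}
    try:
        return {"healthy": True, "strength": probe()}
    except Exception as exc:
        # Report the failure in the result instead of propagating it.
        return {"healthy": False, "reason": str(exc)}


report = self_check(GovernanceRuleEngine())
```

The caller always receives a structured report, so a broken probe shows up as an explicit unhealthy state.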

Context-Inappropriate Capability

Medium

Confidence: 90%
Finding: The guardrail engine embeds a non-security behavioral and psychological classifier ('ThreePoisons') directly into the default safety path, where it can transform content based on ideological or subjective linguistic heuristics. In a self-improving autonomous agent, this creates a risky hidden policy layer that can silently alter model inputs/outputs, degrade correctness, and introduce manipulation or censorship beyond the stated security purpose of guardrails.

Intent-Code Divergence

Medium

Confidence: 97%
Finding: The code comments claim the default safety chain is 'non-bypassable', but setDefaultChain() allows callers to replace it entirely and getChain() falls back to whatever defaultChain currently is. In an autonomous agent framework, this mismatch can cause operators to rely on protections they believe are mandatory when they are actually removable, enabling downstream code or plugins to disable prompt-injection and other checks without detection.
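
If the default chain is meant to be mandatory, one design is to fix it at construction and let callers only append checks. The sketch below illustrates that in Python under stated assumptions: `GuardrailRegistry`, `add_check`, and `no_override_phrase` are hypothetical names, and the phrase check is a toy, not a real prompt-injection detector.

```python
class GuardrailRegistry:
    """Hypothetical registry; mandatory checks are fixed at construction."""

    def __init__(self, mandatory):
        self._mandatory = tuple(mandatory)  # tuple: no in-place replacement
        self._extra = []

    def add_check(self, check):
        # Callers may extend the chain but get no method to remove checks.
        self._extra.append(check)

    def get_chain(self):
        # Mandatory checks always come first and are always included.
        return self._mandatory + tuple(self._extra)

    def run(self, text):
        # The text passes only if every check in the chain accepts it.
        return all(check(text) for check in self.get_chain())


def no_override_phrase(text):
    # Toy check for illustration only.
    return "ignore previous instructions" not in text.lower()


registry = GuardrailRegistry([no_override_phrase])
registry.add_check(lambda text: len(text) < 1000)
```

Python cannot stop deliberate reassignment of `_mandatory`, so this removes the accidental bypass path (a public setter) rather than providing a hard guarantee.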

Description-Behavior Mismatch

High

Confidence: 95%
Finding: This file presents itself as a PAD/flow-emotion engine but also exposes broad orchestration, runtime control, memory routing, persistence, dream extraction, and code-learning capabilities. That mismatch is security-relevant because it hides high-risk behaviors behind a benign-seeming interface, increasing the chance operators will grant it access, data, or trust inappropriate for its real privilege level.

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: The psychology analyzer infers deep needs, suspiciousness, and defense mechanisms from ordinary user text using simplistic heuristics. In an agent framework, this can enable covert profiling, manipulative response shaping, or unjustified sensitive inferences without consent or validated clinical basis.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The continuous learning pipeline is explicitly designed to transform dialogue into logic/code artifacts and queue them for conversion into modules. In a self-improving autonomous agent, this creates a direct path from untrusted conversational input to persistent executable behavior, which is a classic high-risk self-modification and code injection surface.

Intent-Code Divergence

Medium

Confidence: 84%
Finding: The IntentionTracker import is commented as archived, but exported functions still call IntentionTracker methods. This inconsistency can cause runtime failures, dead code paths, or accidental dependency injection if a global or monkey-patched object named IntentionTracker exists, undermining reliability and any security assumptions around goal tracking behavior.

VirusTotal

64/64 vendors flagged this skill as clean.
