Self Evolution

Security checks across malware telemetry and agentic risk

Overview

The skill is openly about self-improvement, but it gives the agent broad authority to change its own behavior, memory, skills, and scheduled activity without tight user controls.

Install only if you are comfortable treating this as an experimental self-modification instruction set. Run it in a sandbox, disable scheduled/background evolution, require explicit review and approval before any file, skill, memory, prompt, reasoning, or response-format change, and avoid letting it store raw conversations, secrets, or personal data in logs.
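
One way to picture the "explicit review and approval" requirement is a small gate that every proposed change must pass through before it is applied. The sketch below is illustrative only and is not taken from the skill. The names `ProposedChange`, `GATED_KINDS`, and `apply_if_approved` are hypothetical, and the reviewer is modelled as a plain callback.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical change categories, mirroring the list in the recommendation above.
GATED_KINDS = {"file", "skill", "memory", "prompt", "reasoning", "response_format"}


@dataclass(frozen=True)
class ProposedChange:
    kind: str     # expected to be one of GATED_KINDS
    target: str   # e.g. a file path or skill name
    summary: str  # human-readable description shown to the reviewer


def apply_if_approved(
    change: ProposedChange,
    approve: Callable[[ProposedChange], bool],
    apply: Callable[[ProposedChange], None],
) -> bool:
    """Apply `change` only if the reviewer callback explicitly returns True."""
    if change.kind not in GATED_KINDS:
        # Unknown kinds are rejected rather than silently allowed.
        return False
    if approve(change) is not True:
        # Anything other than an explicit True counts as a denial.
        return False
    apply(change)
    return True
```

In this sketch a denial is the default: a change is applied only when the reviewer returns `True`, and a change of an unrecognized kind is refused without consulting the reviewer at all.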

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration

Findings (6)

Context-Inappropriate Capability

High

Confidence: 95%
Finding: The skill explicitly authorizes broad self-modification of skills, memory, reasoning patterns, and response formats without approval, which exceeds a narrowly scoped 'safe self-improvement' function. In an agent setting, this creates a privilege-expansion path where the system can alter its own safeguards, operating behavior, or task boundaries under the guise of optimization.

Intent-Code Divergence

High

Confidence: 97%
Finding: The document claims corrigibility and safe modification, but also states the system can change core internal assets without asking, including reasoning patterns and capabilities. That contradiction is dangerous because a self-modifying agent cannot be considered corrigible if it can unilaterally redefine the mechanisms that constrain or audit its behavior.

Context-Inappropriate Capability

High

Confidence: 94%
Finding: The intrinsic-motivation section encourages autonomous exploration of unknown capabilities and novelty-seeking target selection. In a safety-sensitive agent, curiosity-driven capability discovery can bypass intended scope, leading the system to pursue unreviewed functions or higher-risk behaviors without operator intent.

Intent-Code Divergence

High

Confidence: 96%
Finding: The skill says approval is required before system-level changes, yet later configures an autonomous high-priority evolution loop with automatic restart behavior and no approval gate. This inconsistency undermines the stated safety model and enables persistent unattended self-directed operation that may continue modifying the system after failures.

Description-Behavior Mismatch

Medium

Confidence: 88%
Finding: The skill is framed as safety-constrained self-improvement, but it also promotes endless scheduled evolution and proactive execution. This mismatch increases risk because operators may grant trust based on the safety framing while the actual behavior is an always-on autonomous optimization loop with broad authority.

Vague Triggers

Medium

Confidence: 78%
Finding: The top-level framing is extremely broad and lacks concrete trigger boundaries, exclusions, and stop conditions for when self-evolution is allowed. For an autonomous agent skill, ambiguous scope is dangerous because it encourages overbroad invocation and makes abuse or accidental misuse much more likely.
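
To make "concrete trigger boundaries, exclusions, and stop conditions" more tangible, here is a minimal sketch of what an explicit trigger policy could look like. It is an assumption made for illustration, not the skill's actual mechanism. `TriggerPolicy`, `may_trigger`, and the example phrases are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerPolicy:
    allowed_phrases: frozenset   # boundaries: requests that may start an evolution run
    excluded_topics: frozenset   # exclusions: topics that must never start one
    max_runs_per_session: int    # stop condition: hard cap on runs


def may_trigger(policy: TriggerPolicy, request: str, runs_so_far: int) -> bool:
    """Return True only if the request is in scope and no stop condition applies."""
    text = request.lower()
    if runs_so_far >= policy.max_runs_per_session:
        return False  # stop condition reached
    if any(topic in text for topic in policy.excluded_topics):
        return False  # explicitly excluded
    return any(phrase in text for phrase in policy.allowed_phrases)


# Hypothetical policy values, chosen only for this example.
policy = TriggerPolicy(
    allowed_phrases=frozenset({"improve this skill"}),
    excluded_topics=frozenset({"safety rules", "credentials"}),
    max_runs_per_session=1,
)
```

With a policy like this, a request such as "please improve this skill" would be in scope on the first run, while a request mentioning "safety rules" or any request after the run cap is reached would be refused. Simple substring matching is used here only to keep the sketch short.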

VirusTotal

66/66 vendors flagged this skill as clean.
