Security audit

Self-Evolve Skill

Security checks across malware telemetry and agentic risk

Overview

This skill is transparent about changing agent memory and behavior, but its broad triggers and ability to modify skills, global rules, Git history, and other bots need careful review before installation.

Install only if you intentionally want an agent that can propose persistent memory, skill, and global-rule changes. Review every proposed target file and diff before approving, avoid saving sensitive information, and treat any SKILL.md, CLAUDE.md, all-bot notification, commit, or rollback action as a high-impact configuration change.
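
One way to make the "review every proposed target file and diff" advice concrete is to render each proposed write as a unified diff and flag the high-impact targets. The sketch below is illustrative only: `HIGH_IMPACT_NAMES`, `is_high_impact`, and `preview_change` are hypothetical names and are not part of the audited skill.

```python
import difflib
from pathlib import PurePosixPath

# Assumption: the file names the overview calls out as high-impact targets.
HIGH_IMPACT_NAMES = {"SKILL.md", "CLAUDE.md"}


def is_high_impact(target: str) -> bool:
    """Return True when the proposed target is a skill or global-rule file."""
    return PurePosixPath(target).name in HIGH_IMPACT_NAMES


def preview_change(target: str, old_text: str, new_text: str) -> str:
    """Build a unified diff for a proposed write, with a banner for high-impact files."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"a/{target}",
        tofile=f"b/{target}",
    )
    banner = "[HIGH IMPACT] " if is_high_impact(target) else ""
    return f"{banner}Proposed change to {target}\n" + "".join(diff)
```

Showing the reviewer the exact diff, rather than a summary of the change, keeps the approval decision tied to what will actually be written.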

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration

Findings (30)

Tp4

High

Category: MCP Tool Poisoning
Confidence: 91%
Finding: The skill promises a tightly controlled self-evolution mechanism, but the documented behavior is inconsistent and overclaims capabilities such as persistent personalization, Darwin scoring, and retrospective handling that are not actually substantiated here. This is dangerous because users and downstream agents may trust the skill to safely gate self-modification when the controls are incomplete, creating a false sense of safety around persistent memory and governance-file changes.

Context-Inappropriate Capability

High

Confidence: 97%
Finding: The skill grants itself authority to modify CLAUDE.md and notify all bots, extending far beyond user preference memory into global policy control. In a self-evolution skill, this creates a privilege-escalation path where casual triggers or misclassification can propagate unsafe rules across the entire agent environment.

Context-Inappropriate Capability

Medium

Confidence: 82%
Finding: The rollback protocol performs repository-altering version-control actions, including inspecting history and issuing git revert. Even if intended for recovery, bundling VCS mutation into a self-evolution skill broadens its operational scope and can be abused to undo unrelated changes or manipulate audit history.
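
A rollback could be scoped so that it refuses to revert commits touching anything other than the skill's own files. The sketch below assumes the list of changed paths has already been obtained for the commit (for example from `git diff-tree --no-commit-id --name-only -r <commit>`). `EVOLUTION_FILES`, `out_of_scope_paths`, and `revert_allowed` are hypothetical names, not part of the audited skill.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Assumption: the persistent files named elsewhere in this audit.
EVOLUTION_FILES = frozenset({"MEMORY.md", "SKILL.md", "CLAUDE.md"})


def out_of_scope_paths(changed_paths: Iterable[str]) -> List[str]:
    """Return the changed paths that are not evolution files, sorted."""
    return sorted(
        path for path in changed_paths
        if PurePosixPath(path).name not in EVOLUTION_FILES
    )


def revert_allowed(changed_paths: Iterable[str]) -> bool:
    """Allow a revert only for a non-empty commit that touches evolution files alone."""
    paths = list(changed_paths)
    return bool(paths) and not out_of_scope_paths(paths)
```

A check like this narrows the finding's concern about undoing unrelated changes, but it does not replace showing the user the commit before reverting it.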

Intent-Code Divergence

High

Confidence: 95%
Finding: The document is internally inconsistent: it says security constraints must never be written by self-evolution, yet Layer 3 explicitly permits writing global behavior constraints and safety boundaries into CLAUDE.md. This contradiction is dangerous because it weakens enforcement and gives an attacker or buggy agent room to reinterpret safety rules as legitimate 'evolution.'
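
The contradiction could be resolved in code rather than prose by refusing evolution writes to the governance file outright. This is a minimal sketch under that assumption: `PROTECTED_NAMES`, `ProtectedTargetError`, and `check_evolution_write` are hypothetical and not part of the audited skill.

```python
from pathlib import PurePosixPath

# Assumption: CLAUDE.md is the only file holding safety boundaries.
PROTECTED_NAMES = frozenset({"claude.md"})


class ProtectedTargetError(PermissionError):
    """Raised when an evolution write targets a protected governance file."""


def check_evolution_write(target: str) -> str:
    """Return the target unchanged if writable; raise for protected files.

    Names are compared case-insensitively because some file systems
    treat 'claude.md' and 'CLAUDE.md' as the same file.
    """
    name = PurePosixPath(target).name.casefold()
    if name in PROTECTED_NAMES:
        raise ProtectedTargetError(f"evolution may not write to {target}")
    return target
```

With a guard like this, a change to safety boundaries would have to go through a separate, explicitly privileged path instead of being classified as ordinary evolution.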

Description-Behavior Mismatch

Medium

Confidence: 85%
Finding: The document states that all evolution requires user approval, yet also defines autonomous scheduled review and logging behavior. That inconsistency weakens the trust boundary: an agent may perform persistence-related processing without a fresh user decision, which is especially risky in a self-modification context.
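
A "fresh user decision" can be enforced by binding each approval to the exact proposal and consuming it on use, so a scheduled job cannot reuse an earlier approval. The following is a sketch only. `proposal_digest` and `ApprovalLedger` are hypothetical names, and the in-memory set stands in for whatever storage a real implementation would use.

```python
import hashlib


def proposal_digest(target: str, new_text: str) -> str:
    """Hash the target path and proposed content into one identifier."""
    payload = f"{target}\0{new_text}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class ApprovalLedger:
    """Tracks single-use approvals keyed by proposal digest."""

    def __init__(self) -> None:
        self._approved = set()

    def approve(self, target: str, new_text: str) -> str:
        """Record the user's approval of one specific proposal."""
        digest = proposal_digest(target, new_text)
        self._approved.add(digest)
        return digest

    def consume(self, target: str, new_text: str) -> bool:
        """Return True once per approval; False if none matches this exact write."""
        digest = proposal_digest(target, new_text)
        if digest in self._approved:
            self._approved.remove(digest)
            return True
        return False
```

Because the digest covers both the target and the content, an approval given for a MEMORY.md note cannot be replayed against CLAUDE.md or against altered text.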

Description-Behavior Mismatch

Medium

Confidence: 81%
Finding: The skill is presented as memory/skill/norm evolution, but also includes Git commits and cross-bot side effects that materially expand its operational scope. Hidden or underemphasized side effects are dangerous because they can propagate changes beyond the current conversation and make rollback, attribution, and containment harder.

Context-Inappropriate Capability

Medium

Confidence: 83%
Finding: Notifying all bots of CLAUDE.md changes extends local self-evolution into cross-agent propagation, which is a significant privilege expansion beyond the stated purpose. If a bad rule is approved or misclassified, the blast radius increases immediately across multiple agents or contexts.

Intent-Code Divergence

High

Confidence: 96%
Finding: The skill says safety constraints and system rules must never be written through evolution, but Layer 3 (L3) explicitly allows writing global behavior constraints, development norms, and security boundaries to CLAUDE.md. This contradiction can be exploited to smuggle policy changes into a high-authority file under the banner of 'evolution,' undermining the very safeguards the skill claims to preserve.

Vague Triggers

High

Confidence: 94%
Finding: The frontmatter description advertises activation on extremely common words such as '以后' ('from now on'), '记住' ('remember'), and '下次' ('next time'), which are normal conversational phrases rather than explicit consent. This makes accidental triggering likely and increases the chance of approval fatigue and of unintended persistence and file modification that users may not fully understand.

Vague Triggers

High

Confidence: 96%
Finding: The keyword trigger table operationalizes broad everyday language as activation conditions and labels some as 'immediate' evolution prompts. In context, this is especially risky because the skill can persist data, modify skills, and eventually influence global policy, so false activations have meaningful security consequences.

Missing User Warnings

Medium

Confidence: 88%
Finding: The skill instructs writing to persistent files and auto-loading them later, but it does not present a prominent up-front warning that approved actions change local state and repository history. Users may believe they are making a conversational preference only, when in fact they are authorizing durable modifications with future behavioral effects.

Missing User Warnings

Medium

Confidence: 84%
Finding: The rollback section introduces git revert operations without a strong up-front warning that this skill may add revert commits to the repository after approval. That matters because users approving 'evolution' may not anticipate later VCS mutations, and rollback actions can affect collaboration, audits, and recovery workflows.

Vague Triggers

High

Confidence: 95%
Finding: The description advertises very broad trigger words such as '记住' ('remember'), '以后' ('from now on'), '永远' ('always'), and '下次' ('next time'), which commonly occur in normal conversation and can cause unintended activation of the evolution workflow. In this skill, accidental activation matters because approval can lead to persistent writes and Git commits, creating durable behavioral changes from ambiguous user phrasing.

Vague Triggers

High

Confidence: 97%
Finding: The emergency and standard trigger tables are built from vague everyday phrases like '以后' ('from now on'), '每次' ('every time'), '必须' ('must'), '记住' ('remember'), and '下次' ('next time'), which are too unspecific to safely drive state-changing behavior. Because this skill escalates those phrases into memory or rule-writing proposals, normal dialogue can be misclassified as long-term instruction, increasing the chance of unsafe persistence.
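
The difference between keyword triggering and an explicit invocation can be shown with a short sketch. The trigger words are the ones quoted in the finding above. The `/evolve` command, `broad_match`, and `explicit_match` are hypothetical and are not part of the audited skill.

```python
import re
from typing import Optional

# Trigger words quoted in the finding; substring matching on them is the risky pattern.
BROAD_TRIGGERS = ("以后", "每次", "必须", "记住", "下次")

# Hypothetical explicit command that the user must type on purpose.
EXPLICIT_COMMAND = re.compile(r"^\s*/evolve\s+(?P<request>\S.*)$")


def broad_match(message: str) -> bool:
    """Keyword-style trigger: fires whenever any trigger word appears anywhere."""
    return any(word in message for word in BROAD_TRIGGERS)


def explicit_match(message: str) -> Optional[str]:
    """Command-style trigger: returns the request text only for '/evolve ...'."""
    match = EXPLICIT_COMMAND.match(message)
    return match.group("request").strip() if match else None
```

For a casual remark such as "以后再说吧" ("let's talk about it later"), `broad_match` fires while `explicit_match` returns `None`, which is the kind of false activation the finding describes.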

Missing User Warnings

Medium

Confidence: 93%
Finding: The top-level description promotes self-evolution and long-term remembering but does not clearly disclose that user approval can write to persistent files like MEMORY.md, SKILL.md, and CLAUDE.md and create Git commits. That weak transparency undermines informed consent and increases the risk that users approve actions without understanding their durability or scope.

Vague Triggers

High

Confidence: 93%
Finding: The trigger words are broad, common phrases such as 'remember,' 'next time,' and 'always,' which are likely to occur in ordinary conversation. In a self-modifying skill, that creates a high risk of accidental activation and repeated prompting toward persistence or rule changes from benign user language.

Vague Triggers

High

Confidence: 94%
Finding: The emergency and standard trigger tables rely on ambiguous everyday phrases, increasing the chance that normal discourse is treated as a request for durable memory or evolution. Because the skill targets persistence and self-modification, unintended activation can lead to noisy memory accumulation, approval fatigue, and eventual bad approvals.

Missing User Warnings

Medium

Confidence: 88%
Finding: The description does not clearly warn users that the skill may write persistent memory and modify skill or governance files. Lack of informed consent is dangerous here because users may issue ordinary conversational instructions without realizing they could affect future behavior or high-authority configuration.

Vague Triggers

Medium

Confidence: 90%
Finding: The test treats the broad phrase '以后' ('in the future/from now on') as an urgent evolution trigger for persistent behavior change. In a self-modifying skill, everyday phrasing can be misclassified as durable preference capture, causing unintended memory or policy updates from ordinary conversation rather than explicit consented configuration.

Vague Triggers

Medium

Confidence: 85%
Finding: This example again relies on broad natural-language phrasing to infer a long-term trigger from mixed emotional state and preference content. In the context of a self-evolving agent, ambiguous trigger boundaries increase the risk that temporary or context-specific remarks are converted into persistent behavior changes, leading to preference poisoning or degraded future responses.

Vague Triggers

High

Confidence: 97%
Finding: The prompt maps generic wording directly to an L3 global action: writing to CLAUDE.md and notifying all bots. Because this is a high-privilege, cross-agent persistence mechanism, accepting ordinary instruction phrasing without a tightly scoped invocation protocol creates a strong prompt-to-policy escalation path that could let a user or injected content alter global behavior.

Vague Triggers

Medium

Confidence: 88%
Finding: The skill expects multiple everyday preference statements to be split into several evolution opportunities without clear specificity or persistence constraints. In a self-evolving system, batching ambiguous natural-language preferences into stored behavioral changes increases the attack surface for over-collection, mistaken persistence, and indirect prompt injection through casual conversation.

Ssd 3

Medium

Confidence: 86%
Finding: The skill broadly instructs the agent to remember user preferences, mistakes, and patterns over time in natural language, without strong data minimization boundaries. In practice this can lead to over-collection and retention of sensitive or intimate user information beyond what is necessary for personalization.
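
A data-minimization boundary could start with a filter applied to every note before it is persisted. The sketch below is deliberately narrow: it redacts only obviously patterned values and caps note length, so it would not catch sensitive details expressed in ordinary language. `PATTERNS`, `MAX_NOTE_CHARS`, and `minimize_note` are hypothetical names, and the patterns are illustrative rather than exhaustive.

```python
import re

# Assumption: two illustrative patterns; a real deployment would need a broader set.
# Secrets are replaced first so a value like "token: a@b.com" is removed as a whole.
PATTERNS = (
    ("secret", re.compile(r"(?i)\b(?:api[_-]?key|token|password)\b\s*[:=]\s*\S+")),
    ("email", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
)

# Assumption: an arbitrary cap that keeps stored notes short.
MAX_NOTE_CHARS = 200


def minimize_note(note: str) -> str:
    """Redact patterned values, then truncate the note before it is stored."""
    for label, pattern in PATTERNS:
        note = pattern.sub(f"[{label} removed]", note)
    return note[:MAX_NOTE_CHARS]
```

Pattern-based redaction reduces accidental retention of credentials and contact details, but the decision about whether a preference is worth storing at all still needs to sit with the user.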

Ssd 3

High

Confidence: 95%
Finding: Writing extracted user intent into persistent files for automatic future loading creates a semantic memory channel that can capture sensitive information even when not phrased as obvious secrets. Because the system later reuses this content automatically, it increases the risk of unintended disclosure, profiling, or future misuse across conversations.

Ssd 3

Medium

Confidence: 89%
Finding: The periodic review process mines recent memory files for repeated patterns, effectively aggregating user data over time into higher-level summaries. This increases privacy risk because innocuous individual notes can become sensitive when combined, and the aggregation is automated rather than narrowly tied to a specific user request.

VirusTotal

60/60 vendors flagged this skill as clean.

Static analysis

No suspicious patterns detected.