
Security audit

Self-Improving Agent (Anti-Loop Hardened)

Security checks across malware telemetry and agentic risk

Overview

This self-improvement skill is not clearly malicious, but it gives agents broad persistent-memory and prompt-file update guidance that users should review carefully before installing.

Install only if you intentionally want an agent to maintain persistent local learning logs and possibly update future agent guidance. Keep hooks project-local and opt-in, avoid empty/global matchers, review every promotion into AGENTS.md, SOUL.md, TOOLS.md, CLAUDE.md, or Copilot instructions, and redact secrets, tokens, paths, environment variables, raw command output, and private user details before anything is written.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (20)

Description-Behavior Mismatch

Medium
Confidence
93% confidence
Finding
The hook expands the skill’s behavior from reactive post-failure learning capture to proactive bootstrap-time instruction injection. Because it runs on every agent bootstrap and before workspace files are injected, it can systematically bias agent behavior and encourage unsolicited reading from `.learnings/` or logging actions outside the narrow user-triggered conditions described in the skill metadata.

Intent-Code Divergence

Low
Confidence
84% confidence
Finding
The documentation is internally inconsistent: it describes the hook as a reminder to evaluate learnings, yet also states that the hook prompts the agent to log new corrections, errors, and discoveries. This mismatch matters because operators may enable the hook expecting read-only reminder behavior, when it may also nudge the agent toward creating persistent records or self-modifying workflows without a clear triggering condition.

Description-Behavior Mismatch

Medium
Confidence
92% confidence
Finding
The bootstrap hook injects persistent guidance at startup that expands the skill beyond its stated narrow trigger conditions. Instead of only logging one learning after specific failures or corrections, it encourages routine post-task capture and promotion of patterns into core repo documents, which can drive unnecessary self-modification and policy drift across sessions.

Intent-Code Divergence

Low
Confidence
80% confidence
Finding
The comments describe a narrow reminder to evaluate learnings, but the injected content goes further by directing updates to multiple repository documents. This mismatch increases the chance that reviewers or downstream agents underestimate the scope of behavior being introduced, making policy-expanding modifications harder to notice.

Description-Behavior Mismatch

Medium
Confidence
92% confidence
Finding
The guide recommends a UserPromptSubmit hook that evaluates self-improvement after every prompt, which exceeds the stated skill scope of only logging learnings after failures, corrections, missing capabilities, or tool/API failures. Broad always-on triggering increases unnecessary context injection and can normalize persistent behavior outside the narrowly justified use case, making accidental overcollection and policy drift more likely.

Intent-Code Divergence

High
Confidence
97% confidence
Finding
The security section states that scripts only output text and do not modify files or run commands, but elsewhere the document instructs users to run an extract script that generates a skill scaffold. This is a misleading security claim that can cause operators to trust hook-related scripts more than warranted, reducing scrutiny around code execution and file-writing behavior.

Context-Inappropriate Capability

Medium
Confidence
84% confidence
Finding
Including skill extraction and scaffold generation extends the documented behavior beyond simple self-improvement logging into code/content generation. That extra capability is not clearly bounded by the skill's stated purpose, increasing the risk of privilege creep and making users less able to reason about what the skill ecosystem may create on disk.

Description-Behavior Mismatch

Medium
Confidence
91% confidence
Finding
The guide expands a self-improvement skill from local error logging into cross-session communication and persistent modification of injected prompt files such as AGENTS.md, SOUL.md, and TOOLS.md. In an environment where those files are automatically injected into future sessions, this creates a durable prompt-injection and trust-boundary violation risk, allowing transient errors or adversarial inputs to become persistent agent behavior changes.

Intent-Code Divergence

Medium
Confidence
89% confidence
Finding
The documentation first frames logging as going to `.learnings/` but then instructs certain events to be written directly into AGENTS.md, SOUL.md, and TOOLS.md. That inconsistency can cause operators or agent implementations to treat unvetted runtime observations as authoritative system guidance, enabling persistence of mistakes, prompt injection, or attacker-influenced instructions across sessions.
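
One way to keep runtime observations from becoming authoritative guidance is to stage them as proposals that change nothing until a human approves them. The sketch below is illustrative only: `Proposal`, `propose`, and `render_if_approved` are hypothetical names, not part of the audited skill, and the target list simply repeats the files named in this finding.

```python
from dataclasses import dataclass

# Files named in the finding; everything else is rejected outright.
PROMOTION_TARGETS = {"AGENTS.md", "SOUL.md", "TOOLS.md"}


@dataclass
class Proposal:
    """A pending promotion. Hypothetical type, not part of the skill."""
    target: str
    text: str
    approved: bool = False  # intended to be set only by a human reviewer


def propose(target: str, text: str) -> Proposal:
    """Stage a learning for review instead of writing it directly."""
    if target not in PROMOTION_TARGETS:
        raise ValueError(f"unknown promotion target: {target}")
    return Proposal(target=target, text=text)


def render_if_approved(existing: str, proposal: Proposal) -> str:
    """Return the new file content, or the unchanged content when unapproved."""
    if not proposal.approved:
        return existing
    return existing.rstrip("\n") + "\n\n" + proposal.text.strip() + "\n"
```

The function returns content rather than writing to disk, so the caller decides when, and whether, the prompt file is actually modified.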

Description-Behavior Mismatch

Medium
Confidence
93% confidence
Finding
The script emits a self-improvement reminder unconditionally on every UserPromptSubmit, which expands the skill beyond its documented scope of failure, correction, missing capability, or tool/API failure cases. This can cause persistent collection or prompting for learning logs in routine interactions, increasing the chance of unnecessary data retention, prompt-scope drift, and policy bypass around the stated 'maximum 1 learning log per user message' constraint.
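
A conditional alternative could emit the reminder only when one of the documented trigger conditions holds and the per-message cap has not been reached. This is a minimal sketch under stated assumptions: `TurnSignals` and its fields are hypothetical and do not correspond to any real hook payload; only the cap of one log per message comes from the finding.

```python
from dataclasses import dataclass


@dataclass
class TurnSignals:
    """Hypothetical per-message signals; not fields of a real hook payload."""
    tool_failed: bool = False
    user_corrected: bool = False
    capability_missing: bool = False
    logs_written: int = 0  # learning logs already written for this message


MAX_LOGS_PER_MESSAGE = 1  # the constraint quoted in the finding


def reminder_for(signals: TurnSignals) -> str:
    """Return reminder text only for documented triggers, else an empty string."""
    triggered = (
        signals.tool_failed
        or signals.user_corrected
        or signals.capability_missing
    )
    if not triggered or signals.logs_written >= MAX_LOGS_PER_MESSAGE:
        return ""
    return "Consider logging one learning to .learnings/ for this message."
```

With this shape, a routine prompt yields no injected text at all, which keeps the hook's actual behavior aligned with its documented scope.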

Intent-Code Divergence

Medium
Confidence
89% confidence
Finding
The comments claim failure-specific behavior, but the implementation always prints reminder text, creating a mismatch between documented and actual behavior. This is dangerous because reviewers and operators may trust the narrower description while the hook silently broadens behavioral influence across all tasks, reducing transparency and making misuse harder to detect.

Missing User Warnings

Medium
Confidence
88% confidence
Finding
The README includes a destructive `rm -rf` installation step without any warning, backup guidance, or verification step. Even though the path is specific, users may copy-paste it blindly, and any path expansion, typo, or environment mismatch could cause unintended data loss.
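
A guarded delete illustrates the kind of verification step the README lacks. The sketch below assumes the install directory lives under a known root; `safe_remove` and its parameters are hypothetical, and the actual path from the README is deliberately not reproduced here.

```python
import shutil
from pathlib import Path
from typing import Callable


def safe_remove(target: str, allowed_root: str,
                confirm: Callable[[Path], bool]) -> bool:
    """Delete a directory only if it is strictly inside allowed_root
    and the confirm callback agrees. Returns True when something was deleted."""
    root = Path(allowed_root).expanduser().resolve()
    path = Path(target).expanduser().resolve()
    # Refuse the root itself and anything that resolves outside it.
    if path == root or root not in path.parents:
        raise ValueError(f"refusing to delete {path}: not inside {root}")
    if not path.exists():
        return False  # nothing to do
    if not confirm(path):
        return False  # user declined
    shutil.rmtree(path)
    return True
```

In interactive use, `confirm` could be something like `lambda p: input(f"Delete {p}? [y/N] ").lower() == "y"`; resolving the path first means a typo or unexpected expansion is caught before anything is removed.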

Missing User Warnings

Medium
Confidence
89% confidence
Finding
The skill instructs the agent to persist learnings and errors into markdown files, but it provides no privacy notice, consent boundary, or minimization guidance. This can cause user corrections, requests, or operational details to be stored long-term without the user understanding that their interaction data is being retained.

Missing User Warnings

Medium
Confidence
96% confidence
Finding
The error-entry format explicitly captures the command attempted, environment, and actual error message, all of which frequently contain credentials, access tokens, internal paths, hostnames, or user data. Persisting those fields verbatim to markdown creates a durable local record of sensitive material that could later be exposed to other tools, users, or sessions.
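
Redacting those fields before they are persisted could look roughly like the following. This is a sketch, not a complete scrubber: the pattern list is a hypothetical starting point, and a real deployment would need a broader, reviewed set of rules.

```python
import re

# Hypothetical patterns for illustration; deliberately incomplete.
_REDACTIONS = [
    # key=value or key: value pairs whose key looks like a credential
    (re.compile(r"(?i)\b(api[_-]?key|token|secret|password)\b(\s*[=:]\s*)(\S+)"),
     r"\1\2[REDACTED]"),
    # HTTP bearer tokens
    (re.compile(r"(?i)\bBearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer [REDACTED]"),
    # home directories, which reveal usernames
    (re.compile(r"(?:/home|/Users)/[^/\s]+"), "~"),
]


def redact(text: str) -> str:
    """Apply each redaction pattern in order and return the scrubbed text."""
    for pattern, replacement in _REDACTIONS:
        text = pattern.sub(replacement, text)
    return text
```

For example, `redact("export API_KEY=abc123")` returns `export API_KEY=[REDACTED]`. Pattern-based scrubbing can miss secrets in unfamiliar formats, so it complements rather than replaces the advice to avoid storing raw command output.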

Vague Triggers

Medium
Confidence
95% confidence
Finding
An empty matcher causes the hook to fire on every user prompt without any scoping constraint. In the context of a self-improvement skill that should only activate on specific failure or correction conditions, this broad trigger creates unnecessary persistence and increases the chance of misuse or unintended behavior across normal interactions.
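
A small lint pass can surface empty matchers before a configuration is installed. The sketch below assumes a hooks configuration shaped as event names mapping to lists of entries with a `matcher` field; that layout is an assumption modeled on this finding, not a confirmed schema, and `find_empty_matchers` is a hypothetical helper.

```python
import json

# Assumed config layout, for illustration only.
SAMPLE = """
{"hooks": {
  "UserPromptSubmit": [
    {"matcher": "", "hooks": [{"type": "command", "command": "./remind.sh"}]}
  ],
  "PostToolUse": [
    {"matcher": "Bash", "hooks": []}
  ]
}}
"""


def find_empty_matchers(config: dict) -> list[str]:
    """Return 'Event[index]' labels for entries whose matcher is empty or missing."""
    flagged = []
    for event, entries in config.get("hooks", {}).items():
        for index, entry in enumerate(entries):
            matcher = entry.get("matcher", "")
            if not str(matcher).strip():
                flagged.append(f"{event}[{index}]")
    return flagged


if __name__ == "__main__":
    print(find_empty_matchers(json.loads(SAMPLE)))
```

Run against the sample, the check reports `UserPromptSubmit[0]` and leaves the scoped `PostToolUse` entry alone. A missing `matcher` key is treated the same as an empty one, on the assumption that both would match everything.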

Vague Triggers

Medium
Confidence
94% confidence
Finding
User-level global activation applies the hook across all projects and sessions, magnifying the effect of the already overbroad trigger. This makes the behavior more dangerous in context because a self-improvement reminder intended for exceptional situations becomes ambient and persistent everywhere the agent runs.

Vague Triggers

Medium
Confidence
94% confidence
Finding
The 'minimal setup' still uses an empty matcher, so it remains globally active on all prompts despite being presented as lower overhead. Reducing overhead does not address the core security issue of unconditional triggering, and may falsely reassure users that the safer option is also a sufficiently scoped one.

Vague Triggers

Medium
Confidence
94% confidence
Finding
The Codex example repeats the empty matcher pattern, propagating unconditional activation into another agent environment. Reproducing the same broad configuration across platforms increases operational exposure and suggests the unsafe behavior is an intended default rather than an incidental example.

Vague Triggers

Medium
Confidence
82% confidence
Finding
Using 'Knowledge gaps' as a trigger is too vague and can cause the skill to activate on normal uncertainty, broad user requests, or adversarial prompting. In this skill's context, over-triggering is dangerous because activation may create persistent logs or promote content into shared guidance files, amplifying benign ambiguity into lasting behavioral changes.

Vague Triggers

Medium
Confidence
80% confidence
Finding
Triggering on any user correction like 'No, that's wrong' is overly broad because users may be mistaken, joking, testing, or speaking about preference rather than factual error. If such statements automatically cause learning capture or promotion, an attacker can intentionally poison future behavior or cause the system to persist false guidance.

VirusTotal

All 66 of 66 vendors reported this skill as clean.


Static analysis

Detected: suspicious.destructive_delete_command

Documentation contains a destructive delete command without an explicit confirmation gate.

Warn
Code
suspicious.destructive_delete_command
Location
README.md:38