Self Improving Agent

Security checks across malware telemetry and agentic risk

Overview

The skill is transparent about being a self-improvement memory aid, but it can persist session details into future agent context and recommends broad always-on hooks without enough scoping or redaction guidance.

Install only if you want persistent learning behavior. Keep logs local by default; redact tokens, credentials, personal data, customer data, and raw command payloads; review any promotion into agent instruction files; and avoid global or empty-matcher hooks unless you trust the environment and want reminders on every prompt.
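The redaction advice above can be illustrated with a minimal sketch. It assumes log entries are plain text and that secrets appear as key/value pairs, bearer credentials, or email addresses. The patterns and the `redact` name are hypothetical and not part of the skill; treat them as a starting point rather than a complete filter.

```python
import re

# Hypothetical patterns (assumption): extend them for the secrets your
# environment actually produces. Order matters: bearer credentials are
# handled first so the token after "Bearer" is never left behind.
_PATTERNS = [
    # HTTP bearer credentials, e.g. "Bearer eyJ..."
    (re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer [REDACTED]"),
    # key=value or key: value pairs whose key name suggests a secret
    (
        re.compile(r"(?i)\b(api[_-]?key|token|secret|password)\b(\s*[:=]\s*)\S+"),
        r"\1\2[REDACTED]",
    ),
    # email addresses (a simple approximation, not a full RFC 5322 match)
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[REDACTED_EMAIL]"),
]


def redact(text: str) -> str:
    """Return text with likely secrets and email addresses masked."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Running every entry through a function like this before it is written to a learnings file keeps the raw value out of persistent storage. Pattern-based redaction misses anything it does not recognize, so it complements manual review and does not replace it.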

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (14)

Intent-Code Divergence

Medium
Confidence
97% confidence
Finding
The document's security section understates the scripts' behavior: it claims they only output text and do not run commands, even though they are explicitly configured as command hooks and one script is documented for direct execution. This mismatch can mislead operators into granting trust or permissions under false assumptions, increasing the chance that command hooks are enabled without adequate review.

Description-Behavior Mismatch

Medium
Confidence
89% confidence
Finding
The self-improvement skill is documented as using cross-session coordination and agent orchestration capabilities that go beyond simply recording corrections or failures. Expanding scope this way increases the attack surface by enabling memory sharing and operational behavior changes across sessions without clear need, which can spread poisoned instructions or sensitive context more broadly than intended.

Description-Behavior Mismatch

Medium
Confidence
92% confidence
Finding
The promotion workflow directs learned content into AGENTS.md, SOUL.md, and TOOLS.md, which are injected prompt files that can modify future agent behavior. This turns untrusted or weakly validated 'learnings' into persistent prompt influence, creating a pathway for prompt-injection persistence and long-term behavioral corruption.

Context-Inappropriate Capability

Medium
Confidence
91% confidence
Finding
Reading transcripts from other sessions is not necessary for a narrow self-improvement function and grants access to potentially sensitive historical context. If abused or triggered by poisoned instructions, it can expose unrelated session data and import untrusted content into the current session's decision-making.

Vague Triggers

Medium
Confidence
80% confidence
Finding
The activation criteria are broad enough that the skill may trigger during ordinary conversation, causing unnecessary logging, file writes, or memory review in contexts where the user did not intend persistence. In an agent environment, ambiguous auto-activation increases the chance of over-collection and surprise side effects.

Vague Triggers

Low
Confidence
80% confidence
Finding
The activation criteria are broad enough that the skill may trigger during ordinary conversation, causing unnecessary logging, file writes, or memory review in contexts where the user did not intend persistence. In an agent environment, ambiguous auto-activation increases the chance of over-collection and surprise side effects.

Vague Triggers

Medium
Confidence
90% confidence
Finding
The automatic logging triggers rely on common conversational phrases such as corrections or feature questions that occur frequently in normal dialogue. This can cause the agent to persist user statements and context without a clear security boundary, increasing the likelihood of collecting sensitive or irrelevant data.

Vague Triggers

Medium
Confidence
93% confidence
Finding
An empty matcher causes the hook to run on every prompt, creating broad and persistent automatic execution with little contextual limitation. In a self-improvement skill, that increases the blast radius of any bug, prompt-sensitive behavior, or future script change because it will activate across all interactions rather than only relevant debugging or correction scenarios.

Vague Triggers

Medium
Confidence
95% confidence
Finding
The user-level configuration installs the hook globally for all sessions, while still using a broad trigger. That creates cross-project persistence and expands exposure to unrelated repositories, contexts, and prompts, which is especially risky for a skill that records learnings and may observe sensitive workflow data over time.

Vague Triggers

Medium
Confidence
93% confidence
Finding
The Codex CLI example repeats the same empty-matcher pattern, causing unconditional activation for all prompts in that environment as well. Reproducing insecure defaults across multiple agent platforms magnifies operational risk and makes over-collection or unintended execution more likely.
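One way to catch the empty-matcher pattern described in the findings above is to audit a hook configuration before enabling it. The sketch below assumes the settings are JSON-like, with a top-level `hooks` object that maps event names to entries carrying a `matcher` string and a list of `command` hooks. That shape, the `find_broad_hooks` name, and the sample commands are assumptions for illustration; check them against your agent platform's actual configuration format.

```python
def find_broad_hooks(settings: dict) -> list[tuple[str, str]]:
    """Return (event, command) pairs for command hooks with an empty or missing matcher."""
    findings = []
    for event, entries in settings.get("hooks", {}).items():
        for entry in entries:
            # An empty or absent matcher is treated as "runs on everything".
            if entry.get("matcher"):
                continue
            for hook in entry.get("hooks", []):
                if hook.get("type") == "command":
                    findings.append((event, hook.get("command", "")))
    return findings


# Hypothetical sample configuration (assumed shape, placeholder commands).
SAMPLE = {
    "hooks": {
        "UserPromptSubmit": [
            {"matcher": "", "hooks": [{"type": "command", "command": "echo reminder"}]}
        ],
        "PostToolUse": [
            {"matcher": "Bash", "hooks": [{"type": "command", "command": "echo scoped"}]}
        ],
    }
}

for event, command in find_broad_hooks(SAMPLE):
    print(f"unscoped command hook on {event}: {command}")
```

A check like this only reports unscoped hooks; deciding whether a given hook should run on every prompt remains a judgment call for the operator.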

Missing User Warnings

Medium
Confidence
87% confidence
Finding
The documentation encourages logging behavioral guidance and significant errors to persistent files without warning about secrets, personal data, or sensitive business context that may appear in errors and transcripts. This can create a durable local store of confidential information that may later be injected back into prompts or exposed to other tools and sessions.

Ssd 3

Medium
Confidence
94% confidence
Finding
The skill encourages storing learnings in workspace files and sharing them across sessions, but it does not include data minimization, redaction, retention, or consent safeguards. In practice, user-provided content, errors, and context may contain secrets, personal data, internal code details, or proprietary information that then becomes durable and more widely accessible.

Ssd 3

Medium
Confidence
96% confidence
Finding
The prescribed logging format explicitly asks for full context, inputs, parameters, user context, error output, and related details in markdown. Those fields are likely to capture credentials, tokens, API payloads, file paths, customer data, or other sensitive material and persist it in plain text where later agents or users can access it.

Ssd 3

Medium
Confidence
91% confidence
Finding
The guidance to 'promote aggressively' into CLAUDE.md, AGENTS.md, Copilot instructions, SOUL.md, and TOOLS.md amplifies the spread of any sensitive content that was logged earlier. Once propagated into broader agent context files, accidental disclosures become harder to detect, remove, and contain across sessions and tools.
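A review gate is one way to slow down promotion into agent instruction files. The sketch below previews a proposed addition as a unified diff and appends it only after explicit approval. The function names, the append-at-end behavior, and the default `AGENTS.md` label are assumptions made for illustration; the skill itself does not document such a gate.

```python
import difflib


def _with_learning(current: str, learning: str) -> str:
    """Build the proposed file content: current text plus the learning appended."""
    return current.rstrip("\n") + "\n\n" + learning.strip() + "\n"


def preview_promotion(current: str, learning: str, target_name: str = "AGENTS.md") -> str:
    """Return a unified diff showing what appending the learning would change."""
    proposed = _with_learning(current, learning)
    diff = difflib.unified_diff(
        current.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile=target_name,
        tofile=target_name + " (proposed)",
    )
    return "".join(diff)


def promote(current: str, learning: str, approved: bool) -> str:
    """Return the updated content only if a human reviewer approved the diff."""
    if not approved:
        return current
    return _with_learning(current, learning)
```

Here the caller shows the output of `preview_promotion` to a person and passes their decision as `approved`, so nothing reaches an instruction file without being read first.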

VirusTotal

65/65 vendors flagged this skill as clean.
