Self Improving Agent

Security checks across malware telemetry and agentic risk

Overview

The skill is transparent about being a self-improvement memory aid, but it can persist session details into future agent context and recommends broad always-on hooks without enough scoping or redaction guidance.

Install only if you want persistent learning behavior. Keep logs local by default; redact tokens, credentials, personal data, customer data, and raw command payloads; review any promotion into agent instruction files; and avoid global or empty-matcher hooks unless you trust the environment and want reminders on every prompt.
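The redaction advice above can be illustrated with a minimal sketch. It assumes log text is passed through a filter before it is written. The function name `redact` and the two regular expressions are illustrative assumptions, not part of the skill, and they will not catch every secret format.

```python
import re

# Hypothetical patterns; extend them for the secret formats in your environment.
_SECRET_PATTERNS = [
    # HTTP bearer tokens, e.g. "Bearer abcdefgh12345678"
    re.compile(r"(?i)\b(bearer)(\s+)([A-Za-z0-9._~+/=-]{8,})"),
    # key=value or key: value pairs whose key name looks like a credential
    re.compile(
        r"(?i)([\w-]*(?:api[_-]?key|token|secret|password|passwd)[\w-]*)"
        r"(\s*[=:]\s*)(\S+)"
    ),
]


def redact(text: str) -> str:
    """Replace the value part of credential-like matches with a placeholder."""
    for pattern in _SECRET_PATTERNS:
        text = pattern.sub(
            lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", text
        )
    return text


if __name__ == "__main__":
    print(redact("export API_KEY=abc123 done"))
```

Pattern-based redaction is a backstop rather than a guarantee; personal data, customer data, and raw command payloads generally need to be excluded at the point of collection instead.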

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (14)

Intent-Code Divergence

Medium

Confidence: 97%
Finding: The document's security section understates behavior by claiming the scripts only output text and do not run commands, even though they are explicitly configured as command hooks and one script is documented for direct execution. This mismatch can mislead operators into granting trust or permissions under false assumptions, increasing the chance that command-executed hooks are enabled without adequate review.

Description-Behavior Mismatch

Medium

Confidence: 89%
Finding: The self-improvement skill is documented as using cross-session coordination and agent orchestration capabilities that go beyond simply recording corrections or failures. Expanding scope this way increases the attack surface by enabling memory sharing and operational behavior changes across sessions without clear need, which can spread poisoned instructions or sensitive context more broadly than intended.

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: The promotion workflow directs learned content into AGENTS.md, SOUL.md, and TOOLS.md, which are injected prompt files that can modify future agent behavior. This turns untrusted or weakly validated 'learnings' into persistent prompt influence, creating a pathway for prompt-injection persistence and long-term behavioral corruption.
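One possible mitigation for this finding is a review gate in front of promotion. The sketch below is an assumption about how such a gate could look, not the skill's actual workflow: `apply_promotion` is a hypothetical helper that returns the new text for an instruction file and leaves it unchanged unless a reviewer is named.

```python
from typing import Optional


def apply_promotion(existing: str, entry: str, approved_by: Optional[str]) -> str:
    """Return the new instruction-file text; unchanged unless a reviewer is named."""
    if not approved_by or not approved_by.strip():
        # No recorded approval: do not alter persistent prompt content.
        return existing
    # Collapse whitespace and newlines so the entry stays a single bullet.
    line = " ".join(entry.split())
    return f"{existing.rstrip()}\n- {line} (approved by {approved_by.strip()})\n"


if __name__ == "__main__":
    current = "# Rules\n"
    print(apply_promotion(current, "Use pnpm for installs", approved_by=None))
    print(apply_promotion(current, "Use pnpm for installs", approved_by="alice"))
```

Keeping the function pure (text in, text out) means the caller decides when to write to AGENTS.md, SOUL.md, or TOOLS.md, and the proposed change can be shown as a diff before anything is persisted.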

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: Reading transcripts from other sessions is not necessary for a narrow self-improvement function and grants access to potentially sensitive historical context. If abused or triggered by poisoned instructions, it can expose unrelated session data and import untrusted content into the current session's decision-making.

Vague Triggers

Medium

Confidence: 80%
Finding: The activation criteria are broad enough that the skill may trigger during ordinary conversation, causing unnecessary logging, file writes, or memory review in contexts where the user did not intend persistence. In an agent environment, ambiguous auto-activation increases the chance of over-collection and surprise side effects.

Vague Triggers

Low

Confidence: 80%
Finding: The activation criteria are broad enough that the skill may trigger during ordinary conversation, causing unnecessary logging, file writes, or memory review in contexts where the user did not intend persistence. In an agent environment, ambiguous auto-activation increases the chance of over-collection and surprise side effects.

Vague Triggers

Medium

Confidence: 90%
Finding: The automatic logging triggers rely on common conversational phrases such as corrections or feature questions that occur frequently in normal dialogue. This can cause the agent to persist user statements and context without a clear security boundary, increasing the likelihood of collecting sensitive or irrelevant data.

Vague Triggers

Medium

Confidence: 93%
Finding: An empty matcher causes the hook to run on every prompt, creating broad and persistent automatic execution with little contextual limitation. In a self-improvement skill, that increases the blast radius of any bug, prompt-sensitive behavior, or future script change because it will activate across all interactions rather than only relevant debugging or correction scenarios.
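A reviewer can check for this pattern before enabling the hooks. The sketch below assumes the hook settings have been loaded into a dictionary shaped as event name, then a list of entries that each carry a `matcher` string and a list of command hooks. That layout, the event names, and the script paths in the example are assumptions for illustration; adjust them to the configuration format your agent actually uses.

```python
from typing import Any


def find_unscoped_hooks(settings: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (event, command) pairs for hook entries with a missing or empty matcher."""
    findings: list[tuple[str, str]] = []
    for event, entries in settings.get("hooks", {}).items():
        for entry in entries:
            if (entry.get("matcher") or "").strip():
                continue  # Entry is scoped by a non-empty matcher.
            for hook in entry.get("hooks", []):
                findings.append((event, hook.get("command", "")))
    return findings


if __name__ == "__main__":
    # Hypothetical settings; event names and script paths are placeholders.
    example = {
        "hooks": {
            "UserPromptSubmit": [
                {"matcher": "", "hooks": [{"type": "command", "command": "./hypothetical-reminder.sh"}]}
            ],
            "PostToolUse": [
                {"matcher": "Bash", "hooks": [{"type": "command", "command": "./hypothetical-detector.sh"}]}
            ],
        }
    }
    for event, command in find_unscoped_hooks(example):
        print(f"unscoped hook on {event}: {command}")
```

Running the same check against both project-level and user-level settings files helps surface the global installation described in the later findings.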

Vague Triggers

Medium

Confidence: 95%
Finding: The user-level configuration installs the hook globally for all sessions, while still using a broad trigger. That creates cross-project persistence and expands exposure to unrelated repositories, contexts, and prompts, which is especially risky for a skill that records learnings and may observe sensitive workflow data over time.

Vague Triggers

Medium

Confidence: 93%
Finding: The Codex CLI example repeats the same empty-matcher pattern, causing unconditional activation for all prompts in that environment as well. Reproducing insecure defaults across multiple agent platforms magnifies operational risk and makes over-collection or unintended execution more likely.

Missing User Warnings

Medium

Confidence: 87%
Finding: The documentation encourages logging behavioral guidance and significant errors to persistent files without warning about secrets, personal data, or sensitive business context that may appear in errors and transcripts. This can create a durable local store of confidential information that may later be injected back into prompts or exposed to other tools and sessions.

Ssd 3

Medium

Confidence: 94%
Finding: The skill encourages storing learnings in workspace files and sharing them across sessions, but it does not include data minimization, redaction, retention, or consent safeguards. In practice, user-provided content, errors, and context may contain secrets, personal data, internal code details, or proprietary information that then becomes durable and more widely accessible.

Ssd 3

Medium

Confidence: 96%
Finding: The prescribed logging format explicitly asks for full context, inputs, parameters, user context, error output, and related details in markdown. Those fields are likely to capture credentials, tokens, API payloads, file paths, customer data, or other sensitive material and persist it in plain text where later agents or users can access it.
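Data minimization for log entries could be sketched as an allowlist applied before anything is written. The field names, the length limit, and the `minimize_entry` helper below are hypothetical and are not taken from the skill's logging format; the point is only that inputs, parameters, and raw error output are dropped by default rather than recorded.

```python
from typing import Any

# Hypothetical field names; only these survive into the persisted entry.
ALLOWED_FIELDS = ("timestamp", "category", "summary")
# Arbitrary illustrative cap so a summary cannot carry a full payload or trace.
MAX_SUMMARY_CHARS = 200


def minimize_entry(entry: dict[str, Any]) -> dict[str, Any]:
    """Keep only allowlisted fields and truncate the free-text summary."""
    kept = {key: entry[key] for key in ALLOWED_FIELDS if key in entry}
    if "summary" in kept:
        kept["summary"] = str(kept["summary"])[:MAX_SUMMARY_CHARS]
    return kept


if __name__ == "__main__":
    raw = {
        "timestamp": "2025-01-01T00:00:00Z",
        "category": "error",
        "summary": "Build failed because the lockfile was stale",
        "inputs": "full command line with parameters",
        "error_output": "multi-line stack trace",
    }
    print(minimize_entry(raw))
```

An allowlist fails closed: a new field added to the log format is excluded until someone decides it is safe to persist.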

Ssd 3

Medium

Confidence: 91%
Finding: The guidance to 'promote aggressively' into CLAUDE.md, AGENTS.md, Copilot instructions, SOUL.md, and TOOLS.md amplifies the spread of any sensitive content that was logged earlier. Once propagated into broader agent context files, accidental disclosures become harder to detect, remove, and contain across sessions and tools.

VirusTotal

65/65 vendors flagged this skill as clean.
