Security audit

Self Improving Agent 3.0.2

Security checks across malware telemetry and agentic risk

Overview

This skill is not malicious, but it deserves review because it encourages persistent agent memory, broad hook activation, and cross-session sharing without enough privacy and consent boundaries.

Install only if you intentionally want an agent-memory workflow. Keep hooks project-local where possible, avoid global every-prompt activation, do not store secrets, credentials, raw transcripts, customer data, or sensitive command output, and review any learning before promoting it into agent instruction files or sending it to another session.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (15)

Intent-Code Divergence

Medium

Confidence: 98%
Finding: The document’s security section states the scripts 'only output text' and 'don’t modify files or run commands,' but the entire setup configures those scripts to be executed as hook commands. This misleading assurance can cause users to underestimate the risk of arbitrary shell-script execution in response to prompts or tool events, increasing the chance they enable unsafe automation without proper review.

Description-Behavior Mismatch

Medium

Confidence: 91%
Finding: The document broadens a narrowly scoped self-improvement skill into management of injected workspace prompts, behavioral guidance, and multi-agent coordination. That scope expansion increases the chance the skill becomes a vehicle for persistent prompt manipulation and unintended control over agent behavior beyond simple error logging.

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: Reading other sessions and sending messages across sessions is not necessary for basic self-improvement logging and creates a lateral data-flow channel. In a prompt-injection-prone environment, this can spread poisoned instructions, leak sensitive context from one session to another, or persist unsafe learnings across agents.

Vague Triggers

Medium

Confidence: 85%
Finding: The guidance says to use the skill in many common situations, making activation likely during routine conversation rather than only exceptional learning events. Over-broad activation increases the chance of unnecessary logging, persistence of sensitive user content, and prompt pollution from constant self-monitoring.

Vague Triggers

Medium

Confidence: 92%
Finding: The listed trigger phrases like corrections and feature requests are generic and commonly appear in normal dialogue. Treating these as automatic logging signals can capture large amounts of routine conversational content into persistent files and increase false-positive activation of the skill.

Vague Triggers

High

Confidence: 96%
Finding: The hook examples use an empty matcher, effectively causing universal activation on every user prompt and selected tool events. This greatly expands the skill's scope, creates persistent monitoring behavior, and can lead to systematic collection of sensitive context or repeated prompt-injection overhead.
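A reviewer could check a hook configuration for this pattern before enabling it. The following is a minimal sketch, assuming the settings are JSON with a top-level "hooks" object that maps each event name to a list of entries carrying a "matcher" string. That layout, the script paths in the sample, and the choice to treat "*" as a catch-all are assumptions for illustration, not details confirmed by the audit.

```python
import json


def find_empty_matchers(settings: dict) -> list[tuple[str, int]]:
    """Return (event_name, entry_index) for each hook entry with a catch-all matcher."""
    findings = []
    for event, entries in settings.get("hooks", {}).items():
        for index, entry in enumerate(entries):
            matcher = entry.get("matcher", "")
            # A missing, non-string, empty, or "*" matcher is treated as "match everything"
            # (assumption: the agent runtime interprets these the same way).
            if not isinstance(matcher, str) or matcher.strip() in ("", "*"):
                findings.append((event, index))
    return findings


# Hypothetical configuration; the script paths are placeholders.
SAMPLE = json.loads("""
{
  "hooks": {
    "UserPromptSubmit": [
      {"matcher": "", "hooks": [{"type": "command", "command": "./scripts/activator.sh"}]}
    ],
    "PostToolUse": [
      {"matcher": "Bash", "hooks": [{"type": "command", "command": "./scripts/error-detector.sh"}]}
    ]
  }
}
""")

for event, index in find_empty_matchers(SAMPLE):
    print(f"{event}[{index}] runs on every event of this type")
```

Any entry this reports deserves a deliberate decision: either narrow the matcher or accept that the command runs unconditionally.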

Vague Triggers

Medium

Confidence: 93%
Finding: An empty matcher causes the UserPromptSubmit hook to run on every prompt, creating unconditional execution of the configured shell script. In a self-improvement skill, this broad trigger increases exposure to prompt-injection-driven behavior, unnecessary processing of sensitive prompts, and persistent execution across routine workflows.

Vague Triggers

Medium

Confidence: 92%
Finding: The user-level configuration enables the hook globally, so the script will execute across all repositories and sessions without contextual restrictions. That broadens the blast radius from a single project to the user’s whole environment, making accidental data exposure, overcollection of prompts, or execution in untrusted contexts more likely.
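To make the difference in blast radius concrete, a settings file can be classified by where it lives. This is a rough sketch under stated assumptions: the function name, the three scope labels, and the example paths are hypothetical, and the caller is expected to pass absolute, already-resolved paths.

```python
from pathlib import PurePath, PurePosixPath


def hook_scope(settings_path: PurePath, project_root: PurePath, home: PurePath) -> str:
    """Classify a settings file as 'project', 'user', or 'other' by its location."""
    # Check the project first: a project directory usually sits inside the home directory.
    if settings_path.is_relative_to(project_root):
        return "project"
    if settings_path.is_relative_to(home):
        return "user"
    return "other"


# Hypothetical paths for illustration only.
home = PurePosixPath("/home/dev")
project = PurePosixPath("/home/dev/work/app")

print(hook_scope(PurePosixPath("/home/dev/work/app/.agent/settings.json"), project, home))
print(hook_scope(PurePosixPath("/home/dev/.agent/settings.json"), project, home))
```

A "user" result means the hooks in that file apply to every repository and session for that account, which is the situation this finding warns about.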

Vague Triggers

Low

Confidence: 87%
Finding: Although labeled 'minimal,' this setup still uses an empty matcher, so the activator runs for every submitted prompt. The reduced number of hooks lowers overhead, but the unconstrained trigger still creates unnecessary execution and can surface reminders or logic in contexts where they are not appropriate.

Vague Triggers

Medium

Confidence: 91%
Finding: The Codex example repeats the same empty-matcher pattern, causing unconditional hook execution in another agent environment. Replicating broad defaults across platforms increases the likelihood users adopt insecure configurations at scale and normalizes always-on command hooks without adequate boundaries.

Vague Triggers

Medium

Confidence: 82%
Finding: Triggering on generic user corrections such as 'No, that's wrong' is overly broad and can activate persistence logic during normal conversation. That can cause accidental storage of untrusted or adversarial content as a 'learning,' turning benign dialogue into a prompt-persistence mechanism.

Vague Triggers

Medium

Confidence: 85%
Finding: The trigger 'Knowledge gaps' is ambiguous and gives the agent broad discretion to decide when to log or promote content. Ambiguous self-activation criteria are dangerous in systems with memory or prompt injection because they can be abused to persist speculative, incorrect, or attacker-supplied guidance.

Ssd 3

Medium

Confidence: 93%
Finding: The skill directs agents to persist user corrections, requests, and other interaction-derived content into `.learnings/` and to promote some of it into longer-lived memory files. This creates a semantic data-leak risk because sensitive information from conversations may be retained and reused outside the original context.

Ssd 3

High

Confidence: 97%
Finding: The cross-session features explicitly encourage reading other sessions' transcripts and sending learnings between sessions as routine workflow. That makes natural-language exfiltration of sensitive user data more likely, especially when agents summarize or forward transcript content into other contexts without a strict need-to-know boundary.
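One way to impose a need-to-know boundary is to gate forwarding on explicit human review and an allowlist of target sessions. The sketch below is illustrative only: the Learning record, its fields, and can_forward are hypothetical names, not part of the skill or of any agent runtime.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Learning:
    summary: str
    source_session: str
    # Set by a human reviewer after reading the summary; the agent must not set it.
    reviewed_by: Optional[str] = None


def can_forward(learning: Learning, target_session: str, allowed_targets: set[str]) -> bool:
    """Allow forwarding only for reviewed learnings sent to an explicitly allowed session."""
    return learning.reviewed_by is not None and target_session in allowed_targets


note = Learning(summary="Use the project's lint script before committing", source_session="s1")
print(can_forward(note, "s2", {"s2"}))  # unreviewed, so forwarding is refused

note.reviewed_by = "alice"
print(can_forward(note, "s2", {"s2"}))  # reviewed and the target is allowlisted
```

The design choice here is default-deny: nothing moves between sessions unless both conditions hold.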

Ssd 3

Medium

Confidence: 95%
Finding: The logging templates ask for full context, actual error output, inputs, parameters, and environment details, all of which commonly contain secrets, tokens, personal data, internal paths, or proprietary business information. Storing this verbatim in persistent markdown files increases the chance of accidental disclosure and later reuse in broader contexts.
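If entries are logged at all, the text can be passed through a redaction step before it is written to disk. The following is a minimal sketch, assuming a few regex heuristics for common secret shapes. The patterns and the redact helper are illustrative, are far from exhaustive, and do not replace manual review of what gets stored.

```python
import re

# Heuristic patterns (assumptions, not a complete secret scanner).
PATTERNS = [
    # key=value or key: value pairs whose key name suggests a credential
    (re.compile(r"(?i)(\w*(?:api[_-]?key|token|secret|password)\w*)(\s*[=:]\s*)\S+"),
     r"\1\2[REDACTED]"),
    # Strings shaped like an AWS access key ID
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED_AWS_KEY_ID]"),
    # HTTP bearer tokens
    (re.compile(r"(?i)\bBearer\s+[A-Za-z0-9._~+/=-]{8,}"), "Bearer [REDACTED]"),
]


def redact(text: str) -> str:
    """Replace likely secrets in text with placeholders before it is persisted."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text


print(redact("DB_PASSWORD=hunter2 failed to connect"))
print(redact("Authorization: Bearer abcdefgh12345"))
```

Redaction of this kind reduces accidental disclosure but cannot catch personal data or proprietary details, so keeping raw error output and environment dumps out of the log remains the safer default.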

VirusTotal

66/66 vendors flagged this skill as clean.
