Self Improving Agent

Security checks across malware telemetry and agentic risk

Overview

This self-improvement skill is purpose-aligned, but it asks users to enable broad automatic hooks that persist raw tool activity and session-derived learnings with limited privacy controls.

Install only if you are comfortable with local hooks observing agent activity. Prefer project-level setup, avoid wildcard/global hooks for sensitive work, review hook scripts before enabling them, keep raw observations out of synced or committed folders, and periodically delete or redact logs that may contain secrets or private project data.
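One way to act on the "periodically delete" advice is a small retention script. The sketch below is illustrative only: the function name `prune_old_logs`, the `*.jsonl` file pattern, and the retention window are assumptions, not part of the skill, and the directory to prune depends on where your hooks actually write.

```python
import time
from pathlib import Path
from typing import List, Optional


def prune_old_logs(log_dir: Path, max_age_days: float,
                   now: Optional[float] = None) -> List[Path]:
    """Delete *.jsonl files in log_dir whose mtime is older than max_age_days.

    Hypothetical helper: the directory layout and file extension are assumptions.
    Returns the list of paths that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed = []
    for path in sorted(log_dir.glob("*.jsonl")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Running something like this from a scheduled job bounds how long raw observations stay on disk, but it does not replace reviewing what the hooks capture in the first place.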

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Rogue Agent: Self-Modification, Session Persistence
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (18)

Description-Behavior Mismatch

Medium

Confidence: 93%
Finding: The privacy section claims observations stay local and that no actual code or conversation content is shared, yet the skill elsewhere documents transcript access and cross-session message passing. That inconsistency can cause users to make unsafe assumptions and permit unintended disclosure of sensitive prompts, code, or operational context across sessions.

Context-Inappropriate Capability

Medium

Confidence: 90%
Finding: Reading other sessions' transcripts and sending messages across sessions expands access beyond the stated local-learning purpose and creates an unnecessary data exposure surface. In an agent environment, transcripts often contain secrets, internal code, or user instructions, so this capability increases the chance of confidentiality breaches.

Intent-Code Divergence

Medium

Confidence: 97%
Finding: The document claims the hook scripts only output text and do not run commands, but the examples configure them as executable hook commands and also instruct direct execution of another script. This is misleading security guidance: users may trust the hooks as harmless text emitters when they are in fact shell scripts executed by the agent environment with the agent's permissions.

Missing User Warnings

Medium

Confidence: 94%
Finding: The README explicitly promotes PreToolUse/PostToolUse hooks that 'capture 100% of activity' but does not warn users that these hooks may observe prompts, commands, file paths, tool inputs/outputs, and potentially sensitive project data. In a continuous-learning skill, silent comprehensive observation increases privacy and data-retention risk because users may enable global hooks without understanding the scope of collection or where captured data is stored.

Vague Triggers

Medium

Confidence: 88%
Finding: Broad natural-language triggers such as corrections, feature requests, and knowledge gaps can activate on ordinary conversation and cause unintended logging or persistence. In a memory-oriented skill, overbroad activation is dangerous because it can capture sensitive or irrelevant user content without meaningful intent or review.

Missing User Warnings

Medium

Confidence: 94%
Finding: The documented hook setup causes local shell commands to run automatically on every matching tool event, but the section does not prominently warn users that they are enabling arbitrary script execution. Automatic command execution in response to agent activity materially increases risk if the scripts are modified, replaced, or behave unexpectedly.

Missing User Warnings

Medium

Confidence: 89%
Finding: Inter-session transcript access is described without a strong privacy warning, despite the likelihood that transcripts contain sensitive data. Users may enable or use the feature without appreciating that they are exposing prompts, code, errors, or secrets across session boundaries.

Missing User Warnings

Medium

Confidence: 88%
Finding: The command documentation advertises `--auto-create` as automatically generating files but does not explicitly warn that it will modify the filesystem or require clear user confirmation before doing so. In an agentic context, this increases the risk of unintended file creation or repository changes, especially if the command is invoked reflexively or as part of a larger automation flow.

Missing User Warnings

Medium

Confidence: 95%
Finding: The hook persistently writes full tool inputs and outputs to local JSONL files without any minimization, consent, or filtering. Tool inputs/outputs commonly contain secrets, prompts, file contents, credentials, tokens, and other sensitive project data, so this creates a privacy and security exposure if the local store is later read, synced, backed up, or accessed by other processes.
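The minimization this finding calls for could look roughly like the sketch below, which redacts secret-shaped strings and truncates payloads before a record is serialized for a JSONL log. Everything here is an assumption for illustration: the names `redact` and `to_log_line`, the regex list, and the 200-character cap are not taken from the skill, and pattern-based redaction will miss secrets that do not match a known shape.

```python
import json
import re

# Hypothetical patterns; a real deployment would need a broader, reviewed list.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),            # AWS access key ID shape
    re.compile(r"(?i)bearer\s+[a-z0-9._\-]+"),  # bearer tokens
    re.compile(r"(?i)\b(password|secret|token|api_key)\b\s*[=:]\s*\S+"),
]


def redact(text: str) -> str:
    """Replace anything matching a known secret shape with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def to_log_line(tool_name: str, tool_input: str, tool_output: str,
                max_len: int = 200) -> str:
    """Build one JSONL record with redacted, truncated input and output."""
    record = {
        "tool": tool_name,
        "input": redact(tool_input)[:max_len],
        "output": redact(tool_output)[:max_len],
    }
    return json.dumps(record)
```

Redacting before truncating matters: cutting first could split a token so that its remainder no longer matches any pattern and is written in the clear.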

Vague Triggers

Medium

Confidence: 90%
Finding: Using an empty matcher causes the activator hook to run on every prompt, creating a very broad and persistent execution point. In a self-improving agent context, that increases the chance of prompt-driven abuse, excessive data capture, or unintended behavior across all sessions because the script is invoked indiscriminately.
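Before enabling a hook configuration, it can help to list which entries would fire unconditionally. The sketch below assumes a settings layout of `{"hooks": {event: [{"matcher": ..., "hooks": [{"command": ...}]}]}}`; that shape, the function name `find_broad_hooks`, and the choice to treat a missing matcher and `"*"` as broad are all assumptions to check against your agent's actual configuration format.

```python
from typing import List, Tuple


def find_broad_hooks(settings: dict) -> List[Tuple[str, str]]:
    """Return (event, command) pairs whose matcher is empty, missing, or '*'.

    Assumed layout (hypothetical): settings["hooks"][event] is a list of
    entries, each with an optional "matcher" string and a "hooks" list of
    dicts carrying a "command" string.
    """
    findings = []
    for event, entries in settings.get("hooks", {}).items():
        for entry in entries:
            matcher = entry.get("matcher", "")
            if matcher in ("", "*"):
                for hook in entry.get("hooks", []):
                    findings.append((event, hook.get("command", "")))
    return findings
```

Each pair returned is a script that would run on every occurrence of that event, so those are the entries to narrow or review first.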

Vague Triggers

Medium

Confidence: 93%
Finding: The user-level configuration recommends global activation from the home directory without narrowing the trigger scope, so the hook can execute across unrelated projects and sessions. That expands blast radius substantially: a compromised or buggy script would affect all work, and any sensitive prompts across contexts may be exposed to the hook.

Vague Triggers

Medium

Confidence: 89%
Finding: The Codex example also uses an empty matcher, which makes hook invocation effectively unconditional for the event. Broad unconditional hooks are risky because they create a persistent execution path for every prompt submission, increasing exposure to misuse and accidental collection or influence.

Missing User Warnings

Medium

Confidence: 92%
Finding: The document explicitly encourages persisting 'learnings' into workspace files and .learnings/ without any guidance to avoid secrets, personal data, credentials, or session transcripts. In an agent context, this creates a realistic risk of storing sensitive user content in durable files that may later be re-injected into prompts, exposed to other agents, committed to repositories, or read by other local users/processes.

Ssd 3

Medium

Confidence: 94%
Finding: Encouraging reading and sharing other sessions' transcripts creates a direct natural-language data leakage channel. Because transcript content may include secrets, proprietary code, credentials, or personal data, this feature can exfiltrate sensitive information even without any code-level exploit.

Ssd 3

Medium

Confidence: 91%
Finding: The skill instructs persistent logging of corrections, context, inputs, and error details, which can capture confidential information and store it long-term in local memory files. Persistent storage increases the blast radius of accidental disclosure because secrets and sensitive context may later be surfaced, committed, or shared.

Ssd 3

Medium

Confidence: 93%
Finding: The templates explicitly request full context, parameters, and raw error output, all of which commonly contain tokens, file paths, internal URLs, stack traces, or user data. Standardizing this as the default logging format makes sensitive-data retention more likely and normalizes unsafe persistence practices.

Ssd 3

High

Confidence: 99%
Finding: This hook records complete natural-language tool inputs and outputs to disk across sessions and projects, creating a durable data leakage channel. Because this skill is explicitly a continuous-learning system that captures and evolves behaviors, the stored content is especially likely to include sensitive user corrections, command outputs, repository contents, and credentials that can later be surfaced, reused, or exfiltrated by other components.

Session Persistence

Medium

Category: Rogue Agent
Content: /projects # List known projects ``` ### 3. Create Instincts Manually ```yaml ---
Confidence: 83%
Finding: Create Instincts Manually ```yaml --- id: my-instinct trigger: "when to apply" confidence: 0.7 domain: "code-style" scope: project --- # Title ## Action What to do. ## Examples [...] ``` ## Direc

VirusTotal

65/65 vendors flagged this skill as clean.
