Strict Self-Improving Agent (Rule of 3)

Security checks across malware telemetry and agentic risk

Overview

This appears to be a legitimate self-improvement skill, but it can persist and share sensitive session context too broadly without enough privacy controls.

Install only if you are comfortable with agents keeping durable learning notes. Prefer project-level setup, avoid global hooks, review the scripts and generated memory files, and require sanitized summaries only. Do not store secrets, customer data, raw prompts, tokens, or full command outputs, and do not promote anything into SOUL.md, AGENTS.md, TOOLS.md, Copilot instructions, or new skills without human review.
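
One way to approximate "sanitized summaries only" is to redact likely secrets and cap the note length before anything is persisted. The sketch below is illustrative only: `sanitize_note`, the regular expressions, and the 280-character cap are assumptions, not part of the skill.

```python
import re

# Hypothetical patterns; a real deployment would tune these to its own secret formats.
SECRET_PATTERNS = [
    # key=value or key: value pairs whose key name suggests a credential
    re.compile(r"(?i)\b(api[_-]?key|token|secret|password)\b\s*[:=]\s*\S+"),
    # long opaque strings that look like tokens or hashes
    re.compile(r"\b[A-Za-z0-9_\-]{32,}\b"),
]

MAX_NOTE_CHARS = 280  # assumed cap that forces a summary rather than a raw dump


def sanitize_note(text: str) -> str:
    """Redact likely secrets, then truncate the note to a short summary."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text[:MAX_NOTE_CHARS]
```

Pattern-based redaction misses secrets that do not match the patterns, so it complements manual review of the generated memory files rather than replacing it.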

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Rogue Agent: Self-Modification, Session Persistence
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (14)

Intent-Code Divergence

Medium
Confidence
83% confidence
Finding
The document first forbids direct writes to core instruction files during normal tasks, then later advises promoting aggressively 'if in doubt.' That contradiction can cause an agent or operator to bypass intended review gates and persist unvetted instructions into high-authority files, enabling prompt/instruction drift across sessions.
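
A review gate of the kind this finding implies could queue proposed edits to high-authority files instead of writing them. This is a minimal sketch under stated assumptions: `PromotionQueue` and its methods are hypothetical, and the protected file set is taken only from the file names this report mentions.

```python
from dataclasses import dataclass, field

# Assumed set of high-authority files, based on the names mentioned in this report.
PROTECTED_FILES = {"SOUL.md", "AGENTS.md", "TOOLS.md"}


@dataclass
class PromotionQueue:
    """Holds proposed edits to protected files until a human approves them."""
    pending: list = field(default_factory=list)

    def propose(self, target: str, text: str) -> str:
        # Protected targets are never written directly; they are queued for review.
        if target in PROTECTED_FILES:
            self.pending.append((target, text))
            return "queued"
        return "write-allowed"

    def approve(self, index: int, reviewer: str) -> tuple:
        # A named reviewer is required so that approvals stay attributable.
        if not reviewer:
            raise ValueError("human reviewer required")
        return self.pending.pop(index)
```

With a gate like this, "if in doubt" results in a queued proposal rather than a direct write, which removes the contradiction between the two instructions.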

Intent-Code Divergence

Medium
Confidence
95% confidence
Finding
The document states that the scripts 'only output text' and 'don't modify files or run commands', but the configured hooks are explicitly executed as shell commands. That mismatch can mislead users into granting trust or permissions under false assumptions, increasing the chance they enable executable hook code without appropriate review.
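
Before trusting the "only output text" claim, a reviewer could list every hook that runs as a shell command. The sketch below assumes the settings layout visible in the excerpt at the end of this report (`hooks`, event name, `matcher`, nested `hooks`, `type`); the `command` key and the `list_command_hooks` helper are assumptions for illustration.

```python
def list_command_hooks(settings: dict) -> list:
    """Return (event, matcher, command) for every hook of type 'command'."""
    found = []
    for event, groups in settings.get("hooks", {}).items():
        for group in groups:
            for hook in group.get("hooks", []):
                if hook.get("type") == "command":
                    # The "command" key holding the shell string is an assumption here.
                    found.append(
                        (event, group.get("matcher", ""), hook.get("command", ""))
                    )
    return found
```

Each returned command string points at a script that deserves a read before the hook is enabled.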

Description-Behavior Mismatch

Medium
Confidence
86% confidence
Finding
The guide instructs promoting transient learnings into persistent prompt files such as SOUL.md, TOOLS.md, and AGENTS.md, which can permanently shape future agent behavior across sessions. That creates a prompt-persistence channel where mistakes, sensitive content, or attacker-influenced instructions can be amplified and reused without adequate validation.

Vague Triggers

Medium
Confidence
90% confidence
Finding
Using an empty hook matcher causes the activator to run on every prompt submission, regardless of task sensitivity or relevance. Broad automatic triggering increases the chance of unnecessary data capture, prompt-context pollution, and unintended side effects in unrelated workflows.
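
An empty matcher is easy to detect mechanically. The following sketch flags hook groups whose matcher is empty or missing; `find_unscoped_hooks` is a hypothetical helper, and the settings layout is assumed from the excerpt quoted in this report.

```python
def find_unscoped_hooks(settings: dict) -> list:
    """Return (event, index) for hook groups whose matcher is empty or missing."""
    unscoped = []
    for event, groups in settings.get("hooks", {}).items():
        for index, group in enumerate(groups):
            # An empty or whitespace-only matcher means the hook fires for everything.
            if not group.get("matcher", "").strip():
                unscoped.append((event, index))
    return unscoped
```

A check like this could run in CI or a pre-commit step so that broad triggers are a deliberate choice rather than a copied default.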

Vague Triggers

Medium
Confidence
90% confidence
Finding
The second example repeats the same broad empty-matcher pattern, now for both prompt submission and post-tool-use flows. This expands automatic execution surface area and can make logging/error-detection run on all Bash activity, including tasks that may involve secrets or sensitive inputs.

Vague Triggers

Medium
Confidence
88% confidence
Finding
Using an empty matcher causes the activator to run on every prompt, which broadens the trust boundary and ensures the hook executes constantly. In a self-improvement skill, this creates persistent, automatic influence over all sessions and increases exposure if the script is modified, replaced, or abused.

Vague Triggers

Medium
Confidence
92% confidence
Finding
The user-level example installs the hook globally with an empty matcher, so it will execute across all sessions rather than a single project. That persistence and broad scope make accidental overreach more dangerous, especially because the skill is designed to inject reminders into future interactions.

Vague Triggers

Medium
Confidence
87% confidence
Finding
The Codex example repeats the same empty-matcher pattern, causing universal invocation for all prompts in that environment. Broad, unconditional hook execution increases attack surface and can lead to unnecessary data exposure or persistent behavioral manipulation if the script is compromised.

Missing User Warnings

Medium
Confidence
91% confidence
Finding
The documentation encourages logging learnings to workspace files and sharing across sessions without any privacy, sensitivity, or data-classification guidance. In a self-improvement skill, failures and corrections often contain API responses, credentials, internal paths, or user-provided confidential context, so persistence and sharing materially increase leakage risk.

Ssd 3

Medium
Confidence
93% confidence
Finding
The skill is designed to persist learnings, corrections, and context across sessions, which creates a durable store of user-provided natural-language content. Without strict minimization, retention limits, and sensitivity filtering, this can capture confidential instructions, business data, secrets, or personal information and re-surface them in future sessions.
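
A retention limit can be as simple as dropping entries older than a fixed window each time the store is loaded. This is a sketch under assumptions: the `logged_at` field, the 30-day window, and `prune_expired` are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30  # assumed policy value, not taken from the skill


def prune_expired(entries: list, now: datetime) -> list:
    """Keep only entries logged within the retention window."""
    cutoff = now - timedelta(days=RETENTION_DAYS)
    # Each entry is assumed to carry a timezone-aware "logged_at" datetime.
    return [entry for entry in entries if entry["logged_at"] >= cutoff]
```

Passing `now` explicitly keeps the function deterministic and easy to test.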

Ssd 3

High
Confidence
95% confidence
Finding
The cross-session features explicitly encourage viewing other sessions' transcripts and forwarding learnings between sessions without clear access-control or minimization boundaries. That creates a realistic risk of lateral data exposure, where one session can inherit sensitive content from another unrelated task or user context.
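
One possible access-control boundary is to allow forwarding only between sessions that share a project and an owner. The `Session` type and `may_forward` check below are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    session_id: str
    project: str
    owner: str


def may_forward(source: Session, target: Session) -> bool:
    """Allow forwarding only between sessions with the same project and owner."""
    return source.project == target.project and source.owner == target.owner
```

A stricter policy might also require that the forwarded text has already been sanitized; this sketch covers only the scoping decision.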

Ssd 3

High
Confidence
97% confidence
Finding
The logging templates instruct storing full context, user context, inputs, parameters, related files, and raw error output. In practice, those fields are likely to contain secrets, proprietary code paths, API tokens, personal data, or sensitive operational details, making the memory files a high-value leakage target.
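
Data minimization for log entries can be sketched as an allow-list: only named fields survive, and free text is capped. The field names, the 200-character cap, and `minimize_entry` are assumptions chosen for illustration, not the skill's actual template.

```python
# Hypothetical allow-list; raw inputs, parameters, and error output are excluded.
ALLOWED_FIELDS = {"category", "summary", "suggested_action"}
MAX_SUMMARY_CHARS = 200  # assumed cap on free text


def minimize_entry(raw: dict) -> dict:
    """Drop every field not on the allow-list and cap the summary length."""
    entry = {key: value for key, value in raw.items() if key in ALLOWED_FIELDS}
    if "summary" in entry:
        entry["summary"] = str(entry["summary"])[:MAX_SUMMARY_CHARS]
    return entry
```

An allow-list fails closed: a new template field is excluded until someone decides it is safe to keep.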

Ssd 3

Medium
Confidence
90% confidence
Finding
Automatic triggers for user corrections, feature requests, and newly supplied information encourage systematic retention of user-originated content, even when it is incidental or sensitive. Because this logging is trigger-based rather than necessity-based, it increases the volume and sensitivity of persisted data over time.

Session Persistence

Medium
Category
Rogue Agent
Content
### Option 1: Project-Level Configuration

Create `.claude/settings.json` in your project root:

```json
{ "hooks": { "UserPromptSubmit": [ { "matcher": "", "hooks": [ { "type": "command", …
```
Confidence
84% confidence
Finding
This project-level configuration registers a `UserPromptSubmit` hook of type `command` with an empty matcher.

VirusTotal

66/66 vendors reported this skill as clean.
