Security audit

Self-Improving Agent (Proactive Self-Reflection)

Security checks across malware telemetry and agentic risk

Overview

This skill is a coherent local memory tool, but it persistently changes agent behavior and stores user-derived data, and several of its consent and deletion behaviors are under-scoped. Users should review those behaviors before installing.

Install only if you want the agent to maintain durable local memory and let that memory influence future work. Before using it, review the setup snippets for AGENTS.md, SOUL.md, and HEARTBEAT.md, require confirmation before any first write or deletion/export, and periodically inspect or delete ~/self-improving/. Do not store secrets, financial or medical data, location routines, third-party details, or sensitive work context in this memory.
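
As a rough aid for the periodic inspection recommended above, the following sketch lists every file under a memory directory with its size. The function name `list_memory_files` is hypothetical and not part of the skill; the only detail taken from this report is the `~/self-improving/` location.

```python
from pathlib import Path


def list_memory_files(root: Path) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for every file under root, sorted by path.

    Hypothetical helper for manual review; it only reads metadata and never
    modifies or deletes anything.
    """
    if not root.is_dir():
        # Nothing has been stored yet (or the directory was already removed).
        return []
    return sorted(
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )
```

Calling `list_memory_files(Path.home() / "self-improving")` returns an empty list when the directory does not exist, so it is safe to run before the skill has ever been activated.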

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (12)

Context-Inappropriate Capability

Medium
Confidence
95% confidence
Finding
The kill-switch instructs the system to export current memory to a file before wiping it, which creates a new persistence channel exactly when the user is requesting deletion. That undermines the privacy boundary of "forget everything" and can preserve sensitive or regulated data outside the normal memory store, increasing risk of unauthorized retention or later disclosure.

Intent-Code Divergence

Medium
Confidence
98% confidence
Finding
The phrase "forget everything" communicates complete deletion, but the documented behavior preserves memory first by exporting it. This is a direct contradiction of user intent and transparency requirements, and it can mislead users into believing data has been erased when it still exists in another artifact.
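
One way to bring the behavior in line with the wording is to make export strictly opt-in, so a plain deletion request leaves no copy behind. The sketch below is an illustration under assumptions, not the skill's implementation: `forget_everything`, its `export_to` parameter, and the returned report are all hypothetical names.

```python
import shutil
from pathlib import Path
from typing import Optional


def forget_everything(memory_dir: Path, export_to: Optional[Path] = None) -> dict:
    """Delete memory_dir. Export a copy only if the caller passes export_to.

    Hypothetical sketch: the default path performs deletion with no export,
    and the returned report states exactly what was done so it can be shown
    to the user.
    """
    report = {"exported_to": None, "deleted": False}
    if not memory_dir.is_dir():
        return report
    if export_to is not None:
        # Export happens only on explicit request, never as a silent default.
        shutil.copytree(memory_dir, export_to)
        report["exported_to"] = str(export_to)
    shutil.rmtree(memory_dir)
    report["deleted"] = True
    return report
```

Returning the report rather than printing it leaves the caller responsible for telling the user whether any artifact still exists after the wipe.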

Context-Inappropriate Capability

Medium
Confidence
84% confidence
Finding
The skill prescribes different memory and confirmation behavior based on inferred user types, including 'Aggressive learning, minimal confirmation' for 'Power user'. That creates adaptive policy decisions and lightweight profiling beyond simple self-reflection, increasing the risk that the agent stores user data or changes consent thresholds without an explicit, user-approved basis.

Vague Triggers

Medium
Confidence
82% confidence
Finding
Using a broad natural-language trigger like "forget everything" can match ordinary conversational phrasing, jokes, or requests that are not intended as an irreversible memory reset. In a self-improving memory skill, accidental triggering could cause destructive state changes or initiate unexpected export/deletion workflows.
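
A narrower trigger could look something like the sketch below, which accepts only an exact command on its own line and still asks for confirmation before doing anything destructive. The `/forget-everything` command, `is_reset_request`, and `handle_message` are assumptions made for illustration; the audited skill uses the natural-language phrase instead.

```python
import re
from typing import Callable

# Hypothetical explicit command. Matching the whole message avoids firing on
# conversational uses of the phrase "forget everything".
RESET_COMMAND = re.compile(r"^\s*/forget-everything\s*$", re.IGNORECASE)


def is_reset_request(message: str) -> bool:
    """Return True only when the entire message is the explicit reset command."""
    return RESET_COMMAND.match(message) is not None


def handle_message(message: str, confirm: Callable[[str], bool]) -> str:
    """Return 'ignored', 'cancelled', or 'reset' without touching any files.

    `confirm` is a caller-supplied callback that shows the prompt to the user
    and returns their yes/no answer.
    """
    if not is_reset_request(message):
        return "ignored"
    if not confirm("Delete all stored memory? This cannot be undone."):
        return "cancelled"
    return "reset"
```

With this shape, a message such as "lol, forget everything I said about tabs" is ignored, and even the exact command does nothing until the user confirms.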

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The workflow describes destructive deletion and hidden data export without clearly warning the user that a file will be created first. This creates a transparency and consent failure: users may request forgetting and instead cause additional data handling they did not authorize, potentially expanding exposure surface.

Missing User Warnings

Medium
Confidence
91% confidence
Finding
The skill explicitly describes logging user corrections, repeated preferences, and project-scoped behavior, but it does not disclose retention, storage location, access controls, or user consent. In a self-improving agent, this creates a privacy and data-governance risk because users may unknowingly provide behavioral data that is persisted and reused across sessions or contexts.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The reversal flow says prior preferences are archived, history is kept, and timestamps are logged, which confirms ongoing retention of user-derived preference history even after changes. Without transparency, deletion options, or retention boundaries, this can preserve sensitive behavioral data longer than users expect and enable profiling over time.

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The skill instructs the agent to create persistent directories and files under the user's home directory on first activation without any explicit consent, warning, or explanation of the filesystem side effects. Persistent self-modifying or memory-related storage increases risk because it leaves durable state on the system, may capture sensitive user/task data over time, and can operate outside the user's awareness.
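
A consent gate on first activation might be sketched as follows. `ensure_memory_dir` and its `ask` callback are hypothetical names; the point is only that nothing is written to the home directory until the user has seen the path and agreed.

```python
from pathlib import Path
from typing import Callable


def ensure_memory_dir(root: Path, ask: Callable[[str], bool]) -> bool:
    """Create root only after explicit approval; return True if it is usable.

    Hypothetical sketch: `ask` presents the prompt to the user and returns
    their yes/no answer. An existing directory is reused without prompting.
    """
    if root.is_dir():
        return True
    if not ask(f"Create persistent memory directory at {root}?"):
        # The user declined, so no filesystem state is created.
        return False
    root.mkdir(parents=True)
    return True
```

The same pattern could wrap each first write to a new file, which would also address the findings about silent persistence.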

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The file explicitly states that memory will be loaded on every session and updated automatically based on usage patterns, but it provides no warning, consent mechanism, or retention/privacy safeguards. In an agent skill, this can lead to silent persistence of user-derived preferences, project details, or other sensitive context across sessions, increasing the risk of unintended data retention and privacy violations.

Missing User Warnings

Medium
Confidence
91% confidence
Finding
The skill defines automatic loading, writing, archiving, and deletion of memory across multiple files and even a weekly maintenance cron, but it does not require explicit user consent, confirmation boundaries, or clear disclosure of persistence behavior. In an agent context, this can cause silent retention, mutation, or destruction of user data, including broad actions like "Forget everything," which increases privacy and integrity risk.

Natural-Language Policy Violations

Low
Confidence
77% confidence
Finding
The example memory contents include persistent communication preferences like "direct, no hedging" and behavioral shortcuts such as interpreting "looks good" as approval to proceed, without showing opt-in or validation. In a self-improving agent, these examples can normalize storing and reapplying inferred preferences that may affect future decisions or user consent boundaries.

Missing User Warnings

Medium
Confidence
91% confidence
Finding
The setup explicitly instructs the agent to write persistent entries to files under `~/self-improving/` immediately and even before the final response, but it does not require explicit user consent, review, or disclosure that state will be modified. This creates a persistent-state side effect that can silently store sensitive user data, entrench incorrect self-inferred rules, or alter future behavior across tasks in ways the user did not request or may not notice.

VirusTotal

64/64 vendors flagged this skill as clean.

Static analysis

No suspicious patterns detected.