Self-Improving with Reflection

Security checks across malware telemetry and agentic risk

Overview

This is a local memory skill that stores corrections and preferences as disclosed. Its persistence is privacy-relevant, but the scan found no evidence of hidden exfiltration or destructive behavior.

Install this only if you want a local long-term learning memory. Use Strict or Passive mode for more control, periodically review ./self-improving/, avoid storing secrets or sensitive personal data, and inspect any proposed AGENTS.md, SOUL.md, or HEARTBEAT.md changes before applying them.
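The periodic review of ./self-improving/ recommended above could be partly automated. The following is a minimal sketch, not part of the skill or of this scan: the function name `find_suspect_lines` and the three patterns are hypothetical and intentionally narrow, so a clean result does not prove that no secrets were stored.

```python
import re
from pathlib import Path

# Hypothetical, deliberately simple patterns (assumption: not taken from the skill).
# Dedicated secret scanners cover far more formats than these three.
SUSPECT_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "credential_assignment": re.compile(
        r"(?i)\b(password|passwd|secret|api[_-]?key|token)\b\s*[:=]\s*\S+"
    ),
}


def find_suspect_lines(root):
    """Return (path, line_number, label) for each line matching a pattern."""
    hits = []
    root = Path(root)
    if not root.is_dir():
        return hits  # nothing to review if the directory does not exist
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip rather than abort the review
        for lineno, line in enumerate(text.splitlines(), start=1):
            for label, pattern in SUSPECT_PATTERNS.items():
                if pattern.search(line):
                    hits.append((str(path), lineno, label))
    return hits


if __name__ == "__main__":
    for path, lineno, label in find_suspect_lines("./self-improving"):
        print(f"{path}:{lineno}: possible {label}")
```

Any hit is only a prompt for manual inspection; the memory files themselves should still be read by a person from time to time.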

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (12)

Description-Behavior Mismatch

Medium

Confidence: 94%
Finding: The skill’s advertised purpose is self-reflection and self-improvement, but the implementation is a broad persistent memory subsystem with cross-session storage, search, export, deletion, tiering, and project scoping. This mismatch expands data collection and retention beyond user expectations, increasing the risk of unintended storage of sensitive information and unauthorized secondary use.

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: A capability to export all stored memory as a downloadable archive materially increases the consequences of any misuse, prompt abuse, or unauthorized invocation because it concentrates potentially sensitive historical data into a portable artifact. For a self-reflection skill, bulk export is not obviously necessary and broadens exfiltration risk.

Vague Triggers

Medium

Confidence: 87%
Finding: The correction triggers are broad natural-language phrases that commonly appear in ordinary conversation, so the skill may incorrectly treat routine dialogue as durable feedback and write it to memory. In a self-improving memory skill, that increases the chance of unintended persistence of transient preferences, mistaken corrections, or manipulative prompt content, which can alter future behavior without clear user consent.
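One way to narrow a trigger of this kind is to require an explicit command rather than matching conversational phrases. The sketch below is an illustration only: the `/remember` prefix and the `parse_memory_command` function are hypothetical and do not appear in the reviewed skill.

```python
REMEMBER_PREFIX = "/remember"  # hypothetical command, not part of the reviewed skill


def parse_memory_command(message):
    """Return the text to store, or None if the message is not an explicit command."""
    stripped = message.strip()
    if not stripped.startswith(REMEMBER_PREFIX):
        return None  # ordinary conversation never reaches the memory writer
    rest = stripped[len(REMEMBER_PREFIX):]
    # Require whitespace after the prefix so "/rememberthis" does not match.
    if not rest or not rest[0].isspace():
        return None
    payload = rest.strip()
    return payload or None
```

With this shape, a routine remark such as "No, that's wrong, use tabs" is ignored, while "/remember use tabs" yields the payload "use tabs". The trade-off is less automatic learning in exchange for clearer user intent.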

Missing User Warnings

Medium

Confidence: 93%
Finding: The README presents the skill as improving over time by learning from corrections and organizing memory, but it does not clearly warn users that their corrections and preferences may be stored in local files and retained across sessions. This creates a transparency and consent gap: users may disclose sensitive or personal information during normal interaction without realizing it could be persisted.

Missing User Warnings

Medium

Confidence: 95%
Finding: The skill instructs the agent to automatically persist user corrections, preferences, and learned patterns to local storage, but it does not require clear notice or consent before creating durable memory. This can cause unintended retention of sensitive personal preferences or behavioral data across sessions, especially because logging is triggered by ordinary conversational phrases and self-reflection rather than an explicit opt-in flow.
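An explicit opt-in flow of the kind this finding says is missing might look like the following minimal sketch. It assumes a caller-supplied `ask` callback that shows the prompt to the user and returns True or False; `persist_if_approved` and its parameters are hypothetical names, not the skill's API.

```python
from pathlib import Path


def persist_if_approved(entry, path, ask):
    """Append entry to path only when ask(prompt) returns True.

    `ask` is a hypothetical callback that presents the prompt to the user.
    Returns True if the entry was written, False otherwise.
    """
    prompt = f"Save this to long-term memory at {path}? {entry!r}"
    if not ask(prompt):
        return False  # no consent: nothing is created or modified on disk
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a", encoding="utf-8") as handle:
        handle.write(entry.rstrip("\n") + "\n")
    return True
```

The key property is that the directory and file are created only after approval, so declining leaves the workspace untouched.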

Vague Triggers

Medium

Confidence: 91%
Finding: The kill switch relies on a very broad natural-language trigger phrase, "forget everything," which could plausibly appear in ordinary conversation, quoted text, or discussion about policy rather than an actual deletion command. In a memory-oriented self-improving skill, accidental activation could erase learned state unexpectedly and disrupt integrity, auditability, and user control.
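A stricter kill switch could require that the phrase be the entire message and that a separate confirmation follow it. This is a sketch under stated assumptions: the `DELETE` confirmation token and the `should_wipe` function are hypothetical, and only the phrase "forget everything" comes from the finding.

```python
WIPE_COMMAND = "forget everything"  # phrase named in the finding
CONFIRM_TOKEN = "DELETE"            # hypothetical second-step token


def should_wipe(message, confirmation):
    """Return True only for an exact command followed by an exact confirmation."""
    # Whole-message match: quoted text or a mid-sentence mention does not qualify.
    is_exact_command = message.strip().lower() == WIPE_COMMAND
    # The confirmation is case-sensitive on purpose to avoid accidental approval.
    is_confirmed = confirmation.strip() == CONFIRM_TOKEN
    return is_exact_command and is_confirmed
```

Under this rule, "He told me to forget everything" does not trigger deletion, and neither does the bare command without the confirmation step.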

Missing User Warnings

Medium

Confidence: 90%
Finding: The skill explicitly instructs the agent to log corrections, promote them into persistent preferences, archive prior patterns, and preserve inactive items for 90+ days, but it provides no user-facing notice, consent flow, retention policy, or data-handling safeguards. This creates a real privacy and governance risk because user behavior and preferences may be stored across sessions or projects without clear authorization, visibility, deletion controls, or scoping guarantees.
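To give users visibility into long-retained items, a report of files untouched for a given period could be generated for review. The sketch below is hypothetical: `stale_files` is an illustrative name, it uses file modification time as a stand-in for "inactive", and the 90-day default simply mirrors the figure in the finding.

```python
import time
from pathlib import Path


def stale_files(root, max_age_days=90, now=None):
    """Return sorted paths under root last modified more than max_age_days ago."""
    root = Path(root)
    if not root.is_dir():
        return []
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    return sorted(
        str(path)
        for path in root.rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )
```

The function only lists candidates; deciding whether to keep or delete them stays with the user.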

Missing User Warnings

Low

Confidence: 93%
Finding: The file explicitly instructs first-time activation to create local directories and files, but it does not warn the user that activating the skill will modify the local workspace. In an agent-skill context, undisclosed filesystem side effects reduce user awareness and consent, and can become more concerning when combined with autonomous execution or repeated activation.

Missing User Warnings

Medium

Confidence: 95%
Finding: The skill describes automatically loading memory on session start and automatically writing corrections to persistent files without a user-facing warning or consent mechanism. Hidden or poorly disclosed persistent reads and writes are dangerous because they can capture sensitive user data across sessions and modify long-term state outside the user’s immediate awareness.

Missing User Warnings

Medium

Confidence: 90%
Finding: Automated archival, movement between tiers, and compaction alter the lifecycle and location of stored user data without a clear warning or policy disclosure. Even if intended for maintenance, these background changes can undermine user expectations about retention, discoverability, and deletion semantics, especially for sensitive or project-scoped information.

Missing User Warnings

Medium

Confidence: 87%
Finding: The guidance explicitly recommends 'aggressive learning' and 'minimal confirmation' for power users, which increases the chance that the agent stores preferences, corrections, or behavioral data without clear user awareness or consent. In a self-improving memory skill, this is more dangerous than generic UX advice because it directly affects persistent memory and can lead to over-collection, inaccurate long-term profiling, or retention of sensitive user data.

Missing User Warnings

Medium

Confidence: 93%
Finding: The skill explicitly instructs the agent to append to local memory files before the final response, creating side effects on the filesystem without an explicit user request or clear user-facing disclosure. In an adversarial or sensitive context, this can cause unauthorized persistence of potentially sensitive prompts, corrections, or inferred rules, and may alter future agent behavior in ways the user did not approve.

VirusTotal

63/63 vendors flagged this skill as clean.
