Self Reflection

Security checks across malware telemetry and agentic risk

Overview

This self-reflection skill keeps local lesson notes to improve future responses, and the persistence is disclosed and aligned with its stated purpose.

Install only if you want the agent to keep local reflection notes across sessions. Periodically review ~/reflection/reflections.md and ~/reflection/patterns.md, avoid saving secrets or confidential details there, and enable HEARTBEAT.md integration only if you want recurring reflection checks.
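
One way to make the periodic review above less manual is a small script that flags secret-like strings in the two reflection files. The following is a minimal sketch, not part of the skill: the pattern list is a hypothetical, non-exhaustive set of heuristics, and the function names are made up for illustration. Only the file names and the ~/reflection location come from the report.

```python
import re
from pathlib import Path

# Hypothetical heuristics; a real review should not rely on these alone.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "key_value_secret": re.compile(
        r"(?i)\b(?:password|passwd|secret|token|api[_-]?key)\b\s*[:=]\s*\S+"
    ),
}


def scan_text(text):
    """Return (line_number, pattern_name) pairs for lines that look like secrets."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


def scan_reflection_files(base=None):
    """Scan reflections.md and patterns.md under base (default: ~/reflection)."""
    base = Path(base) if base is not None else Path.home() / "reflection"
    results = {}
    for filename in ("reflections.md", "patterns.md"):
        path = base / filename
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            results[filename] = scan_text(text)
    return results


if __name__ == "__main__":
    for filename, hits in scan_reflection_files().items():
        for lineno, name in hits:
            print(f"{filename}:{lineno}: possible {name}")
```

A clean run does not prove the notes are free of confidential details; it only catches the obvious cases, so reading the files remains the main check.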

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration

Findings (7)

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The setup directs the agent to create persistent state and tracking files in both shared workspace memory and a home-directory folder, which goes beyond a transient self-critique behavior. This increases privacy and integrity risk because the skill silently stores longitudinal behavioral data and modifies shared state that may influence future agent behavior outside the user’s immediate request.

Description-Behavior Mismatch

Medium

Confidence: 97%
Finding: The skill modifies global workspace files such as MEMORY.md and optionally HEARTBEAT.md, but the description only suggests lightweight self-critique before delivery. That mismatch is dangerous because users may enable the skill expecting ephemeral reasoning assistance, while it actually establishes persistent hooks into shared agent workflows and memory.

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: Persistently installing this skill into shared agent memory is not clearly necessary for its stated purpose of self-reflection and pre-delivery review. Unnecessary persistence broadens the attack surface by allowing the skill to shape future behavior, accumulate data over time, and create hidden dependencies in unrelated interactions.

Vague Triggers

Medium

Confidence: 92%
Finding: The pre-delivery trigger is defined very broadly as applying to 'code, architecture, strategy, any deliverable the user will act on,' which covers a large share of normal agent interactions. In practice this can cause the skill to activate far more often than users expect, increasing prompt injection surface, unnecessary processing, and unintended reads of persistent reflection files during ordinary work.

Vague Triggers

Medium

Confidence: 95%
Finding: The post-mistake trigger uses vague phrases like 'actually,' 'no, that's wrong,' and 'I meant,' which are common in normal conversation and do not reliably indicate that a durable reflection should be logged. This ambiguity can cause over-triggering, unnecessary persistence of interaction data, and repeated file writes based on ordinary clarification rather than genuine mistakes.
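
The over-triggering described in this finding can be illustrated with a short sketch. It assumes the trigger behaves like a case-insensitive substring match on the quoted phrases; the report does not say how the skill actually matches them, and `naive_trigger` is a hypothetical name.

```python
# Phrases quoted in the finding; the matching strategy below is an assumption.
TRIGGER_PHRASES = ("actually", "no, that's wrong", "i meant")


def naive_trigger(message):
    """Return True if any trigger phrase appears anywhere in the message."""
    text = message.lower()
    return any(phrase in text for phrase in TRIGGER_PHRASES)


if __name__ == "__main__":
    messages = [
        "Actually, that looks great, thanks!",        # praise, not a correction
        "I meant to ask: can you also add tests?",    # follow-up request
        "No, that's wrong: it should return a list.", # genuine correction
        "Please refactor this module.",               # no trigger phrase
    ]
    for message in messages:
        print(naive_trigger(message), message)
```

Under this assumption the first two messages fire the trigger even though neither reports a mistake, which is the kind of ordinary clarification the finding warns could be logged as a durable reflection.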

Missing User Warnings

Medium

Confidence: 97%
Finding: The skill stores persistent data under ~/reflection/, but the description and usage flow do not present a clear upfront disclosure or consent mechanism before that storage behavior is introduced. This is dangerous because users may unknowingly have conversation-derived corrections, patterns, and preferences written to disk, creating privacy, retention, and transparency risks.
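
A consent mechanism of the kind this finding says is missing could look roughly like the sketch below. It is an illustration only, not the skill's setup code: `ensure_reflection_dir` and the `confirm` callback are hypothetical, and the file names are the two mentioned in the overview.

```python
from pathlib import Path


def ensure_reflection_dir(base, confirm):
    """Create the reflection directory only after explicit user consent.

    `confirm` is a hypothetical callback that receives a prompt string and
    returns True or False. Returns True if the directory is available for use.
    """
    base = Path(base)
    if base.exists():
        return True  # already set up earlier; nothing new is written
    if not confirm(f"Create {base} and store reflection notes there?"):
        return False  # user declined, so nothing touches the filesystem
    base.mkdir(parents=True)
    for name in ("reflections.md", "patterns.md"):
        (base / name).touch()
    return True


if __name__ == "__main__":
    def ask(prompt):
        return input(prompt + " [y/N] ").strip().lower() == "y"

    ensure_reflection_dir(Path.home() / "reflection", ask)
```

The point of the design is that the prompt names the exact path before any write happens, so a declined prompt leaves the home directory unchanged.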

Missing User Warnings

Low

Confidence: 90%
Finding: The setup instructs creation of directories and files under ~/reflection without warning the user that local filesystem changes will occur. While the actions are not inherently destructive, undisclosed writes reduce transparency and can surprise users, especially in constrained or shared environments where local persistence has security or compliance implications.

VirusTotal

64/64 vendors flagged this skill as clean.
