Self Reflection

Security checks across malware telemetry and agentic risk

Overview

This self-reflection skill keeps local lesson notes to improve future responses, and the persistence is disclosed and aligned with its stated purpose.

Install only if you want the agent to keep local reflection notes across sessions. Periodically review ~/reflection/reflections.md and ~/reflection/patterns.md, avoid saving secrets or confidential details there, and enable HEARTBEAT.md integration only if you want recurring reflection checks.
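One way to do the periodic review is to scan the notes for lines that look like credentials before reading them in full. The sketch below is illustrative only: the patterns and the function name `find_suspect_lines` are assumptions, not part of the skill or of either scanner, and a pattern list this short will miss many kinds of confidential data.

```python
import re
from pathlib import Path

# Hypothetical patterns chosen for illustration; extend them for your environment.
SUSPECT_PATTERNS = [
    re.compile(r"(?i)\b(api[_-]?key|secret|password|token)\b\s*[:=]"),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),  # shape of an AWS access key ID
]


def find_suspect_lines(text):
    """Return (line_number, stripped_line) pairs matching any suspect pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    # File locations taken from the overview above.
    for name in ("reflections.md", "patterns.md"):
        path = Path.home() / "reflection" / name
        if not path.is_file():
            continue  # nothing to review yet
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in find_suspect_lines(text):
            print(f"{path}:{number}: {line}")
```

A clean result only means none of these particular patterns matched, so it complements a manual read of the files rather than replacing it.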

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Findings (7)

Description-Behavior Mismatch

Medium
Confidence: 95%
Finding
The setup directs the agent to create persistent state and tracking files in both shared workspace memory and a home-directory folder, which goes beyond a transient self-critique behavior. This increases privacy and integrity risk because the skill silently stores longitudinal behavioral data and modifies shared state that may influence future agent behavior outside the user’s immediate request.

Description-Behavior Mismatch

Medium
Confidence: 97%
Finding
The skill modifies global workspace files such as MEMORY.md and optionally HEARTBEAT.md, but the description only suggests lightweight self-critique before delivery. That mismatch is dangerous because users may enable the skill expecting ephemeral reasoning assistance, while it actually establishes persistent hooks into shared agent workflows and memory.

Context-Inappropriate Capability

Medium
Confidence: 91%
Finding
Persistently installing this skill into shared agent memory is not clearly necessary for its stated purpose of self-reflection and pre-delivery review. Unnecessary persistence broadens the attack surface by allowing the skill to shape future behavior, accumulate data over time, and create hidden dependencies in unrelated interactions.

Vague Triggers

Medium
Confidence: 92%
Finding
The pre-delivery trigger is defined very broadly as applying to 'code, architecture, strategy, any deliverable the user will act on,' which covers a large share of normal agent interactions. In practice this can cause the skill to activate far more often than users expect, increasing prompt injection surface, unnecessary processing, and unintended reads of persistent reflection files during ordinary work.

Vague Triggers

Medium
Confidence: 95%
Finding
The post-mistake trigger uses vague phrases like 'actually,' 'no, that's wrong,' and 'I meant,' which are common in normal conversation and do not reliably indicate that a durable reflection should be logged. This ambiguity can cause over-triggering, unnecessary persistence of interaction data, and repeated file writes based on ordinary clarification rather than genuine mistakes.
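The over-triggering described here can be illustrated with a small sketch. The three phrases are the ones quoted in the finding; the substring-matching logic and the name `naive_trigger` are assumptions made for illustration, since the report does not show how the skill actually evaluates its triggers.

```python
# Phrases quoted in the finding; the matching logic itself is an assumption.
TRIGGER_PHRASES = ("actually", "no, that's wrong", "i meant")


def naive_trigger(message):
    """Fire whenever any trigger phrase appears anywhere in the message."""
    lowered = message.lower()
    return any(phrase in lowered for phrase in TRIGGER_PHRASES)


messages = [
    "No, that's wrong: the function should return a list.",  # genuine correction
    "Actually, that looks great, thanks!",                    # praise, not a mistake
    "I meant to ask earlier, can you also add tests?",        # new request
    "Please rename the variable to total.",                   # no trigger phrase
]

for message in messages:
    print(naive_trigger(message), "-", message)
```

Under this assumed matching, the first three messages all fire even though only the first reports a mistake, which is the kind of ordinary clarification the finding warns could be logged as a durable reflection.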

Missing User Warnings

Medium
Confidence: 97%
Finding
The skill stores persistent data under ~/reflection/, but the description and usage flow do not present a clear upfront disclosure or consent mechanism before that storage behavior is introduced. This is dangerous because users may unknowingly have conversation-derived corrections, patterns, and preferences written to disk, creating privacy, retention, and transparency risks.
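A consent mechanism of the kind this finding says is missing could look roughly like the following. This is a hypothetical sketch, not the skill's code: `ensure_reflection_dir` and the `confirm` callback are invented names, and the prompt wording is only an example of an upfront disclosure.

```python
from pathlib import Path


def ensure_reflection_dir(base, confirm):
    """Create `base` only after `confirm(prompt)` returns True.

    Returns True if the directory exists afterwards, False if the user declined.
    """
    base = Path(base)
    if base.is_dir():
        return True  # already set up; no new storage is being introduced
    prompt = (
        f"This skill will create {base} and store reflection notes there "
        "across sessions. Continue?"
    )
    if not confirm(prompt):
        return False  # user declined, so nothing is written to disk
    base.mkdir(parents=True)
    return True
```

Passing the confirmation step in as a callback keeps the disclosure separate from the file operations, so an interactive prompt, a configuration flag, or a test stub can all supply the answer.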

Missing User Warnings

Low
Confidence: 90%
Finding
The setup instructs creation of directories and files under ~/reflection without warning the user that local filesystem changes will occur. While the actions are not inherently destructive, undisclosed writes reduce transparency and can surprise users, especially in constrained or shared environments where local persistence has security or compliance implications.

VirusTotal

64/64 vendors flagged this skill as clean.
