Agent Reflect Temp

Security checks across malware telemetry and agentic risk

Overview

This skill is transparent about its goal of saving approved conversation learnings into future agent instructions, but users should review its proposed changes and stored notes carefully.

Install only if you want a skill that can change future assistant behavior. Keep auto-reflection disabled unless you intentionally need it, review every proposed diff and target file, avoid approving raw quotes that contain secrets or sensitive project details, and periodically inspect or delete ~/.reflect and .claude reflection files.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data ExfiltrationExternal Transmission, Env Variable Harvesting, File System Enumeration
  • Trigger AbuseOverly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool PoisoningHidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt InjectionInstruction Override, Hidden Instructions, Exfiltration Commands
  • Privilege EscalationExcessive Permissions, Sudo/Root Execution, Credential Access
Findings (10)

Intent-Code Divergence

Medium
Confidence
87% confidence
Finding
The skill promises manual approval before applying changes, yet also advertises auto-reflection and permanent encoding across sessions. That ambiguity can cause the agent to persist or apply changes without a clearly renewed, informed user consent boundary, especially when the skill has write/edit capabilities and global state storage.

Vague Triggers

Medium
Confidence
92% confidence
Finding
The trigger phrase "review session" is generic natural language that can easily appear in ordinary conversation, causing the skill to activate unintentionally. In a self-modifying skill that analyzes conversations and proposes persistent updates, accidental invocation increases the chance of unnecessary reflection runs, noisy state changes, or user confusion around when the skill is operating.

Vague Triggers

Medium
Confidence
84% confidence
Finding
Broad triggers like `review session` and `what did I learn` can activate during normal conversation and initiate analysis of the full chat plus proposal generation. In a skill that can write files and persist state, accidental invocation materially increases the chance of unintended data retention or repository modification workflows.

Missing User Warnings

Medium
Confidence
88% confidence
Finding
The skill markets itself as self-improvement and persistence across future sessions but does not prominently disclose upfront that it may write project files, create skills, update agent definitions, store global logs, and commit changes. Users may invoke it without understanding the persistence and repository side effects, undermining informed consent.

Vague Triggers

Medium
Confidence
93% confidence
Finding
The trigger phrase "reflect" is so generic that it can be invoked during ordinary conversation, causing this meta-skill to activate unexpectedly. In this skill's context, accidental activation is more dangerous than usual because the manifest explicitly describes persistent self-modification and cross-session learning, so an unintended trigger could initiate state-changing behavior without clear user intent.

Vague Triggers

Medium
Confidence
88% confidence
Finding
The phrase "review session" is ambiguous and could match normal requests for summaries, retrospectives, or conversational review rather than intentional invocation of a self-improvement skill. Because this skill is designed to derive learnings and update persistent agent definitions, ambiguous triggering increases the chance of unauthorized or surprising state changes from routine dialogue.

Missing User Warnings

Medium
Confidence
96% confidence
Finding
The manifest states that the skill encodes learnings permanently into agent definitions for future sessions, but it does not present an explicit warning in the manifest itself about modifying persistent state. This is risky because users may invoke the skill without understanding that it can alter long-lived behavior, and the combination of persistent modification plus broad triggers raises the likelihood of unintended configuration drift or prompt-injection-derived self-corruption.

Ssd 3

High
Confidence
95% confidence
Finding
The core design is to extract learnings from conversations and permanently encode them into agent definitions for future sessions. This creates a durable channel for storing user-provided content, preferences, and potentially sensitive information in long-lived files, amplifying privacy and data governance risk.

Ssd 3

High
Confidence
97% confidence
Finding
The structured output explicitly asks for exact source quotes from the conversation, which can capture secrets, personal data, credentials, proprietary text, or sensitive instructions verbatim in reflection artifacts. Persisting raw quotes greatly increases exposure compared with storing only distilled, minimal guidance.

Ssd 3

Medium
Confidence
90% confidence
Finding
Persistent global logs and metrics derived from analyzed sessions enable accumulation of information over time, including user behavior, preferences, and potentially sensitive metadata. Centralized retention in `~/.reflect` expands the blast radius beyond a single repository and can quietly outlive user expectations.

VirusTotal

65/65 vendors flagged this skill as clean.

View on VirusTotal