Agent Reflect Temp

Security checks across malware telemetry and agentic risk

Overview

This skill is transparent about its goal of saving approved conversation learnings into future agent instructions, but users should review its proposed changes and stored notes carefully.

Install only if you want a skill that can change future assistant behavior. Keep auto-reflection disabled unless you intentionally need it, review every proposed diff and target file, avoid approving raw quotes that contain secrets or sensitive project details, and periodically inspect or delete ~/.reflect and .claude reflection files.
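
The periodic inspection recommended above can be scripted. The following is a minimal sketch, assuming only that reflection data lives under `~/.reflect` as this report states. The function name `list_reflection_files` is hypothetical, and the internal layout of the directory is not described here, so the code simply walks whatever it finds and reports each file's size and last-modified time.

```python
from datetime import datetime
from pathlib import Path


def list_reflection_files(root):
    """Return (path, size_bytes, modified_iso) for every file under root.

    Hypothetical helper: it assumes nothing about the directory layout
    and returns an empty list when the directory does not exist.
    """
    root = Path(root).expanduser()
    if not root.is_dir():
        return []
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            stat = path.stat()
            modified = datetime.fromtimestamp(stat.st_mtime).isoformat(timespec="seconds")
            rows.append((str(path), stat.st_size, modified))
    return rows


if __name__ == "__main__":
    # ~/.reflect is the global location named in this report.
    for path, size, modified in list_reflection_files("~/.reflect"):
        print(f"{modified}  {size:>8}  {path}")
```

The same function can be pointed at a project's `.claude` directory to review reflection files stored there. Deleting anything is left as a deliberate manual step.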

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (10)

Intent-Code Divergence

Medium

Confidence: 87%
Finding: The skill promises manual approval before applying changes, yet also advertises auto-reflection and permanent encoding across sessions. That ambiguity can cause the agent to persist or apply changes without renewed, informed user consent, especially when the skill has write/edit capabilities and global state storage.

Vague Triggers

Medium

Confidence: 92%
Finding: The trigger phrase "review session" is generic natural language that can easily appear in ordinary conversation, causing the skill to activate unintentionally. In a self-modifying skill that analyzes conversations and proposes persistent updates, accidental invocation increases the chance of unnecessary reflection runs, noisy state changes, or user confusion around when the skill is operating.

Vague Triggers

Medium

Confidence: 84%
Finding: Broad triggers like `review session` and `what did I learn` can activate during normal conversation and initiate analysis of the full chat plus proposal generation. In a skill that can write files and persist state, accidental invocation materially increases the chance of unintended data retention or repository modification workflows.
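
To illustrate why broad triggers misfire, here is a minimal sketch comparing phrase-anywhere matching with an explicit command. How the skill actually matches triggers is not described in this report, so both matchers are assumptions, and the `/reflect` command is a hypothetical stand-in for a narrower invocation.

```python
import re

# Trigger phrases quoted in the finding above.
BROAD_TRIGGERS = ["review session", "what did I learn"]

# Hypothetical explicit command; not taken from the skill's manifest.
EXPLICIT_COMMAND = "/reflect"


def broad_match(message):
    """Fire if any trigger phrase appears anywhere in the message."""
    lowered = message.lower()
    return any(phrase.lower() in lowered for phrase in BROAD_TRIGGERS)


def explicit_match(message):
    """Fire only when the message starts with the explicit command."""
    pattern = rf"\s*{re.escape(EXPLICIT_COMMAND)}(\s|$)"
    return re.match(pattern, message) is not None


if __name__ == "__main__":
    samples = [
        "Can we review session handling in the login code?",
        "What did I learn from that outage? Remind me later.",
        "/reflect",
    ]
    # Columns: broad matcher result, explicit matcher result, message.
    for text in samples:
        print(f"{broad_match(text)!s:<5} {explicit_match(text)!s:<5} {text}")
```

Under these assumptions, the first two messages are ordinary conversation yet satisfy the broad matcher, while only the third satisfies the explicit one.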

Missing User Warnings

Medium

Confidence: 88%
Finding: The skill markets itself as self-improvement and persistence across future sessions but does not prominently disclose upfront that it may write project files, create skills, update agent definitions, store global logs, and commit changes. Users may invoke it without understanding the persistence and repository side effects, undermining informed consent.

Vague Triggers

Medium

Confidence: 93%
Finding: The trigger phrase "reflect" is so generic that it can be invoked during ordinary conversation, causing this meta-skill to activate unexpectedly. In this skill's context, accidental activation is more dangerous than usual because the manifest explicitly describes persistent self-modification and cross-session learning, so an unintended trigger could initiate state-changing behavior without clear user intent.

Vague Triggers

Medium

Confidence: 88%
Finding: The phrase "review session" is ambiguous and could match normal requests for summaries, retrospectives, or conversational review rather than intentional invocation of a self-improvement skill. Because this skill is designed to derive learnings and update persistent agent definitions, ambiguous triggering increases the chance of unauthorized or surprising state changes from routine dialogue.

Missing User Warnings

Medium

Confidence: 96%
Finding: The manifest states that the skill encodes learnings permanently into agent definitions for future sessions, but it does not present an explicit warning in the manifest itself about modifying persistent state. This is risky because users may invoke the skill without understanding that it can alter long-lived behavior, and the combination of persistent modification plus broad triggers raises the likelihood of unintended configuration drift or prompt-injection-derived self-corruption.

Ssd 3

High

Confidence: 95%
Finding: The core design is to extract learnings from conversations and permanently encode them into agent definitions for future sessions. This creates a durable channel for storing user-provided content, preferences, and potentially sensitive information in long-lived files, amplifying privacy and data governance risk.

Ssd 3

High

Confidence: 97%
Finding: The structured output explicitly asks for exact source quotes from the conversation, which can capture secrets, personal data, credentials, proprietary text, or sensitive instructions verbatim in reflection artifacts. Persisting raw quotes greatly increases exposure compared with storing only distilled, minimal guidance.
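
One way to reduce this exposure is to scrub a quote before approving it for storage. The sketch below is illustrative only: the function `redact_quote` and both patterns are hypothetical, they are not part of the skill, and they catch only a couple of obvious secret shapes. It does not replace reading each proposed quote.

```python
import re

# Illustrative patterns only; real secrets take many more forms.
# Matches "key = value" or "key: value" where the key name looks sensitive.
KEY_VALUE_SECRET = re.compile(
    r"(?i)\b(api[_-]?key|token|password|secret)\b(\s*[:=]\s*)\S+"
)
# Matches a whole PEM private key block, including multi-line bodies.
PEM_PRIVATE_KEY = re.compile(
    r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
    re.DOTALL,
)


def redact_quote(quote):
    """Return the quote with likely secret values replaced by placeholders."""
    # Keep the key name and separator, drop only the value.
    redacted = KEY_VALUE_SECRET.sub(r"\1\2[REDACTED]", quote)
    redacted = PEM_PRIVATE_KEY.sub("[REDACTED PRIVATE KEY]", redacted)
    return redacted


if __name__ == "__main__":
    print(redact_quote("use password=hunter2 for staging"))
    print(redact_quote("always run tests before commit"))
```

Storing a short paraphrase instead of the verbatim quote avoids the problem more reliably than pattern matching, which is the distinction this finding draws.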

Ssd 3

Medium

Confidence: 90%
Finding: Persistent global logs and metrics derived from analyzed sessions enable accumulation of information over time, including user behavior, preferences, and potentially sensitive metadata. Centralized retention in `~/.reflect` expands the blast radius beyond a single repository and can quietly outlive user expectations.

VirusTotal

65/65 vendors flagged this skill as clean.
