openclaw-reflect

Security checks across malware telemetry and agentic risk

Overview

This skill has a real self-improvement purpose, but it automatically records tool activity, can change persistent agent instruction files, and scopes its external evaluation and write targets too loosely.

Install only if you are comfortable with automatic local telemetry, external evaluator calls when API keys or remote Ollama settings are present, and session-end changes to MEMORY.md or CLAUDE.md. Prefer rules or local-only evaluation, inspect .reflect logs and proposed diffs regularly, avoid sensitive workspaces, and do not permit autonomous payment actions without explicit human approval.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Findings (18)

Tp4

High
Category
MCP Tool Poisoning
Confidence
96% confidence
Finding
The skill advertises a bounded self-improvement system, but the described behavior extends to external evaluator calls, prompt-context injection, and script copying without clear disclosure in the top-level purpose or permissions narrative. This is dangerous because operators may approve a skill they believe is local and constrained, while it can transmit session-derived data externally, alter agent context, and persist new executable content in the workspace.

Context-Inappropriate Capability

Medium
Confidence
89% confidence
Finding
Encouraging autonomous monetary payments is unrelated to the core self-improvement function and introduces an avoidable social-engineering and unauthorized-spending risk. In an agent setting, even a 'voluntary' payment suggestion can pressure or normalize financial actions that the operator did not explicitly request.

Scope Creep

High
Confidence
98% confidence
Finding
The code trusts proposal.blast_target and joins it with process.cwd() without validating that the destination is restricted to the declared writable set (.reflect/, MEMORY.md, CLAUDE.md). A crafted proposal can therefore append content to arbitrary workspace files, including files outside the skill's intended mutation surface, creating a privilege/scope bypass within the agent environment.
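
One way to close this gap is to resolve the target and check it against the declared writable set before any write. The following is a minimal sketch, not the skill's actual code: the helper name resolveBlastTarget and the constants are hypothetical, and the allowlist is taken only from the writable set named in this finding. It does not follow symlinks, so a real fix would likely also need a realpath check.

```javascript
const path = require("node:path");

// Assumed writable set, copied from the finding text (not read from the manifest).
const ALLOWED_FILES = new Set(["MEMORY.md", "CLAUDE.md"]);
const ALLOWED_DIR = ".reflect";

// Hypothetical helper: returns the absolute path when the target is inside
// the writable set, and throws otherwise.
function resolveBlastTarget(blastTarget, root = process.cwd()) {
  if (typeof blastTarget !== "string" || blastTarget.length === 0) {
    throw new Error("blast_target must be a non-empty string");
  }
  const absolute = path.resolve(root, blastTarget);
  const relative = path.relative(root, absolute);

  // Reject the root itself and anything that escapes the workspace root.
  if (relative === "" || relative.startsWith("..") || path.isAbsolute(relative)) {
    throw new Error(`blast_target escapes workspace: ${blastTarget}`);
  }

  // Allow only top-level MEMORY.md / CLAUDE.md, or files under .reflect/.
  const segments = relative.split(path.sep);
  const allowed =
    (segments.length === 1 && ALLOWED_FILES.has(segments[0])) ||
    (segments.length > 1 && segments[0] === ALLOWED_DIR);
  if (!allowed) {
    throw new Error(`blast_target not in writable set: ${blastTarget}`);
  }
  return absolute;
}
```

Because the check runs on the normalized path, a target such as `.reflect/../SOUL.md` is rejected even though it starts with an allowed directory name.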

Scope Creep

High
Confidence
99% confidence
Finding
The manual approval path calls applyProposal(target) for any matching proposal from proposals or pending, and applyProposal writes directly to proposal.blast_target. Because the manifest only grants propose permission for SOUL.md, directly appending to SOUL.md on operator approval violates the declared permission boundary and enables unauthorized modification of a highly sensitive file.
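
The permission boundary described here could be enforced by looking up the target's declared level before applying anything. This is an illustrative sketch only: the table below mirrors the permissions as this report describes them, and the names PERMISSIONS and decideAction are hypothetical rather than taken from the skill.

```javascript
// Hypothetical permission table, mirroring the manifest as described in this report.
const PERMISSIONS = new Map([
  ["MEMORY.md", "write"],
  ["CLAUDE.md", "write"],
  ["SOUL.md", "propose"],
]);

// Decide what an approved proposal may do to a given target file.
function decideAction(targetFile) {
  const level = PERMISSIONS.get(targetFile);
  if (level === "write") return "apply"; // direct append is permitted
  if (level === "propose") return "emit-diff-only"; // show the diff, never write
  return "reject"; // undeclared targets are refused
}
```

Under this scheme, operator approval of a SOUL.md proposal would produce a diff for the operator to apply by hand instead of a direct write.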

Scope Creep

High
Confidence
96% confidence
Finding
The evaluator is designed to send proposal contents and a MEMORY.md excerpt to external model providers when API keys are present, but the declared permissions only describe local file access and a proposal capability. That creates an undeclared data egress path for cross-session memory, proposals, and sample inputs, which may contain sensitive workspace or user-derived content.

Scope Creep

Medium
Confidence
91% confidence
Finding
The Ollama backend performs HTTP requests to a network service that is not represented in the stated permissions. Even if Ollama is often local, OLLAMA_HOST is environment-controlled and can point to a remote host, creating an undeclared egress channel and weakening trust boundaries.
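
A local-only guarantee could be approximated by refusing any OLLAMA_HOST value that does not parse to a loopback host. The sketch below is an assumption-laden illustration: isLoopbackOllamaHost is a hypothetical helper, and the fallback address is Ollama's conventional local default, not a value confirmed from the skill's code.

```javascript
// Hostnames treated as loopback. The WHATWG URL parser keeps brackets on IPv6 hosts.
const LOOPBACK_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]"]);

// Hypothetical helper: true only when the configured host is a loopback address.
function isLoopbackOllamaHost(rawHost) {
  // Assumption: an unset value means the conventional local default.
  const value =
    typeof rawHost === "string" && rawHost.trim() !== ""
      ? rawHost.trim()
      : "http://127.0.0.1:11434";

  // Values like "localhost:11434" lack a scheme, so add one before parsing.
  const withScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(value) ? value : `http://${value}`;

  let url;
  try {
    url = new URL(withScheme);
  } catch {
    return false; // unparseable values are treated as unsafe
  }
  return LOOPBACK_HOSTS.has(url.hostname);
}
```

A caller could check `isLoopbackOllamaHost(process.env.OLLAMA_HOST)` and fall back to rules-based evaluation when it returns false.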

Context-Inappropriate Capability

Medium
Confidence
95% confidence
Finding
The message builder packages cross-session memory, proposal metadata, and sample inputs into prompts sent to third-party inference services. This broadens the skill from local self-improvement bookkeeping into external disclosure of potentially sensitive operational context, which can expose user data, internal instructions, or proprietary workflow details.

Scope Creep

High
Confidence
95% confidence
Finding
The script makes outbound requests to Anthropic, OpenAI, and Ollama even though the declared skill permissions only cover workspace reads and limited file writes/proposals. In a permissioned agent environment, undeclared network capability breaks the trust model and can exfiltrate proposal content, environment-derived metadata, or other sensitive evaluation data to external services or localhost endpoints without explicit authorization.
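
One mitigation consistent with this finding is to make external evaluation opt-in, so that the mere presence of an API key does not enable egress. The sketch below is hypothetical throughout: REFLECT_ALLOW_EXTERNAL is an invented flag, the key variable names are the providers' conventional ones and are assumed rather than confirmed from the script, and "rules" stands in for a local rules-based evaluator.

```javascript
// Hypothetical backend selection: external providers require an explicit opt-in.
function chooseEvaluator(env) {
  // Invented flag for illustration; the skill does not document such a setting.
  const externalAllowed = env.REFLECT_ALLOW_EXTERNAL === "1";

  if (externalAllowed && env.ANTHROPIC_API_KEY) return "anthropic";
  if (externalAllowed && env.OPENAI_API_KEY) return "openai";

  // Default: stay local, even when keys happen to be present in the environment.
  return "rules";
}
```

With this shape, a call such as `chooseEvaluator(process.env)` returns "rules" in a workspace where keys exist for other tools but the operator has not opted in.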

Missing User Warnings

Medium
Confidence
96% confidence
Finding
This file explicitly instructs autonomous agents to create and send a 5 USDT contribution via a localhost payment API, but it does not present a strong, explicit warning that real funds may be transferred from the operator or agent-controlled wallet. In the context of an agent skill, even a 'voluntary' payment flow is risky because agents may interpret the instructions as an authorized operational step and spend funds without meaningful human confirmation.

Missing User Warnings

Medium
Confidence
87% confidence
Finding
The README states that every tool call outcome is logged automatically to workspace files, but it does not prominently warn users that potentially sensitive prompts, tool arguments, error details, or operational metadata may be captured and retained across sessions. In a self-improvement skill, this creates a real privacy and security risk because the logging is core behavior, persistent, and easy to overlook, which can lead to unintentional collection of secrets or sensitive workflow data.

Missing User Warnings

Medium
Confidence
91% confidence
Finding
The README explains that approved proposals can be auto-applied to MEMORY.md and CLAUDE.md, but it does not present a strong upfront warning that the skill may modify behavioral/configuration files automatically. This is more dangerous in context because the skill is specifically designed for self-modification across sessions; silent or poorly disclosed automatic changes to agent memory/instructions can alter future behavior, introduce persistence, and make harmful drift harder for operators to detect.

Missing User Warnings

Medium
Confidence
93% confidence
Finding
Automatically modifying persistent instruction and memory files across sessions is a high-sensitivity behavior that can change future agent behavior long after the original session ends. Omitting a clear warning reduces informed consent and increases the chance that an operator enables persistent self-modification without understanding the long-term control and integrity implications.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
Listing evaluator API keys and backends without a clear privacy warning obscures that session-derived observations, failure patterns, or instruction proposals may be sent to third-party services. For a reflection system operating on cross-session data, that can expose sensitive prompts, behaviors, or project context beyond the local environment.

Missing User Warnings

Medium
Confidence
96% confidence
Finding
This hook persistently logs tool input snippets and output-derived error patterns from every tool invocation into .reflect/outcomes.jsonl without any visible consent, disclosure, or robust secret scrubbing. Because tool inputs and outputs can contain credentials, private file contents, prompts, or other sensitive workspace data, this creates a durable local telemetry store that increases exposure across sessions and could later be read, repurposed, or accidentally committed.

Missing User Warnings

Medium
Confidence
90% confidence
Finding
This hook automatically invokes apply.js with --auto at session end, which can modify workspace files without an explicit per-run user confirmation in this file. In a self-improving skill that can write to MEMORY.md, CLAUDE.md, and .reflect/, automatic state changes increase the risk of unintended prompt/config drift or persistence of bad proposals if the evaluator is flawed or bypassed.

Missing User Warnings

Medium
Confidence
89% confidence
Finding
The script sends proposal data, sample inputs, and environment/context excerpts to third-party model APIs without an in-file disclosure or explicit consent flow. Even though this is framed as a test harness, proposal content may contain sensitive operational details, file paths, hostnames, or future real-world data, creating an avoidable privacy and data-governance risk.

Ssd 3

Medium
Confidence
97% confidence
Finding
The evaluator prompt explicitly embeds MEMORY.md excerpts and sample inputs into natural-language requests, creating a direct data disclosure channel to whichever backend is chosen. Because these fields can contain cross-session memory, user prompts, or sensitive workspace context, the exposure is broad and difficult to bound once sent to an external model service.

Ssd 3

Medium
Confidence
95% confidence
Finding
The hook persists `JSON.stringify(event.tool_input || {}).slice(0, 200)` into `.reflect/outcomes.jsonl`, which can capture secrets, prompts, file contents, command arguments, tokens, or other user-supplied sensitive data in plain text. Because this skill is explicitly designed to observe outcomes across sessions and retain state, the logging behavior increases exposure by creating a durable cross-session repository of potentially sensitive inputs.
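
A scrubbing pass before the `slice(0, 200)` step would reduce, though not eliminate, this exposure. The following is a rough sketch under stated assumptions: redact and snippetForLog are hypothetical helpers, and the key and value patterns are illustrative examples that will miss many real secret formats.

```javascript
// Illustrative patterns only; real secret formats vary widely.
const SENSITIVE_KEY = /(pass(word)?|secret|token|api[_-]?key|authorization|credential)/i;
const SECRET_VALUE_PATTERNS = [
  /sk-[A-Za-z0-9_-]{16,}/g, // key-like strings with an "sk-" prefix
  /Bearer\s+[A-Za-z0-9._~+\/-]+=*/gi, // bearer tokens embedded in commands
];

// Recursively replace sensitive keys and secret-looking substrings.
function redact(value) {
  if (typeof value === "string") {
    return SECRET_VALUE_PATTERNS.reduce((s, re) => s.replace(re, "[REDACTED]"), value);
  }
  if (Array.isArray(value)) return value.map(redact);
  if (value && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, SENSITIVE_KEY.test(k) ? "[REDACTED]" : redact(v)])
    );
  }
  return value; // numbers, booleans, null pass through unchanged
}

// Redact first, then truncate, so a secret is never cut in half and left partly visible.
function snippetForLog(toolInput) {
  return JSON.stringify(redact(toolInput || {})).slice(0, 200);
}
```

Pattern-based scrubbing is best-effort, so it complements rather than replaces disclosure, retention limits, and keeping `.reflect/` out of version control.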

VirusTotal

66/66 vendors flagged this skill as clean.
