openclaw-reflect

Security checks across malware telemetry and agentic risk

Overview

This skill has a real self-improvement purpose, but it automatically records tool activity, can change persistent agent instruction files, and has under-scoped external evaluation and write-target controls.

Install only if you are comfortable with automatic local telemetry, external evaluator calls when API keys or remote Ollama settings are present, and session-end changes to MEMORY.md or CLAUDE.md. Prefer rules or local-only evaluation, inspect .reflect logs and proposed diffs regularly, avoid sensitive workspaces, and do not permit autonomous payment actions without explicit human approval.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (18)

Tp4

High

Category: MCP Tool Poisoning
Confidence: 96%
Finding: The skill advertises a bounded self-improvement system, but the described behavior extends to external evaluator calls, prompt-context injection, and script copying without clear disclosure in the top-level purpose or permissions narrative. This is dangerous because operators may approve a skill they believe is local and constrained, while it can transmit session-derived data externally, alter agent context, and persist new executable content in the workspace.

Context-Inappropriate Capability

Medium

Confidence: 89%
Finding: Encouraging autonomous monetary payments is unrelated to the core self-improvement function and introduces an avoidable social-engineering and unauthorized-spending risk. In an agent setting, even a 'voluntary' payment suggestion can pressure or normalize financial actions that the operator did not explicitly request.

Scope Creep

High

Confidence: 98%
Finding: The code trusts proposal.blast_target and joins it with process.cwd() without validating that the destination is restricted to the declared writable set (.reflect/, MEMORY.md, CLAUDE.md). A crafted proposal can therefore append content to arbitrary workspace files, including files outside the skill's intended mutation surface, creating a privilege/scope bypass within the agent environment.
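One way to close this gap is to resolve the target first and then check it against the declared writable set before any write. The following is a minimal sketch, not the skill's actual code: `resolveBlastTarget`, `ALLOWED_FILES`, and `ALLOWED_DIRS` are hypothetical names, and the allowlist simply mirrors the set named in the finding.

```javascript
const path = require("node:path");

// Hypothetical allowlist mirroring the writable set named in the finding.
const ALLOWED_FILES = ["MEMORY.md", "CLAUDE.md"];
const ALLOWED_DIRS = [".reflect"];

// Returns the resolved absolute path when blastTarget falls inside the
// declared writable set; throws otherwise. `root` defaults to process.cwd().
function resolveBlastTarget(blastTarget, root = process.cwd()) {
  if (typeof blastTarget !== "string" || blastTarget.length === 0) {
    throw new Error("blast_target must be a non-empty string");
  }
  const resolved = path.resolve(root, blastTarget);
  const rel = path.relative(root, resolved);

  // Reject the root itself and anything that escapes the workspace root.
  if (rel === "" || rel.startsWith("..") || path.isAbsolute(rel)) {
    throw new Error(`blast_target escapes workspace: ${blastTarget}`);
  }

  // Compare the normalized relative path, so ".reflect/../SOUL.md"
  // is judged as "SOUL.md" rather than as a path under .reflect/.
  const segments = rel.split(path.sep);
  const isAllowedFile = segments.length === 1 && ALLOWED_FILES.includes(rel);
  const isInAllowedDir = segments.length > 1 && ALLOWED_DIRS.includes(segments[0]);
  if (!isAllowedFile && !isInAllowedDir) {
    throw new Error(`blast_target not in writable set: ${blastTarget}`);
  }
  return resolved;
}
```

This check is purely lexical. It does not follow symlinks, so a stricter version would also compare real paths (for example via `fs.realpathSync`) before writing.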

Scope Creep

High

Confidence: 99%
Finding: The manual approval path calls applyProposal(target) for any matching proposal from proposals or pending, and applyProposal writes directly to proposal.blast_target. Because the manifest only grants propose permission for SOUL.md, directly appending to SOUL.md on operator approval violates the declared permission boundary and enables unauthorized modification of a highly sensitive file.
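The distinction between a propose grant and a write grant could be enforced at the approval step itself. The sketch below assumes a simple per-file grant table; the real manifest format is not shown in this report, so `GRANTS`, `grantFor`, and `approveProposal` are illustrative names only.

```javascript
// Hypothetical grant table; the actual manifest schema is an assumption.
const GRANTS = Object.freeze({
  "MEMORY.md": "write",
  "CLAUDE.md": "write",
  "SOUL.md": "propose",
});

// Look up the grant for a target, defaulting to "none" for unknown files.
function grantFor(target) {
  return Object.prototype.hasOwnProperty.call(GRANTS, target)
    ? GRANTS[target]
    : "none";
}

// Calls applyFn only when the target carries a "write" grant. For
// "propose" or "none" targets the proposal is left unapplied and the
// caller receives the grant level so it can surface the diff instead.
function approveProposal(proposal, applyFn) {
  const grant = grantFor(proposal.blast_target);
  if (grant === "write") {
    applyFn(proposal);
    return { applied: true, grant };
  }
  return { applied: false, grant };
}
```

With a gate like this, operator approval of a SOUL.md proposal would yield `{ applied: false, grant: "propose" }`, leaving the edit for the operator to make by hand.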

Scope Creep

High

Confidence: 96%
Finding: The evaluator is designed to send proposal contents and a MEMORY.md excerpt to external model providers when API keys are present, but the declared permissions only describe local file access and a proposal capability. That creates an undeclared data egress path for cross-session memory, proposals, and sample inputs, which may contain sensitive workspace or user-derived content.
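A possible mitigation is to make external evaluation opt-in rather than implied by the presence of an API key. In this sketch, `selectEvaluator` and the `REFLECT_ALLOW_EXTERNAL` variable are hypothetical, the key variable names are assumed because the report does not name them, and the `"rules"` fallback stands in for a local-only evaluation mode.

```javascript
// Chooses an evaluator backend from an environment-like object.
// REFLECT_ALLOW_EXTERNAL is a hypothetical opt-in flag; the API key
// variable names are assumptions, not confirmed by the report.
function selectEvaluator(env) {
  const allowExternal = env.REFLECT_ALLOW_EXTERNAL === "1";
  if (allowExternal && env.ANTHROPIC_API_KEY) return "anthropic";
  if (allowExternal && env.OPENAI_API_KEY) return "openai";
  // Default: stay local, even when API keys happen to be present.
  return "rules";
}
```

Calling `selectEvaluator(process.env)` on a machine that has API keys set for unrelated tools would then still return `"rules"` until the operator sets the flag deliberately.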

Scope Creep

Medium

Confidence: 91%
Finding: The Ollama backend performs HTTP requests to a network service that is not represented in the stated permissions. Even if Ollama is often local, OLLAMA_HOST is environment-controlled and can point to a remote host, creating an undeclared egress channel and weakening trust boundaries.
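To illustrate the concern, a skill could refuse to treat Ollama as local unless the configured host is syntactically a loopback address. This is a sketch under stated assumptions: `isLoopbackOllamaHost` is a hypothetical helper, and treating an unset value as local relies on Ollama's default of 127.0.0.1:11434, which may not match how this skill builds its URL.

```javascript
// Returns true only when the OLLAMA_HOST value names a loopback address.
// Assumption: an unset value means the default local endpoint.
function isLoopbackOllamaHost(value) {
  if (!value) return true;

  // OLLAMA_HOST may omit the scheme ("127.0.0.1:11434"); add one for parsing.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(value);
  const withScheme = hasScheme ? value : `http://${value}`;

  let hostname;
  try {
    hostname = new URL(withScheme).hostname;
  } catch {
    return false; // unparseable values are treated as non-local
  }

  // URL.hostname keeps brackets around IPv6 literals; strip them.
  const h = hostname.replace(/^\[|\]$/g, "").toLowerCase();
  return h === "localhost" || h === "::1" || /^127(\.\d{1,3}){3}$/.test(h);
}
```

A check like this only inspects the configured string. It does not resolve DNS, so a hostname that happens to point at loopback is reported as non-local, which errs on the side of caution.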

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The message builder packages cross-session memory, proposal metadata, and sample inputs into prompts sent to third-party inference services. This broadens the skill from local self-improvement bookkeeping into external disclosure of potentially sensitive operational context, which can expose user data, internal instructions, or proprietary workflow details.

Scope Creep

High

Confidence: 95%
Finding: The script makes outbound requests to Anthropic, OpenAI, and Ollama even though the declared skill permissions only cover workspace reads and limited file writes/proposals. In a permissioned agent environment, undeclared network capability breaks the trust model and can exfiltrate proposal content, environment-derived metadata, or other sensitive evaluation data to external services or localhost endpoints without explicit authorization.

Missing User Warnings

Medium

Confidence: 96%
Finding: This file explicitly instructs autonomous agents to create and send a 5 USDT contribution via a localhost payment API, but it does not present a strong, explicit warning that real funds may be transferred from the operator or agent-controlled wallet. In the context of an agent skill, even a 'voluntary' payment flow is risky because agents may interpret the instructions as an authorized operational step and spend funds without meaningful human confirmation.

Missing User Warnings

Medium

Confidence: 87%
Finding: The README states that every tool call outcome is logged automatically to workspace files, but it does not prominently warn users that potentially sensitive prompts, tool arguments, error details, or operational metadata may be captured and retained across sessions. In a self-improvement skill, this creates a real privacy and security risk because the logging is core behavior, persistent, and easy to overlook, which can lead to unintentional collection of secrets or sensitive workflow data.

Missing User Warnings

Medium

Confidence: 91%
Finding: The README explains that approved proposals can be auto-applied to MEMORY.md and CLAUDE.md, but it does not present a strong upfront warning that the skill may modify behavioral/configuration files automatically. This is more dangerous in context because the skill is specifically designed for self-modification across sessions; silent or poorly disclosed automatic changes to agent memory/instructions can alter future behavior, introduce persistence, and make harmful drift harder for operators to detect.

Missing User Warnings

Medium

Confidence: 93%
Finding: Automatically modifying persistent instruction and memory files across sessions is a high-sensitivity behavior that can change future agent behavior long after the original session ends. Omitting a clear warning reduces informed consent and increases the chance that an operator enables persistent self-modification without understanding the long-term control and integrity implications.

Missing User Warnings

Medium

Confidence: 95%
Finding: Listing evaluator API keys and backends without a clear privacy warning obscures that session-derived observations, failure patterns, or instruction proposals may be sent to third-party services. For a reflection system operating on cross-session data, that can expose sensitive prompts, behaviors, or project context beyond the local environment.

Missing User Warnings

Medium

Confidence: 96%
Finding: This hook persistently logs tool input snippets and output-derived error patterns from every tool invocation into .reflect/outcomes.jsonl without any visible consent, disclosure, or robust secret scrubbing. Because tool inputs and outputs can contain credentials, private file contents, prompts, or other sensitive workspace data, this creates a durable local telemetry store that increases exposure across sessions and could later be read, repurposed, or accidentally committed.

Missing User Warnings

Medium

Confidence: 90%
Finding: This hook automatically invokes apply.js with --auto at session end, which can modify workspace files without an explicit per-run user confirmation in this file. In a self-improving skill that can write to MEMORY.md, CLAUDE.md, and .reflect/, automatic state changes increase the risk of unintended prompt/config drift or persistence of bad proposals if the evaluator is flawed or bypassed.

Missing User Warnings

Medium

Confidence: 89%
Finding: The script sends proposal data, sample inputs, and environment/context excerpts to third-party model APIs without an in-file disclosure or explicit consent flow. Even though this is framed as a test harness, proposal content may contain sensitive operational details, file paths, hostnames, or future real-world data, creating an avoidable privacy and data-governance risk.

Ssd 3

Medium

Confidence: 97%
Finding: The evaluator prompt explicitly embeds MEMORY.md excerpts and sample inputs into natural-language requests, creating a direct data disclosure channel to whichever backend is chosen. Because these fields can contain cross-session memory, user prompts, or sensitive workspace context, the exposure is broad and difficult to bound once sent to an external model service.

Ssd 3

Medium

Confidence: 95%
Finding: The hook persists `JSON.stringify(event.tool_input || {}).slice(0, 200)` into `.reflect/outcomes.jsonl`, which can capture secrets, prompts, file contents, command arguments, tokens, or other user-supplied sensitive data in plain text. Because this skill is explicitly designed to observe outcomes across sessions and retain state, the logging behavior increases exposure by creating a durable cross-session repository of potentially sensitive inputs.
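A partial mitigation would be to redact obviously sensitive fields before the input is serialized and truncated. The sketch below is illustrative only: `redact`, `loggableInput`, and the `SENSITIVE_KEY` pattern are hypothetical, and only the `JSON.stringify(...).slice(0, 200)` shape comes from the finding.

```javascript
// Hypothetical key-name pattern; extend it to match the local environment.
const SENSITIVE_KEY = /(pass(word)?|secret|token|api[_-]?key|authorization|credential)/i;

// Recursively copies a value, replacing the values of sensitive-looking
// keys with a fixed marker. Arrays and nested objects are walked.
function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (value && typeof value === "object") {
    const out = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value;
}

// Same serialization and truncation as the hook, applied after redaction.
function loggableInput(toolInput) {
  return JSON.stringify(redact(toolInput || {})).slice(0, 200);
}
```

Key-name matching does not catch secrets embedded in free text, such as a token inside a shell command string, so this narrows the exposure rather than removing it.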

VirusTotal

66/66 vendors flagged this skill as clean.
