Security audit

Openclaw Self Learning Skill

Security checks across malware telemetry and agentic risk

Overview

The skill is not clearly malicious, but it combines persistent full-context logging with an arbitrary command wrapper and under-declared execution authority, so users should review it carefully before installing.

Install only if you are comfortable with a skill that writes long-lived learning logs under your home directory and can run commands through its wrapper. Avoid logging secrets, tokens, private prompts, or sensitive file contents; review and periodically delete the learning JSON files; and do not use the command wrapper or scheduled fix-all flow unless you understand exactly what command will run.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Output Handling: Unvalidated Output Injection, Cross-Context Output, Unbounded Output
Rogue Agent: Self-Modification, Session Persistence
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger

Findings (20)

subprocess module call

Medium

Category: Dangerous Code Execution
Content: Returns {"success": bool, "output": str, "failure_id": str or None} """ try: result = subprocess.run( command, capture_output=True, text=True,
Confidence: 95%
Finding: result = subprocess.run(command, capture_output=True, text=True, timeout=60)
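One common way to narrow a wrapper like this is to refuse anything that is not on an explicit allowlist. The sketch below is illustrative only and is not the skill's code: `run_allowlisted` and `ALLOWED_EXECUTABLES` are hypothetical names, and the allowlist contents are placeholders. It keeps the same `subprocess.run` arguments shown in the finding but passes an argument list, so no shell interprets the command string.

```python
import shlex
import subprocess

# Hypothetical allowlist; the audited skill defines nothing like this.
ALLOWED_EXECUTABLES = {"echo", "git"}


def run_allowlisted(command: str, timeout: int = 60) -> dict:
    """Run a command only if its executable name is on the allowlist."""
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. unbalanced quotes
        return {"success": False, "output": f"refused: {exc}"}
    if not argv or argv[0] not in ALLOWED_EXECUTABLES:
        return {"success": False, "output": f"refused: {command!r} is not allowlisted"}
    try:
        # An argv list with the default shell=False avoids shell metacharacter expansion.
        result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return {"success": False, "output": f"error: {exc}"}
    return {"success": result.returncode == 0, "output": result.stdout}
```

An allowlist keyed on the executable name does not constrain arguments, so an allowed tool that can itself launch other programs would still need separate review.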

Lp1

High

Category: MCP Least Privilege
Confidence: 99%
Finding: The code uses subprocess execution, which is effectively a shell/process-launch capability, but the manifest declares only file.read and file.write. This permission mismatch can mislead users and policy engines about the true power of the skill, enabling broader execution than expected.
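A mismatch of this kind can be detected mechanically. The sketch below is a rough illustration, not the scanner's actual method: it walks a module's imports with Python's `ast` module and reports implied permissions that the manifest does not declare. The permission name `process.exec` and the `MODULE_PERMISSIONS` mapping are assumptions made for the example; only `file.read` and `file.write` appear in the audited manifest.

```python
import ast

# Hypothetical mapping from imported modules to the permission they imply.
MODULE_PERMISSIONS = {"subprocess": "process.exec"}


def undeclared_permissions(source: str, declared: set[str]) -> set[str]:
    """Return permissions implied by imports in `source` but missing from `declared`."""
    needed = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            continue
        for module in modules:
            root = module.split(".")[0]  # "subprocess.foo" -> "subprocess"
            if root in MODULE_PERMISSIONS:
                needed.add(MODULE_PERMISSIONS[root])
    return needed - declared
```

An import-based check is deliberately coarse: it flags the capability, not whether the code path is reachable, and it would miss dynamic imports.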

Tp4

High

Category: MCP Tool Poisoning
Confidence: 97%
Finding: The description and the implementation do not match. The description emphasizes persistent memory and self-improvement through writing and refining memory files, but the implementation is primarily a failure-logging and pattern-indexing framework plus a wrapper that runs external commands. It does persist JSON records and update them over time, but it does not meaningfully implement an agent memory system that refines its own memories; instead it records failures, infers coarse pattern types from error strings, and marks related failures as fixed. The subprocess-based wrapper is a material capability that the description never mentions, which makes the declared purpose incomplete and somewhat misleading.

Context-Inappropriate Capability

Medium

Confidence: 85%
Finding: The scheduled cron-driven 'fix-all' automation expands the skill from passive memory/self-improvement into unattended execution that can modify state repeatedly without human review. In the context of a self-modifying, file-writing skill, automatic recurring remediation increases the chance of unsafe propagation of bad fixes or misuse of stored failure context across unrelated cases.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The skill is described as persistent memory/self-improvement, but this code acts as a general command runner. That is a materially broader capability than the stated purpose, increasing the chance the skill can be abused as an execution trampoline for unrelated or harmful actions.

Scope Creep

Critical

Confidence: 100%
Finding: This directly violates the declared permission model: the skill can execute external commands even though it advertises only file.read and file.write. In an agent ecosystem, that creates a severe trust-boundary break, because consumers may grant the skill access under false assumptions while it can perform much more dangerous actions.

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: The CLI behavior shows the skill is not limited to maintaining memory files; it orchestrates command execution, failure analysis, and optional auto-fixing. This scope creep increases attack surface and makes the skill more dangerous in context because users may invoke it expecting only benign self-learning file operations.

Missing User Warnings

Medium

Confidence: 93%
Finding: The documentation promotes automatically applying learned fixes to 'all similar cases' without warning that this can modify other tasks, files, or records based on imperfect pattern matching. In a persistent self-learning system with file.write permission, a mistaken fix can spread corruption or destructive changes across multiple pending failures at once.

Missing User Warnings

Medium

Confidence: 95%
Finding: The README instructs users to log failures with 'full context' but does not warn that context, stderr/stdout, and stack traces often contain secrets, personal data, file contents, or tokens. Because the skill persists this material to local memory files, it creates a durable collection point for sensitive information that may later be exposed or reused unsafely.
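A typical mitigation is to redact obvious secrets before anything is written to disk. The sketch below is a minimal illustration under stated assumptions: the `redact` helper and its two patterns are hypothetical and are not part of the audited skill, and real secrets take many more forms than these patterns cover.

```python
import re

# Illustrative patterns only; they catch "key = value" style secrets and bearer tokens.
SECRET_PATTERNS = [
    re.compile(r"(?i)\b(api[_-]?key|token|password|secret)\b(\s*[=:]\s*)(\S+)"),
    re.compile(r"(?i)\b(bearer)(\s+)([A-Za-z0-9._~+/-]+=*)"),
]


def redact(text: str) -> str:
    """Replace the value part of each matched secret with a placeholder."""
    for pattern in SECRET_PATTERNS:
        # Keep the label and separator (groups 1 and 2); drop the secret value (group 3).
        text = pattern.sub(lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", text)
    return text
```

Applying such a filter to context, stdout/stderr, and stack traces before persisting them reduces exposure but does not eliminate it, so the stored files should still be treated as sensitive.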

Vague Triggers

Medium

Confidence: 86%
Finding: The trigger conditions are broad enough that the skill could activate in many normal interactions, causing persistent logging and memory updates without strong user intent boundaries. In a skill with file.read/file.write and long-term retention, vague activation criteria increase the chance of overcollection and unintended persistence of sensitive data.

Missing User Warnings

High

Confidence: 95%
Finding: The skill repeatedly describes persistent capture of failures, full context, corrections, and preference shifts, but gives no warning that potentially sensitive conversation content or system state may be written to disk. This can lead users or operators to unknowingly retain secrets, personal data, internal prompts, or environment details in long-lived files.

Missing User Warnings

Medium

Confidence: 92%
Finding: The skill writes full failure context and stack traces to persistent disk under the user's home directory, and those fields may contain secrets, tokens, file paths, prompts, personal data, or proprietary content. In a self-learning/persistent-memory skill, this is more dangerous because the whole purpose is long-term retention, increasing the chance of unintended disclosure or secondary misuse of sensitive data.

Missing User Warnings

Medium

Confidence: 90%
Finding: The --auto-fix path can apply learned fixes and modify state automatically after a failure, with no interactive confirmation or preview of changes. In a self-learning system, this can propagate bad patterns, corrupt data, or apply unsafe modifications without adequate user awareness.
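The missing safeguard could look like an explicit confirmation gate in front of every fix. The sketch below is hypothetical and does not reflect how the skill's --auto-fix path is implemented: `apply_fix_with_confirmation` is an invented name, and the fix is modeled as a plain callable. The prompt function is injectable so the gate can be exercised without a terminal.

```python
from typing import Callable


def apply_fix_with_confirmation(
    description: str,
    apply: Callable[[], None],
    confirm: Callable[[str], str] = input,
) -> bool:
    """Show what a fix would do and apply it only after an explicit 'y' or 'yes'."""
    answer = confirm(f"Apply fix: {description}? [y/N] ")
    if answer.strip().lower() not in {"y", "yes"}:
        return False  # default is to do nothing
    apply()
    return True
```

Defaulting to "no" means an unattended run, such as a scheduled job with no one to answer the prompt, cannot apply a fix by accident as long as the prompt is not bypassed.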

Ssd 3

Medium

Confidence: 96%
Finding: Persistently storing full execution context, error messages, and stack traces in local files creates a sensitive data retention mechanism that can capture secrets, prompts, user inputs, internal paths, and credentials. In a self-learning skill designed to revisit and act on historical records, this increases both exposure risk and the chance that sensitive data is later surfaced or acted upon inappropriately.

Ssd 3

Medium

Confidence: 94%
Finding: Persisting detailed session context and user corrections into long-term natural-language memory creates a direct data retention risk because such context often contains sensitive prompts, proprietary information, or internal reasoning artifacts. Once stored, this information can be exposed through later reads, backups, or accidental disclosure.

Ssd 3

Medium

Confidence: 95%
Finding: Storing failed tasks with 'full context' is particularly risky because failure logs often include raw inputs, stack traces, system state, credentials, paths, or other confidential operational details. Long-term storage of those details expands the blast radius of any later file disclosure or misuse.

Ssd 3

Medium

Confidence: 93%
Finding: Capturing user corrections and preference shifts for future behavior implies persistent storage of user-provided information, potentially including sensitive preferences, identities, or confidential instructions, without stating any filtering or consent mechanism. This creates privacy and policy risks, especially across sessions and users.

Unvalidated Output Injection

High

Category: Output Handling
Content: from self_learning import log_failure, log_success try: result = subprocess.run(['my-skill', '--arg', 'value'], capture_output=True, text=True) if result.returncode == 0: log_success("my-skill", {"args": ["--arg", "value"]}, result.stdout) else:
Confidence: 85%
Finding: subprocess.run(['my-skill', '--arg', 'value'], capture_output

Unvalidated Output Injection

High

Category: Output Handling
Content: Returns {"success": bool, "output": str, "failure_id": str or None} """ try: result = subprocess.run( command, capture_output=True, text=True,
Confidence: 74%
Finding: subprocess.run( command, capture_output

Session Persistence

Medium

Category: Rogue Agent
Content: 1. **Capture** — After each session or failure, log: what was attempted, what went wrong, system state 2. **Analyse** — Identify the root cause pattern, not just the symptom 3. **Generate** — Create a fix or prevention rule based on the pattern 4. **Validate** — Test the fix before committing it to memory 5. **Commit** — Update the agent's memory files only when validation succeeds
Confidence: 88%
Finding: Create a fix or prevention rule based on the pattern 4. **Validate** — Test the fix before committing it to memory 5. **Commit** — Update the agent's memory files only when validation succeeds ## Fil

VirusTotal

VirusTotal findings are pending for this skill version.
