
Security audit

Openclaw Self Learning Skill

Security checks across malware telemetry and agentic risk

Overview

The skill is not clearly malicious, but it combines persistent full-context logging with an arbitrary command wrapper and under-declared execution authority, so users should review it carefully before installing.

Install only if you are comfortable with a skill that writes long-lived learning logs under your home directory and can run commands through its wrapper. Avoid logging secrets, tokens, private prompts, or sensitive file contents; review and periodically delete the learning JSON files; and do not use the command wrapper or scheduled fix-all flow unless you understand exactly what command will run.
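The periodic cleanup recommended above could be automated with a small retention script. The following is a minimal sketch, not part of the skill: the function name `prune_old_logs`, the 30-day default, and the assumption that the learning logs are `*.json` files in a single directory are all hypothetical, since the report does not state the exact log location.

```python
import time
from pathlib import Path
from typing import List, Optional


def prune_old_logs(directory: Path, max_age_days: float = 30.0,
                   now: Optional[float] = None) -> List[Path]:
    """Delete *.json files in `directory` older than `max_age_days`.

    Hypothetical helper: the directory layout and retention period are
    assumptions, not something the audited skill defines.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed = []
    for path in sorted(directory.glob("*.json")):
        # Compare the last-modified time against the retention cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Reviewing the files before deletion is still advisable; a script like this only bounds how long unreviewed content stays on disk.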

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Output Handling: Unvalidated Output Injection, Cross-Context Output, Unbounded Output
  • Rogue Agent: Self-Modification, Session Persistence
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Findings (20)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
Returns {"success": bool, "output": str, "failure_id": str or None}
    """
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
Confidence
95% confidence
Finding
result = subprocess.run( command, capture_output=True, text=True, timeout=60 )

Lp1

High
Category
MCP Least Privilege
Confidence
99% confidence
Finding
The code uses subprocess execution, which is effectively a shell/process-launch capability, but the manifest declares only file.read and file.write. This permission mismatch can mislead users and policy engines about the true power of the skill, enabling broader execution than expected.
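A reviewer could approximate this check by comparing the modules a skill imports with the permissions its manifest declares. The sketch below is illustrative only: the permission name `process.exec` and the module-to-permission mapping are assumptions, because the report names only `file.read` and `file.write`, and a real policy engine would need a far more complete analysis than import scanning.

```python
import ast
from typing import Iterable, Set

# Hypothetical mapping from imported module to the permission it implies.
# "process.exec" is an invented name used for illustration.
IMPLIED_PERMISSIONS = {"subprocess": "process.exec"}


def implied_permissions(source: str) -> Set[str]:
    """Return permissions implied by the top-level modules a source file imports."""
    implied = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [(node.module or "").split(".")[0]]
        else:
            continue
        for module in modules:
            if module in IMPLIED_PERMISSIONS:
                implied.add(IMPLIED_PERMISSIONS[module])
    return implied


def undeclared_permissions(source: str, declared: Iterable[str]) -> Set[str]:
    """Permissions the code appears to need but the manifest does not declare."""
    return implied_permissions(source) - set(declared)
```

For a file that imports `subprocess` while the manifest lists only `file.read` and `file.write`, this returns `{"process.exec"}`, which is the kind of mismatch the finding describes.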

Tp4

High
Category
MCP Tool Poisoning
Confidence
97% confidence
Finding
The description and the implementation do not match. The description emphasizes persistent memory and self-improvement through writing and refining memory files, but the implementation is primarily a failure-logging and pattern-indexing framework plus a wrapper that runs external commands. It does persist JSON records and update them over time, but it does not meaningfully implement an agent memory system that refines its own memories; instead it records failures, infers coarse pattern types from error strings, and marks related failures as fixed. The subprocess-based wrapper is a material capability not mentioned in the description, which makes the declared purpose incomplete and somewhat misleading.

Context-Inappropriate Capability

Medium
Confidence
85% confidence
Finding
The scheduled cron-driven 'fix-all' automation expands the skill from passive memory/self-improvement into unattended execution that can modify state repeatedly without human review. In the context of a self-modifying, file-writing skill, automatic recurring remediation increases the chance of unsafe propagation of bad fixes or misuse of stored failure context across unrelated cases.

Context-Inappropriate Capability

High
Confidence
98% confidence
Finding
The skill is described as persistent memory/self-improvement, but this code acts as a general command runner. That is a materially broader capability than the stated purpose, increasing the chance the skill can be abused as an execution trampoline for unrelated or harmful actions.
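One common way to narrow a general command runner is to restrict it to an explicit allowlist of executables. The following is a sketch under stated assumptions, not the skill's code: the name `run_allowed` and the allowlist contents are hypothetical, and the `subprocess.run` arguments mirror those quoted in the finding above.

```python
import subprocess
from typing import Collection, Sequence

# Hypothetical allowlist; a real deployment would list only the
# executables the skill genuinely needs.
ALLOWED_EXECUTABLES = frozenset({"my-skill"})


def run_allowed(command: Sequence[str],
                allowed: Collection[str] = ALLOWED_EXECUTABLES
                ) -> subprocess.CompletedProcess:
    """Run `command` only if its executable is on the allowlist."""
    if not command or command[0] not in allowed:
        raise PermissionError(f"command not allowed: {list(command)!r}")
    # Passing a list (no shell=True) avoids shell interpretation of arguments.
    return subprocess.run(
        list(command),
        capture_output=True,
        text=True,
        timeout=60,
    )
```

An allowlist does not make the wrapper safe by itself, since an allowed program may accept dangerous arguments, but it removes the "run anything" behaviour the finding objects to.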

Scope Creep

Critical
Confidence
100% confidence
Finding
This is a direct violation of the declared permission model: the skill can execute external commands despite advertising only file.read and file.write. In an agent ecosystem, that creates a severe trust-boundary break because consumers may grant the skill access under false assumptions while it can perform much more dangerous actions.

Description-Behavior Mismatch

Medium
Confidence
92% confidence
Finding
The CLI behavior shows the skill is not limited to maintaining memory files; it orchestrates command execution, failure analysis, and optional auto-fixing. This scope creep increases attack surface and makes the skill more dangerous in context because users may invoke it expecting only benign self-learning file operations.

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The documentation promotes automatically applying learned fixes to 'all similar cases' without warning that this can modify other tasks, files, or records based on imperfect pattern matching. In a persistent self-learning system with file.write permission, a mistaken fix can spread corruption or destructive changes across multiple pending failures at once.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The README instructs users to log failures with 'full context' but does not warn that context, stderr/stdout, and stack traces often contain secrets, personal data, file contents, or tokens. Because the skill persists this material to local memory files, it creates a durable collection point for sensitive information that may later be exposed or reused unsafely.
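A partial mitigation is to redact obvious secrets before anything is persisted. The sketch below is a best-effort illustration: the patterns, the `[REDACTED]` placeholder, and the function names are assumptions, and pattern-based redaction will miss secrets that do not look like `key=value` pairs or bearer tokens.

```python
import re

# Illustrative patterns only; real logs need a broader, reviewed set.
BEARER = re.compile(r"Bearer\s+[A-Za-z0-9._\-]+", re.IGNORECASE)
KEY_VALUE = re.compile(
    r"(api[_-]?key|token|password|secret)(\s*[=:]\s*)(\S+)", re.IGNORECASE
)


def redact(text: str) -> str:
    """Replace likely secret values in a string with a placeholder."""
    text = BEARER.sub("Bearer [REDACTED]", text)
    return KEY_VALUE.sub(lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", text)


def redact_context(value):
    """Recursively redact strings inside dicts and lists before logging."""
    if isinstance(value, str):
        return redact(value)
    if isinstance(value, dict):
        return {key: redact_context(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [redact_context(item) for item in value]
    return value
```

A caller would pass captured stdout, stderr, and context through `redact_context` before handing them to any logging function, and should still treat the resulting files as sensitive.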

Vague Triggers

Medium
Confidence
86% confidence
Finding
The trigger conditions are broad enough that the skill could activate in many normal interactions, causing persistent logging and memory updates without strong user intent boundaries. In a skill with file.read/file.write and long-term retention, vague activation criteria increase the chance of overcollection and unintended persistence of sensitive data.

Missing User Warnings

High
Confidence
95% confidence
Finding
The skill repeatedly describes persistent capture of failures, full context, corrections, and preference shifts, but gives no warning that potentially sensitive conversation content or system state may be written to disk. This can lead users or operators to unknowingly retain secrets, personal data, internal prompts, or environment details in long-lived files.

Missing User Warnings

Medium
Confidence
92% confidence
Finding
The skill writes full failure context and stack traces to persistent disk under the user's home directory, and those fields may contain secrets, tokens, file paths, prompts, personal data, or proprietary content. In a self-learning/persistent-memory skill, this is more dangerous because the whole purpose is long-term retention, increasing the chance of unintended disclosure or secondary misuse of sensitive data.

Missing User Warnings

Medium
Confidence
90% confidence
Finding
The --auto-fix path can apply learned fixes and modify state automatically after a failure, with no interactive confirmation or preview of changes. In a self-learning system, this can propagate bad patterns, corrupt data, or apply unsafe modifications without adequate user awareness.
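A preview-then-confirm step would address the missing confirmation. The sketch below assumes a hypothetical record shape (dicts with `id`, `pattern`, and `status` keys) and hypothetical function names; the report does not show how the skill stores failures or applies fixes, so this illustrates the control flow only.

```python
from typing import Callable, Dict, List


def plan_fix(failures: List[Dict], pattern: str) -> List[str]:
    """Return IDs of pending failures that a fix for `pattern` would touch."""
    return [
        record["id"]
        for record in failures
        if record.get("pattern") == pattern and record.get("status") == "pending"
    ]


def apply_fix(failures: List[Dict], pattern: str,
              confirm: Callable[[List[str]], bool]) -> List[str]:
    """Mark matching failures as fixed only after `confirm` approves the preview."""
    affected = plan_fix(failures, pattern)
    if not affected or not confirm(affected):
        return []  # Nothing is modified without explicit approval.
    for record in failures:
        if record["id"] in affected:
            record["status"] = "fixed"
    return affected
```

In an interactive CLI, `confirm` could print the affected IDs and prompt the user; in unattended runs it could simply return `False`, so scheduled jobs report what they would change without changing it.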

Ssd 3

Medium
Confidence
96% confidence
Finding
Persistently storing full execution context, error messages, and stack traces in local files creates a sensitive data retention mechanism that can capture secrets, prompts, user inputs, internal paths, and credentials. In a self-learning skill designed to revisit and act on historical records, this increases both exposure risk and the chance that sensitive data is later surfaced or acted upon inappropriately.

Ssd 3

Medium
Confidence
94% confidence
Finding
Persisting detailed session context and user corrections into long-term natural-language memory creates a direct data retention risk because such context often contains sensitive prompts, proprietary information, or internal reasoning artifacts. Once stored, this information can be exposed through later reads, backups, or accidental disclosure.

Ssd 3

Medium
Confidence
95% confidence
Finding
Storing failed tasks with 'full context' is particularly risky because failure logs often include raw inputs, stack traces, system state, credentials, paths, or other confidential operational details. Long-term storage of those details expands the blast radius of any later file disclosure or misuse.

Ssd 3

Medium
Confidence
93% confidence
Finding
Capturing user corrections and preference shifts for future behavior implies persistent storage of user-provided information, potentially including sensitive preferences, identities, or confidential instructions, without stating any filtering or consent mechanism. This creates privacy and policy risks, especially across sessions and users.

Unvalidated Output Injection

High
Category
Output Handling
Content
from self_learning import log_failure, log_success

try:
    result = subprocess.run(['my-skill', '--arg', 'value'], capture_output=True, text=True)
    if result.returncode == 0:
        log_success("my-skill", {"args": ["--arg", "value"]}, result.stdout)
    else:
Confidence
85% confidence
Finding
subprocess.run(['my-skill', '--arg', 'value'], capture_output

Unvalidated Output Injection

High
Category
Output Handling
Content
Returns {"success": bool, "output": str, "failure_id": str or None}
    """
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
Confidence
74% confidence
Finding
subprocess.run( command, capture_output

Session Persistence

Medium
Category
Rogue Agent
Content
1. **Capture** — After each session or failure, log: what was attempted, what went wrong, system state
2. **Analyse** — Identify the root cause pattern, not just the symptom
3. **Generate** — Create a fix or prevention rule based on the pattern
4. **Validate** — Test the fix before committing it to memory
5. **Commit** — Update the agent's memory files only when validation succeeds
Confidence
88% confidence
Finding
Create a fix or prevention rule based on the pattern 4. **Validate** — Test the fix before committing it to memory 5. **Commit** — Update the agent's memory files only when validation succeeds ## Fil

VirusTotal

VirusTotal findings are pending for this skill version.
