OpenClaw-Skill-Creator

Security checks across malware telemetry and agentic risk

Overview

This skill is not clearly malicious, but it needs review: its helper scripts can send skill and eval content through the Claude CLI, write into local Claude project files, and terminate any local process bound to a chosen port.

Install it only if you intentionally want a Claude Code-style skill creation and evaluation workflow. Avoid using it on proprietary or secret skill content unless you are comfortable sending prompts and eval data through your configured Claude CLI session. Prefer the safer successor mentioned in the artifact changelog; otherwise, run it in an isolated workspace and avoid server mode unless you accept that it may terminate an existing local process on the chosen port.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (9)

Intent-Code Divergence

Severity: Medium

Confidence: 91%
Finding: The skill gives contradictory operational guidance about whether OpenClaw has subagents and whether to use browser/server review versus static HTML review. In an agentic workflow, inconsistent capability assumptions can cause the agent to choose the wrong execution path, skip safeguards, fail to collect feedback, or mishandle evaluation steps, reducing reliability and weakening review controls.

Intent-Code Divergence

Severity: Medium

Confidence: 92%
Finding: The file defines two materially different agent roles and output contracts in a single skill without an explicit dispatch boundary. An agent given mixed inputs could follow the benchmark-analysis section instead of the post-hoc analyzer section, producing the wrong output shape and analyzing the wrong artifacts, which can silently corrupt evaluation workflows or overwrite expected results with incompatible JSON.
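An explicit dispatch boundary can be as simple as requiring the caller to name the role up front and rejecting any output that does not match that role's contract. The sketch below is a minimal illustration only: the role names and required keys are hypothetical, because this report does not show the skill's actual JSON shapes, and `validate_output` is not a function from the skill.

```python
import json
from typing import Any, Dict, Set

# Hypothetical roles and required keys; the skill's real output
# contracts are not shown in this report.
OUTPUT_CONTRACTS: Dict[str, Set[str]] = {
    "benchmark_analysis": {"role", "benchmark_summary"},
    "post_hoc_analysis": {"role", "post_hoc_findings"},
}


def validate_output(role: str, raw_json: str) -> Dict[str, Any]:
    """Accept agent output only if it matches the contract of the declared role."""
    if role not in OUTPUT_CONTRACTS:
        raise ValueError(f"unknown role {role!r}; refusing to guess")
    data = json.loads(raw_json)  # JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")
    if data.get("role") != role:
        raise ValueError(f"output declares role {data.get('role')!r}, expected {role!r}")
    missing = OUTPUT_CONTRACTS[role] - set(data)
    if missing:
        raise ValueError(f"{role} output missing keys: {sorted(missing)}")
    return data
```

Rejecting mismatched output before anything is written is what would stop benchmark-analysis JSON from silently replacing an expected post-hoc result.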

Intent-Code Divergence

Severity: Medium

Confidence: 95%
Finding: The top-level docstring describes the script as generating and serving a review page but omits that it will also terminate any process bound to the selected port. That hidden side effect can surprise operators and take down unrelated local applications, and it makes the tool more dangerous than advertised.

Intent-Code Divergence

Severity: High

Confidence: 98%
Finding: The helper is named and documented as if it merely checks port usage, but it actually enumerates PIDs and sends SIGTERM to each matching process. Misleading documentation around destructive behavior increases the risk of accidental misuse and can hide operationally unsafe actions during review.
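For contrast, a helper that genuinely only checks port usage never needs a PID or a signal. This minimal Python sketch assumes a TCP listener on localhost; `port_in_use` is a hypothetical name, not the skill's helper.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Report whether something is already listening on host:port.

    Read-only: it never signals or terminates the process that owns the port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when a listener accepted the connection,
        # and an error code (e.g. connection refused) otherwise.
        return sock.connect_ex((host, port)) == 0
```

Keeping "check" and "terminate" in separately named functions means any destructive step is visible at the call site instead of hidden behind a benign-sounding name.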

Vague Triggers

Severity: High

Confidence: 93%
Finding: The description-writing guidance explicitly recommends making skill descriptions 'pushy' and triggering on broad adjacent phrases even when the user does not ask for the skill. This increases over-triggering and skill collisions, which can cause the wrong skill to activate, override user intent, or pull the agent into unnecessary file/tool workflows with broader access than needed.

Vague Triggers

Severity: Medium

Confidence: 82%
Finding: The invocation guidance tells the agent to 'jump in and help' based on rough process stage but does not provide concrete limits for when the skill should not trigger. That ambiguity can lead to unnecessary activation, context hijacking, or the agent initiating eval, packaging, or optimization workflows when the user only wanted lightweight advice.

Natural-Language Policy Violations

Severity: Medium

Confidence: 90%
Finding: Conflicting environment-specific instructions create ambiguous policy behavior: one section says OpenClaw lacks subagents and should skip benchmarking/baselines, while another says subagents are available and the full workflow works. This ambiguity can produce inconsistent compliance with testing, review, and feedback steps, undermining assurance and making results hard to trust.

Missing User Warnings

Severity: Medium

Confidence: 97%
Finding: The script unconditionally calls `_kill_port(port)` before binding the HTTP server, with no interactive warning or explicit user consent at the moment of execution. Because this is only a local review utility, killing unrelated services on the same host is unnecessary and can disrupt development tools, dashboards, or other applications.
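One consent-respecting alternative to an unconditional kill is to detect the conflict, ask the user, and fall back to an OS-assigned free port instead of terminating the port's owner. This is a minimal sketch, assuming the review server only needs some free localhost port; `pick_review_port` and `_can_bind` are hypothetical names, not part of the skill.

```python
import socket
from typing import Callable, Optional


def _can_bind(host: str, port: int) -> bool:
    """Return True if we could bind host:port right now (no SO_REUSEADDR)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def pick_review_port(preferred: int,
                     ask: Callable[[str], str] = input,
                     host: str = "127.0.0.1") -> Optional[int]:
    """Return a port to serve on without terminating any existing process.

    If `preferred` is busy, ask before falling back to a free port chosen
    by the OS; return None if the user declines.
    """
    if _can_bind(host, preferred):
        return preferred
    answer = ask(f"Port {preferred} is in use. Use a free port instead? [y/N] ")
    if answer.strip().lower() not in ("y", "yes"):
        return None
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0: let the OS pick an unused port
        return sock.getsockname()[1]
```

A small race remains, since the chosen port is released before the real server binds it; the server should treat a bind failure as an error to report, not as a reason to kill whatever took the port.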

Missing User Warnings

Severity: Medium

Confidence: 94%
Finding: The script sends the assembled prompt, including full skill content, eval history, and queries, to an external `claude` process with no consent gate or data-sensitivity check at the call site. If those inputs contain secrets, proprietary prompts, or sensitive evaluation data, they may be exposed outside the local process boundary to a remote model backend via the CLI.
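A consent gate at the call site could look like the minimal sketch below. The regular expressions are illustrative assumptions and nowhere near a complete secret scanner. The CLI invocation is passed in as `command` because this report does not show the exact `claude` arguments the skill uses; `send_with_consent` and `find_secret_hits` are hypothetical names.

```python
import re
import subprocess
from typing import Callable, List, Optional, Sequence

# Illustrative patterns only (an assumption, not the skill's code); a real
# deployment would use a dedicated secret scanner.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"(?i)\b(api[_-]?key|secret|token|password)\s*[:=]\s*\S+"),
]


def find_secret_hits(text: str) -> List[str]:
    """Return the patterns that match somewhere in `text`."""
    return [p.pattern for p in SECRET_PATTERNS if p.search(text)]


def send_with_consent(prompt: str, command: Sequence[str],
                      ask: Callable[[str], str] = input) -> Optional[str]:
    """Ask before piping `prompt` to an external CLI; return its stdout or None."""
    hits = find_secret_hits(prompt)
    warning = f" WARNING: {len(hits)} secret-like pattern(s) matched." if hits else ""
    question = (f"Send {len(prompt)} characters of skill/eval content "
                f"to {command[0]!r}?{warning} [y/N] ")
    if ask(question).strip().lower() not in ("y", "yes"):
        return None  # declined: nothing is handed to the external process
    # This sketch passes the prompt on stdin rather than argv, so it does not
    # show up in process listings.
    result = subprocess.run(list(command), input=prompt,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

The gate does not change what would be sent, only whether the user sees a size and sensitivity summary and approves it first.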

VirusTotal

64/64 vendors reported this skill as clean.