Skill Regression

Security checks across malware telemetry and agentic risk

Overview

This appears to be a skill-testing tool, but it gives test files and external integrations enough authority to run shell commands, send skill contents to remote services, and rename files in the target skill directory, with no clear containment around any of these actions.

Review carefully before installing. Use it only on trusted skills and trusted TEST.md files, preferably inside a sandbox or disposable checkout with minimal credentials. Do not enable remote LLM evaluation, live-agent backends, or report uploads unless you are comfortable sending skill contents, prompts, outputs, paths, and reports to the configured destination.
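One way to follow the "disposable checkout with minimal credentials" advice is sketched below. It assumes a POSIX-like system; the function name `run_in_disposable_copy` and the choice to pass only `PATH` are illustrative assumptions, not part of the skill. It is not a real sandbox: the command can still reach the network and any file the user can read.

```python
import os
import shutil
import subprocess
import tempfile


def run_in_disposable_copy(skill_dir, argv, timeout=60):
    """Copy skill_dir to a temporary directory and run argv there.

    Hypothetical helper: the original skill directory is never touched,
    and the child process inherits only PATH from the parent environment.
    """
    with tempfile.TemporaryDirectory() as tmp:
        workdir = os.path.join(tmp, "skill")
        shutil.copytree(skill_dir, workdir)
        # Drop API keys and tokens that may be present in the parent environment.
        env = {"PATH": os.environ.get("PATH", "")}
        return subprocess.run(
            argv,
            cwd=workdir,
            env=env,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
```

Here `argv` is whatever command launches the tool under review. Any renames or other writes happen in the temporary copy, which is deleted when the function returns.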

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Output Handling: Unvalidated Output Injection, Cross-Context Output, Unbounded Output
Tool Misuse: Tool Parameter Abuse, Chaining Abuse, Unsafe Defaults
Behavioral AST: exec() Call, eval() Call, Dynamic Import
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (9)

subprocess module call

Medium

Category: Dangerous Code Execution
Content:
    # meaning any command in TEST.md will execute. Only trust TEST.md from trusted sources.
    # ⚠️ Output merging: actual_output = stdout + stderr,
    # which is risky for exact-mode assertions (stderr warnings break match). Prefer contains/regex.
    proc = subprocess.run(
        script_cmd, shell=True, cwd=skill_dir,
Confidence: 98%
Finding: proc = subprocess.run(script_cmd, shell=True, cwd=skill_dir, capture_output=True, text=True, timeout=timeout)
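A common mitigation for this pattern is to split the command into an argument list and run it without a shell. The sketch below assumes test commands do not rely on shell features such as pipes or redirection; `run_test_command` is a hypothetical name, and the parameters mirror the flagged call.

```python
import shlex
import subprocess


def run_test_command(script_cmd, skill_dir, timeout=30):
    """Run one test command without invoking a shell (illustrative sketch)."""
    # shlex.split tokenizes like a POSIX shell but does not interpret
    # operators such as ; | && or $(...), so they reach the program as
    # literal arguments instead of being executed.
    argv = shlex.split(script_cmd)
    if not argv:
        raise ValueError("empty command")
    return subprocess.run(
        argv,
        shell=False,
        cwd=skill_dir,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

With this form, a line such as `echo hello; echo injected` passes `hello;`, `echo`, and `injected` to a single `echo` process rather than starting a second command. It does not make an untrusted TEST.md safe: the file can still name any executable on the system, so the advice to trust only known sources still applies.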

Missing User Warnings

Medium

Confidence: 95%
Finding: The README advertises a report upload hook that can transmit generated regression reports to arbitrary external destinations, but it does not warn users that those reports may contain sensitive skill prompts, test cases, model outputs, paths, or other internal data. In a testing framework, reports often aggregate exactly the material users may not intend to disclose, so omission of a privacy/data-exfiltration warning is a real security documentation gap.

Missing User Warnings

Medium

Confidence: 94%
Finding: The README describes AI-layer tests using an external LLM and implies that SKILL.md content, triggers, expected responses, and possibly test artifacts are processed by networked services, but it does not clearly warn users that their skill content and test data may leave the local environment. For a tool intended to audit other skills, this omission can cause unintentional disclosure of proprietary or sensitive material.

Missing User Warnings

Medium

Confidence: 91%
Finding: The skill documents renaming files in the target skill directory but does not present that behavior as an upfront warning before use. Silent or easy-to-miss modification of user files is dangerous in a testing tool because it can alter repositories, interfere with workflows, or produce unintended changes that users did not consent to.

Missing User Warnings

Medium

Confidence: 87%
Finding: This library sends SKILL.md content, user triggers, generated responses, and evaluation prompts to a remote OpenAI-compatible endpoint, which can expose sensitive prompts, proprietary skill logic, or user-provided data to third-party services. In this skill's context, that risk is elevated because the tool is explicitly designed to ingest and regress-test arbitrary skill content, including potentially adversarial or confidential instructions, yet this file provides no consent gate, redaction, or disclosure mechanism.
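A consent gate with basic redaction could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `allow_remote` flag, the function names, and the two secret patterns, which are far from exhaustive and are not taken from the skill.

```python
import re

# Hypothetical patterns; a real deployment would need a broader, tested list.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),
    re.compile(r"(?i)\b(api[_-]?key|token|password)\s*[:=]\s*\S+"),
]


def redact(text):
    """Replace anything matching a known secret pattern with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def prepare_remote_payload(skill_text, endpoint, allow_remote=False):
    """Return redacted text to send, or raise if the user has not opted in."""
    if not allow_remote:
        raise PermissionError(
            f"remote evaluation is disabled; refusing to send skill content to {endpoint}"
        )
    return redact(skill_text)
```

Defaulting `allow_remote` to `False` means skill content stays local unless the caller opts in explicitly, and the error message names the destination so the user knows where the data would have gone.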

Missing User Warnings

Medium

Confidence: 90%
Finding: When TEST.md is absent, the script sends SKILL.md and README.md content to an external OpenAI-compatible LLM via infer_test_cases without any explicit consent prompt or prominent warning at the call site. Skill documentation may contain proprietary instructions, credentials mistakenly embedded in docs, or other sensitive operational details, so this creates a real data-exposure risk even if it is part of expected functionality.

Missing User Warnings

Low

Confidence: 83%
Finding: When the openclaw backend is enabled, arbitrary test triggers are forwarded to a real agent with no explicit confirmation or prominent warning at send time. In a regression-testing skill, this increases the chance that a user unintentionally executes harmful or side-effecting prompts against a live agent or connected tools.

Unvalidated Output Injection

High

Category: Output Handling
Content:
    # meaning any command in TEST.md will execute. Only trust TEST.md from trusted sources.
    # ⚠️ Output merging: actual_output = stdout + stderr,
    # which is risky for exact-mode assertions (stderr warnings break match). Prefer contains/regex.
    proc = subprocess.run(
        script_cmd, shell=True, cwd=skill_dir,
Confidence: 96%
Finding: subprocess.run( script_cmd, shell=True, cwd=skill_dir, capture_output
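The flagged comment notes that stdout and stderr are merged into one `actual_output` string. A minimal sketch of the alternative is to keep the two streams separate and cap their size before they reach assertions or reports. The cap value and the name `collect_output` are assumptions chosen for illustration.

```python
MAX_OUTPUT_CHARS = 10_000  # hypothetical cap, not taken from the skill


def collect_output(proc, limit=MAX_OUTPUT_CHARS):
    """Return stdout and stderr separately, each truncated to `limit` characters.

    `proc` is expected to look like subprocess.CompletedProcess with text output.
    """

    def clip(text):
        text = text or ""
        if len(text) <= limit:
            return text
        return text[:limit] + f"\n[truncated {len(text) - limit} chars]"

    return {
        "stdout": clip(proc.stdout),
        "stderr": clip(proc.stderr),
        "returncode": proc.returncode,
    }
```

Assertions can then match against `stdout` alone, so a stray stderr warning neither breaks an exact-mode comparison nor slips unreviewed text into a report.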

Tool Parameter Abuse

High

Category: Tool Misuse
Content:
    # meaning any command in TEST.md will execute. Only trust TEST.md from trusted sources.
    # ⚠️ Output merging: actual_output = stdout + stderr,
    # which is risky for exact-mode assertions (stderr warnings break match). Prefer contains/regex.
    proc = subprocess.run(
        script_cmd, shell=True, cwd=skill_dir,
Confidence: 97%
Finding: subprocess.run( script_cmd, shell=True

VirusTotal

All 64 of 64 vendors reported this skill as clean.
