Skill Regression

Security checks across malware telemetry and agentic risk

Overview

This appears to be a skill-testing tool, but it gives test files and external integrations enough authority to run shell commands, send skill contents to remote services, and modify files in the target skill directory, and it does so without clear containment.

Review carefully before installing. Use it only on trusted skills and trusted TEST.md files, preferably inside a sandbox or disposable checkout with minimal credentials. Do not enable remote LLM evaluation, live-agent backends, or report uploads unless you are comfortable sending skill contents, prompts, outputs, paths, and reports to the configured destination.
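One way to approximate the "minimal credentials" advice is to launch the tool, or the commands it runs, with a scrubbed environment. The following is a minimal sketch, not part of the reviewed skill: the allowlist `SAFE_ENV_KEYS` and the helpers `minimal_env` and `run_with_minimal_env` are hypothetical names, and the keys a given test actually needs may differ.

```python
import os
import subprocess

# Hypothetical allowlist (an assumption); adjust to what the tests actually need.
SAFE_ENV_KEYS = ("PATH", "HOME", "LANG", "TMPDIR")


def minimal_env(source=None, allow=SAFE_ENV_KEYS):
    """Return a copy of the environment containing only allowlisted keys."""
    source = os.environ if source is None else source
    return {key: source[key] for key in allow if key in source}


def run_with_minimal_env(argv, cwd=None, timeout=60):
    """Run argv (a list, no shell) so API keys and tokens are not inherited."""
    return subprocess.run(
        argv,
        cwd=cwd,
        env=minimal_env(),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

An allowlist is used rather than a denylist so that credentials with unexpected variable names are dropped by default. This reduces what a test command can read from the environment; it does not replace a sandbox or a disposable checkout.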

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Output Handling: Unvalidated Output Injection, Cross-Context Output, Unbounded Output
  • Tool Misuse: Tool Parameter Abuse, Chaining Abuse, Unsafe Defaults
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (9)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
# meaning any command in TEST.md will execute. Only trust TEST.md from trusted sources.
        # ⚠️  Output merging: actual_output = stdout + stderr,
        # which is risky for exact-mode assertions (stderr warnings break match). Prefer contains/regex.
        proc = subprocess.run(
            script_cmd,
            shell=True,
            cwd=skill_dir,
Confidence
98% confidence
Finding
proc = subprocess.run(script_cmd, shell=True, cwd=skill_dir, capture_output=True, text=True, timeout=timeout)
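For comparison, the flagged call could be narrowed along the following lines. This is a sketch under assumptions, not the skill's actual code: `run_test_command` is a hypothetical wrapper, while `script_cmd`, `skill_dir`, and `timeout` are taken from the snippet above. It splits the command into an argument list so that no shell interprets metacharacters, and it returns stdout and stderr separately instead of merging them.

```python
import shlex
import subprocess


def run_test_command(script_cmd, skill_dir, timeout=30):
    """Run one test command without a shell and keep stdout/stderr separate."""
    argv = shlex.split(script_cmd)  # POSIX-style splitting; no shell expansion
    if not argv:
        raise ValueError("empty command")
    proc = subprocess.run(
        argv,
        shell=False,  # ';', '|', '$( )' and similar are passed as literal text
        cwd=skill_dir,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr
```

Dropping `shell=True` removes command chaining and substitution, but the named program still runs with the user's privileges, so TEST.md must still come from a trusted source. Test cases that rely on pipes or redirection would need to be rewritten.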

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The README advertises a report upload hook that can transmit generated regression reports to arbitrary external destinations, but it does not warn users that those reports may contain sensitive skill prompts, test cases, model outputs, paths, or other internal data. In a testing framework, reports often aggregate exactly the material users may not intend to disclose, so omission of a privacy/data-exfiltration warning is a real security documentation gap.
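If reports are uploaded at all, a redaction pass before the upload hook can reduce what leaves the machine. The sketch below is illustrative only: `redact_report` and `SECRET_PATTERN` are hypothetical, the pattern covers just a few common key names, and it is not a substitute for reviewing the report by hand.

```python
import re
from pathlib import Path

# Hypothetical pattern (an assumption); real reports need a list tuned to their data.
SECRET_PATTERN = re.compile(
    r"(?i)\b([A-Z0-9_]*(?:API_KEY|TOKEN|SECRET|PASSWORD)[A-Z0-9_]*)\s*[=:]\s*\S+"
)


def redact_report(text, home=None):
    """Mask key-like assignments and the user's home directory in a report."""
    home = str(Path.home()) if home is None else home
    # Keep the variable name, replace the value; ':' separators become '='.
    text = SECRET_PATTERN.sub(r"\1=[REDACTED]", text)
    if home and home != "/":
        text = text.replace(home, "~")
    return text
```

Skill prompts and model outputs embedded in a report are not touched by this pass, so it addresses only the credential and path portion of the finding.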

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The README describes AI-layer tests using an external LLM and implies that SKILL.md content, triggers, expected responses, and possibly test artifacts are processed by networked services, but it does not clearly warn users that their skill content and test data may leave the local environment. For a tool intended to audit other skills, this omission can cause unintentional disclosure of proprietary or sensitive material.

Missing User Warnings

Medium
Confidence
91% confidence
Finding
The skill documents renaming files in the target skill directory but does not present that behavior as an upfront warning before use. Silent or easy-to-miss modification of user files is dangerous in a testing tool because it can alter repositories, interfere with workflows, or produce unintended changes that users did not consent to.
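A user who wants to avoid in-place changes can point the tool at a throwaway copy of the skill directory. The following is a minimal sketch of that approach; `disposable_copy` is a hypothetical helper, not something the skill provides.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def disposable_copy(skill_dir):
    """Yield a temporary copy of skill_dir; the copy is deleted on exit."""
    with tempfile.TemporaryDirectory(prefix="skill-regression-") as tmp:
        copy_path = Path(tmp) / "skill"
        shutil.copytree(skill_dir, copy_path)
        yield copy_path
```

Any renames the tool performs then land in the temporary copy, and the original checkout is left as it was. Results that should persist, such as reports, have to be copied out before the `with` block ends.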

Missing User Warnings

Medium
Confidence
87% confidence
Finding
This library sends SKILL.md content, user triggers, generated responses, and evaluation prompts to a remote OpenAI-compatible endpoint, which can expose sensitive prompts, proprietary skill logic, or user-provided data to third-party services. In this skill's context, that risk is elevated because the tool is explicitly designed to ingest and regress-test arbitrary skill content, including potentially adversarial or confidential instructions, yet this file provides no consent gate, redaction, or disclosure mechanism.

Missing User Warnings

Medium
Confidence
90% confidence
Finding
When TEST.md is absent, the script sends SKILL.md and README.md content to an external OpenAI-compatible LLM via infer_test_cases without any explicit consent prompt or prominent warning at the call site. Skill documentation may contain proprietary instructions, credentials mistakenly embedded in docs, or other sensitive operational details, so this creates a real data-exposure risk even if it is part of expected functionality.
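The missing consent step could take the form of an explicit opt-in check placed before any call such as `infer_test_cases`. This is a sketch under stated assumptions: the environment variable `SKILL_REGRESSION_ALLOW_REMOTE`, the exception `RemoteSendRefused`, and the function `require_remote_consent` are all hypothetical and do not exist in the reviewed skill.

```python
import os

# Hypothetical opt-in variable (an assumption); not part of the tool under review.
CONSENT_ENV_VAR = "SKILL_REGRESSION_ALLOW_REMOTE"


class RemoteSendRefused(RuntimeError):
    """Raised when remote transmission has not been explicitly allowed."""


def require_remote_consent(endpoint, files, env=None):
    """Refuse to proceed unless the user opted in to sending files to endpoint."""
    env = os.environ if env is None else env
    if env.get(CONSENT_ENV_VAR) != "1":
        names = ", ".join(files)
        raise RemoteSendRefused(
            f"Refusing to send {names} to {endpoint}; "
            f"set {CONSENT_ENV_VAR}=1 to allow it."
        )
```

Failing closed means a missing TEST.md produces a clear error naming the files and the destination, rather than a silent upload of SKILL.md and README.md.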

Missing User Warnings

Low
Confidence
83% confidence
Finding
When the openclaw backend is enabled, arbitrary test triggers are forwarded to a real agent with no explicit confirmation or prominent warning at send time. In a regression-testing skill, this increases the chance that a user unintentionally executes harmful or side-effecting prompts against a live agent or connected tools.

Unvalidated Output Injection

High
Category
Output Handling
Content
# meaning any command in TEST.md will execute. Only trust TEST.md from trusted sources.
        # ⚠️  Output merging: actual_output = stdout + stderr,
        # which is risky for exact-mode assertions (stderr warnings break match). Prefer contains/regex.
        proc = subprocess.run(
            script_cmd,
            shell=True,
            cwd=skill_dir,
Confidence
96% confidence
Finding
subprocess.run( script_cmd, shell=True, cwd=skill_dir, capture_output

Tool Parameter Abuse

High
Category
Tool Misuse
Content
# meaning any command in TEST.md will execute. Only trust TEST.md from trusted sources.
        # ⚠️  Output merging: actual_output = stdout + stderr,
        # which is risky for exact-mode assertions (stderr warnings break match). Prefer contains/regex.
        proc = subprocess.run(
            script_cmd,
            shell=True,
            cwd=skill_dir,
Confidence
97% confidence
Finding
subprocess.run( script_cmd, shell=True

VirusTotal

64/64 vendors flagged this skill as clean.
