OpenClaw Skill Tester

Security checks across malware telemetry and agentic risk

Overview

This skill is not destructive, but it presents simulated and hard-coded test results as if they were real skill validation.

Install only if you treat this as a demo or scaffold. Do not rely on its pass/fail reports, comparison metrics, or generated summaries for quality, security, CI, or deployment decisions until it runs real target skills, records actual exit codes, and builds reports from collected results.

SkillSpector

By NVIDIA

Vulnerability Patterns

MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (6)

Tp4

High

Category: MCP Tool Poisoning
Confidence: 96%
Finding: The documented behavior indicates the skill may fabricate or simulate test outcomes, metrics, and reports instead of measuring real executions. In a testing framework, this is dangerous because it can create false assurance about skill quality, performance, and safety, causing unsafe or broken skills to be trusted, deployed, or approved based on misleading evidence.

Intent-Code Divergence

Medium

Confidence: 96%
Finding: This script advertises comparison testing but never executes the requested skill or baseline; it returns fixed simulated metrics regardless of input. In a testing framework, this can mislead users into trusting fabricated performance results, causing unsafe deployment decisions or masking regressions and security issues in the actual skill.
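A comparison test is only meaningful if both the skill and the baseline actually run on the same inputs. The sketch below shows that idea under a hypothetical assumption: each side can be wrapped as a Python callable that takes one input. `measure` and `compare` are illustrative names and are not part of the skill or of SkillSpector.

```python
import statistics
import time
from typing import Any, Callable, Dict, List


def measure(fn: Callable[[Any], Any], inputs: List[Any]) -> Dict[str, Any]:
    """Run fn on every input and record what actually happened (hypothetical helper)."""
    durations: List[float] = []
    failures = 0
    for item in inputs:
        start = time.perf_counter()
        try:
            fn(item)
        except Exception:
            failures += 1  # a crash counts as a failure, not a silent pass
        durations.append(time.perf_counter() - start)
    return {
        "runs": len(inputs),
        "failures": failures,
        "median_seconds": statistics.median(durations) if durations else None,
    }


def compare(
    target: Callable[[Any], Any],
    baseline: Callable[[Any], Any],
    inputs: List[Any],
) -> Dict[str, Dict[str, Any]]:
    """Execute both the skill under test and the baseline on the same inputs."""
    return {"target": measure(target, inputs), "baseline": measure(baseline, inputs)}
```

For example, `compare(lambda x: x * 2, lambda x: 1 / x, [1, 0, 2])` reports zero failures for the target and one for the baseline, because the baseline raises on `0`. Results vary with the input in this way. Fixed, simulated numbers cannot.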

Intent-Code Divergence

Medium

Confidence: 93%
Finding: The script advertises functionality verification but never invokes the actual target skill; it only validates a hardcoded simulated response. This can create a false sense of assurance, allowing broken or unsafe skills to be marked as working and potentially pass release gates without real testing.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: This file is presented as an automated functionality tester, but its logic only checks canned simulated output and uses weak string-based field validation. In a testing framework context, that is dangerous because it can conceal regressions, integration failures, or policy-violating behavior while reporting success to developers or CI systems.
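A stronger functionality check parses the output and verifies each required field and its type, rather than searching for substrings. This sketch makes a hypothetical assumption: the skill returns a JSON object. `validate_response` and the field names are illustrative. In real use, the `raw` string would come from actually invoking the target skill, not from a canned response.

```python
import json
from typing import Dict, List


def validate_response(raw: str, required: Dict[str, type]) -> List[str]:
    """Parse a skill's (assumed JSON) output and list concrete problems; empty means valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"output is not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]
    problems: List[str] = []
    for field, expected_type in required.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_type):
            actual = type(data[field]).__name__
            problems.append(
                f"field {field} has type {actual}, expected {expected_type.__name__}"
            )
    return problems
```

A substring test such as `"status" in raw` passes for `'{"note": "no status here"}'`. By contrast, `validate_response('{"note": "no status here"}', {"status": str})` returns `["missing field: status"]`.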

Intent-Code Divergence

High

Confidence: 99%
Finding: The script writes a report with fixed pass/fail counts and performance metrics regardless of the actual test outcomes, which creates falsified assurance about skill quality. In a testing framework, this can mislead operators into deploying broken or unsafe skills and can conceal failures that should block release.
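A report whose counts are computed from the collected results cannot disagree with what actually ran. Below is a minimal sketch. It assumes each test yields a name, an exit code, and a duration. `CaseResult` and `build_summary` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CaseResult:
    name: str
    exit_code: int
    duration_seconds: float


def build_summary(results: List[CaseResult]) -> Dict[str, object]:
    """Derive every number in the report from the collected results."""
    failed = [r for r in results if r.exit_code != 0]
    total_time = sum(r.duration_seconds for r in results)
    return {
        "total": len(results),
        "passed": len(results) - len(failed),
        "failed": len(failed),
        "failed_tests": [r.name for r in failed],
        "total_seconds": round(total_time, 3),
        # An empty run executed nothing, so it is not reported as a pass.
        "ok": len(results) > 0 and not failed,
    }
```

Treating an empty result list as not OK is a deliberate design choice. Otherwise, a runner that silently skipped every test would still produce a passing report.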

Intent-Code Divergence

Medium

Confidence: 98%
Finding: Every test command is followed by '|| true', which suppresses failures and allows the runner to continue as if tests succeeded. This undermines the integrity of the test process and, combined with the generated report, can hide malfunctioning or insecure skills from maintainers.
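In a shell runner, the direct fix is to drop `|| true` and keep each command's exit status. The Python sketch below illustrates the same idea: run each command, keep its real return code, and exit non-zero if any command failed so that CI treats the run as failed. `run_commands` and `overall_exit_code` are hypothetical helpers, and the commands you pass in would be the skill's actual test invocations, which are not specified here.

```python
import subprocess
import sys
from typing import List, Tuple


def run_commands(commands: List[List[str]]) -> List[Tuple[List[str], int]]:
    """Run each command and keep its real exit code (no '|| true' masking)."""
    results: List[Tuple[List[str], int]] = []
    for cmd in commands:
        completed = subprocess.run(cmd, capture_output=True, text=True)
        results.append((cmd, completed.returncode))
    return results


def overall_exit_code(results: List[Tuple[List[str], int]]) -> int:
    """Return 1 if any command failed, so CI sees the run as failed; else 0."""
    return 0 if all(code == 0 for _, code in results) else 1
```

As a usage example, running `[sys.executable, "-c", "pass"]` and `[sys.executable, "-c", "import sys; sys.exit(3)"]` records exit codes `0` and `3`, and `overall_exit_code` returns `1`. Passing that value to `sys.exit` propagates the failure instead of hiding it.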

VirusTotal

66/66 vendors reported this skill as clean.
