LLM Tester

Security checks across malware telemetry and agentic risk

Overview

This is a user-run LLM benchmarking skill. It sends the selected samples and prompts to a configured LLM API and writes result reports to local disk.

Install only if you are comfortable sending the selected sample files and prompt templates to the configured LLM provider. Protect DASHSCOPE_API_KEY, avoid overriding LLM_API_BASE unless you trust the endpoint, and keep generated reports out of public repositories if they may contain sensitive prompts, outputs, or business data.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (5)

Tainted flow: 'API_BASE' from os.environ.get (line 28, credential/environment) → requests.post (network output)

Critical

Category: Data Flow
Content: start = time.time() try: resp = requests.post( API_BASE, headers={ "Authorization": f"Bearer {API_KEY}",
Confidence: 94%
Finding: resp = requests.post( API_BASE, headers={ "Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json", },
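One way to reduce the risk in this flow is to validate the environment-supplied endpoint before any request carries the bearer token to it. The following is a minimal sketch, not the skill's actual code: the helper name validate_api_base and the allowlisted hostname are hypothetical, and the real provider hostname would need to be substituted.

```python
import os
from urllib.parse import urlparse

# Hypothetical allowlist: the report does not name the provider's hostname,
# so "api.example.com" is a placeholder to be replaced.
ALLOWED_HOSTS = frozenset({"api.example.com"})


def validate_api_base(url: str, allowed_hosts=ALLOWED_HOSTS) -> str:
    """Return url unchanged if it is HTTPS and targets an allowlisted host."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError(f"refusing non-HTTPS endpoint: {url!r}")
    if parsed.hostname not in allowed_hosts:
        raise ValueError(f"endpoint host not in allowlist: {parsed.hostname!r}")
    return url


if __name__ == "__main__":
    # LLM_API_BASE is the override variable named in the overview above.
    raw = os.environ.get("LLM_API_BASE", "https://api.example.com/v1/chat")
    print("validated endpoint:", validate_api_base(raw))
```

Calling the check once at startup, before the API key is attached to any request, means a mistyped or malicious LLM_API_BASE fails loudly instead of receiving the credential.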

Lp3

Medium

Category: MCP Least Privilege
Confidence: 95%
Finding: The skill documentation clearly describes use of environment variables, reading sample and prompt files, writing benchmark reports, and sending requests to a remote API, yet no explicit permission declaration is present. This creates a transparency and consent problem: users may invoke a skill with broader capabilities than expected, including network transmission and filesystem writes.

Missing User Warnings

Medium

Confidence: 97%
Finding: The skill is designed to send sample text and prompt content to a remote LLM endpoint, but it does not warn users that potentially sensitive input data will leave the local environment. In benchmarking scenarios, samples may contain proprietary, personal, or confidential material, so silent transmission materially increases privacy and compliance risk.
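A warning of the kind this finding asks for could be an explicit confirmation step before any sample leaves the host. The sketch below is illustrative only: confirm_upload and its parameters are hypothetical names, not part of the skill, and the ask parameter exists so that the prompt can be replaced in non-interactive runs.

```python
from pathlib import Path
from typing import Callable, Iterable


def confirm_upload(
    paths: Iterable[str],
    endpoint: str,
    ask: Callable[[str], str] = input,
) -> bool:
    """Tell the user which files will be sent where; return True only on 'y'/'yes'."""
    names = [Path(p).name for p in paths]
    prompt = (
        f"About to send {len(names)} file(s) to {endpoint}: "
        f"{', '.join(names)}. Continue? [y/N] "
    )
    # Anything other than an explicit yes is treated as a refusal.
    return ask(prompt).strip().lower() in {"y", "yes"}


if __name__ == "__main__":
    if not confirm_upload(["samples/case1.txt"], "https://api.example.com/v1/chat"):
        raise SystemExit("upload cancelled by user")
```

Defaulting to "no" keeps an accidental Enter keypress from uploading data.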

Missing User Warnings

Medium

Confidence: 96%
Finding: The documented JSON report includes model outputs, token usage, timing, and per-sample results, but the skill does not warn users that these artifacts will be persisted to disk. If samples or model responses contain sensitive content, the generated report can become a secondary data leak that remains after execution and may be copied into logs, backups, or source repositories.
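To limit the secondary-leak risk described here, a report can be created with owner-only permissions and the user told where it was written. This is a minimal sketch under stated assumptions: write_private_report is a hypothetical helper, the report dictionary stands in for whatever structure the skill actually emits, and the permission bits are enforced only on POSIX systems and only when the file is newly created.

```python
import json
import os


def write_private_report(path: str, report: dict) -> None:
    """Write report as JSON, requesting mode 0600 (owner read/write) on creation."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    # The mode argument applies only if the file does not already exist.
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(report, fh, ensure_ascii=False, indent=2)
    print(f"Report saved to {path}; it may contain prompts and model outputs.")


if __name__ == "__main__":
    write_private_report("report.json", {"model": "example", "results": []})
```

Adding the report path to .gitignore complements this by keeping the file out of source repositories.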

Missing User Warnings

Medium

Confidence: 88%
Finding: The tool reads local sample files and prompt templates, formats them into a request, and sends them to a remote LLM API, but it does not explicitly warn users that local content will leave the host. In a benchmarking tool this behavior is expected, yet it is still risky because users may unintentionally upload sensitive test data, proprietary prompts, or regulated content.

VirusTotal

66/66 vendors flagged this skill as clean.
