LLM Tester

Security checks across malware telemetry and agentic risk

Overview

This is a straightforward user-run LLM benchmarking skill, but users should understand that it sends selected samples and prompts to a configured LLM API and writes result reports locally.

Install only if you are comfortable sending the selected sample files and prompt templates to the configured LLM provider. Protect DASHSCOPE_API_KEY, avoid overriding LLM_API_BASE unless you trust the endpoint, and keep generated reports out of public repositories if they may contain sensitive prompts, outputs, or business data.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Findings (5)

Tainted flow: 'API_BASE' from os.environ.get (line 28, credential/environment) → requests.post (network output)

Severity: Critical
Category: Data Flow
Confidence: 94%
Content:
start = time.time()
try:
    resp = requests.post(
        API_BASE,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
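One way to narrow this flow, offered as a minimal sketch rather than the skill's actual code: before the API key is attached to requests.post, check the configured base URL against an allowlist of hosts you trust. The helper name is_trusted_endpoint and the TRUSTED_HOSTS entry are assumptions; substitute the host you actually intend to call.

```python
from urllib.parse import urlparse

# Assumption: the set of hosts the user has decided to trust with the API key.
# This hostname is illustrative; replace it with the endpoint you actually use.
TRUSTED_HOSTS = frozenset({"dashscope.aliyuncs.com"})


def is_trusted_endpoint(url: str, trusted_hosts=TRUSTED_HOSTS) -> bool:
    """Accept only HTTPS URLs whose exact hostname is on the allowlist."""
    parsed = urlparse(url)
    # .hostname is lowercased and excludes any port or userinfo, so a URL like
    # "https://dashscope.aliyuncs.com@evil.example/" is judged by "evil.example".
    return parsed.scheme == "https" and parsed.hostname in trusted_hosts
```

In the skill, a check like this would sit immediately before the request, for example: if not is_trusted_endpoint(API_BASE): raise SystemExit("untrusted LLM endpoint"). Matching exact hostnames means subdomains are not trusted implicitly, which is the conservative choice when a bearer token is involved.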

Lp3

Severity: Medium
Category: MCP Least Privilege
Confidence: 95%
Finding: The skill documentation clearly describes reading environment variables, reading sample and prompt files, writing benchmark reports, and sending requests to a remote API, yet it contains no explicit permission declaration. This creates a transparency and consent problem: users may invoke a skill with broader capabilities than they expect, including network transmission and filesystem writes.

Missing User Warnings

Severity: Medium
Confidence: 97%
Finding: The skill is designed to send sample text and prompt content to a remote LLM endpoint, but it does not warn users that potentially sensitive input data will leave the local environment. In benchmarking scenarios, samples may contain proprietary, personal, or confidential material, so silent transmission materially increases privacy and compliance risk.

Missing User Warnings

Severity: Medium
Confidence: 96%
Finding: The documented JSON report includes model outputs, token usage, timing, and per-sample results, but the skill does not warn users that these artifacts will be persisted to disk. If samples or model responses contain sensitive content, the generated report can become a secondary data leak that remains after execution and may be copied into logs, backups, or source repositories.
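A minimal sketch of one mitigation, assuming a POSIX host and a report that is already a JSON-serializable dict: write the file with owner-only permissions and tell the user where it landed. write_private_report is a hypothetical helper, not part of the skill.

```python
import json
import os
import sys
from pathlib import Path


def write_private_report(path: str, report: dict) -> Path:
    """Write a JSON report readable and writable only by the current user (0o600)."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Mode 0o600 applies only when the file is newly created (and the umask can
    # only remove bits), so chmod afterwards to cover a pre-existing file.
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(report, fh, ensure_ascii=False, indent=2)
    os.chmod(target, 0o600)
    # The warning the finding says is missing: the report can hold sensitive text.
    print(f"Report written to {target}; it may contain sample text and model outputs.",
          file=sys.stderr)
    return target
```

Restrictive permissions limit who on the host can read the report; they do not stop it from being committed or backed up, so keeping the output directory out of version control is still advisable.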

Missing User Warnings

Severity: Medium
Confidence: 88%
Finding: The tool reads local sample files and prompt templates, formats them into a request, and sends them to a remote LLM API, but it does not explicitly warn users that local content will leave the host. In a benchmarking tool this behavior is expected, yet it is still risky because users may unintentionally upload sensitive test data, proprietary prompts, or regulated content.
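The missing warning could take the form of an explicit confirmation step before anything is uploaded. The sketch below is hypothetical: confirm_upload, its assume_yes switch (meant to back something like a --yes flag), and the input_fn hook are illustrative names, not the skill's interface.

```python
import sys
from pathlib import Path


def confirm_upload(paths, endpoint, assume_yes=False, input_fn=input) -> bool:
    """List the local files that will be sent to the endpoint and ask to proceed."""
    files = [Path(p) for p in paths]
    total = sum(f.stat().st_size for f in files)
    print(f"{len(files)} file(s), {total} bytes total, will be sent to {endpoint}:",
          file=sys.stderr)
    for f in files:
        print(f"  {f}", file=sys.stderr)
    if assume_yes:
        # Non-interactive runs opt in explicitly instead of uploading silently.
        return True
    answer = input_fn("Proceed? [y/N] ").strip().lower()
    return answer in {"y", "yes"}
```

Defaulting to "No" on an empty answer keeps the safe path as the default; input_fn exists only so the prompt can be exercised without a terminal.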

VirusTotal

66/66 vendors reported this skill as clean.
