Benchmark Tool

Security checks across malware telemetry and agentic risk

Overview

This benchmark skill mostly matches its stated purpose, but it includes an arbitrary file comparison command and unscoped disk/network actions that deserve review before installation.

Run it only in a scratch directory and only against network hosts you intend to contact. Avoid using the compare command on sensitive files unless you deliberately want their differences shown to the agent or conversation.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (6)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 88%
Finding: The skill exposes shell-based commands in SKILL.md but does not declare any permissions, which undermines informed consent and security review. Even if the benchmark behavior is expected, shell execution can access local files, invoke network operations, and affect system state, so omitting permissions increases the risk of unsafe deployment or misuse.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 80%
Finding: The advertised purpose is system benchmarking, but the presence of a compare command for arbitrary files expands the skill beyond that scope. This mismatch can hide additional file-reading capability from users and reviewers, enabling unintended access to sensitive local data or use of the skill as a generic file inspection tool.

Description-Behavior Mismatch

Medium

Confidence: 91%
Finding: The `compare` command exposes a generic arbitrary file-diff capability that is unrelated to the stated benchmarking purpose. In an agent/tooling context, this broadens the skill's authority and can be abused to inspect and compare sensitive local files, increasing data exposure risk beyond expected benchmark operations.

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: The code allows `diff $2 $3` on arbitrary user-supplied paths without any scope restriction, which is not justified by the tool's benchmark-only description. In agent environments, unnecessary file access primitives are dangerous because they can be repurposed to read or infer contents of sensitive files through tool outputs.
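
One way to narrow this capability is sketched below. It is not the skill's actual code: `BENCH_DIR`, `resolve_in_scope`, and `safe_compare` are hypothetical names, and the sketch assumes comparisons are only ever needed between benchmark result files kept in a single directory. It quotes both arguments, rejects symlinks, and refuses any path whose parent directory resolves outside `BENCH_DIR`.

```shell
# Hypothetical scoped replacement for an unquoted `diff $2 $3`.
# BENCH_DIR is an assumed name for the only directory the skill may read from.
BENCH_DIR="${BENCH_DIR:-$PWD/bench-results}"

# Print the canonical path of a regular file inside BENCH_DIR; fail otherwise.
resolve_in_scope() {
    # Reject symlinks so a link inside BENCH_DIR cannot point at an outside file.
    [ -L "$1" ] && return 1
    [ -f "$1" ] || return 1
    # Resolve both directories physically so `..` and linked directories are normalized.
    base=$(cd "$BENCH_DIR" 2>/dev/null && pwd -P) || return 1
    dir=$(cd "$(dirname -- "$1")" 2>/dev/null && pwd -P) || return 1
    case "$dir/" in
        "$base"/*) printf '%s/%s\n' "$dir" "$(basename -- "$1")" ;;
        *) return 1 ;;
    esac
}

# Compare two files, but only if both resolve inside BENCH_DIR.
# Returns diff's own status (0 same, 1 different) or 3 when a path is refused.
safe_compare() {
    left=$(resolve_in_scope "$1") || { echo "refusing: $1 is not a regular file inside $BENCH_DIR" >&2; return 3; }
    right=$(resolve_in_scope "$2") || { echo "refusing: $2 is not a regular file inside $BENCH_DIR" >&2; return 3; }
    diff -- "$left" "$right"
}
```

With this shape, a call such as `safe_compare "$BENCH_DIR/run1.txt" /etc/passwd` is refused before `diff` runs, so the tool output cannot reveal the contents of files outside the benchmark directory.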

Missing User Warnings

Medium

Confidence: 88%
Finding: The disk benchmark writes a 100MB test file and then deletes it, but the script does not provide any user-facing warning that it will modify the filesystem. In a security-sensitive or production environment, undisclosed write/delete behavior can cause operational surprises, consume space, and interact badly with sensitive or unexpected target directories.
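
A disclosed version of the disk test could look roughly like the sketch below. The function name `disk_benchmark` and the variables `BENCH_TMP_DIR` and `BENCH_ASSUME_YES` are assumptions made for illustration, not names taken from the skill. The sketch announces the write before it happens, asks for confirmation unless explicitly told not to, writes to a uniquely named file in a chosen scratch directory, and removes the file whether or not the write succeeds.

```shell
# Hypothetical disk benchmark that discloses its write/delete behavior up front.
# Assumed names: BENCH_TMP_DIR (scratch directory), BENCH_ASSUME_YES=1 (skip the prompt).
disk_benchmark() {
    size_mb=${1:-100}
    target_dir=${BENCH_TMP_DIR:-${TMPDIR:-/tmp}}

    echo "NOTICE: this test writes a ${size_mb}MB file under ${target_dir} and deletes it afterwards." >&2
    if [ "${BENCH_ASSUME_YES:-0}" != "1" ]; then
        printf 'Continue? [y/N] ' >&2
        read -r answer || answer=n   # treat a closed stdin as "no"
        case "$answer" in
            y|Y) ;;
            *) echo "aborted" >&2; return 1 ;;
        esac
    fi

    # Unique file name so an existing file is never overwritten.
    test_file=$(mktemp "${target_dir%/}/bench.XXXXXX") || return 1

    # dd reports bytes written and elapsed time on stderr.
    status=0
    dd if=/dev/zero of="$test_file" bs=1048576 count="$size_mb" || status=$?

    # Always remove the test file, even if the write failed part-way.
    rm -f -- "$test_file"
    return "$status"
}
```

Defaulting to an interactive prompt means an agent running the skill unattended has to opt in explicitly (here through `BENCH_ASSUME_YES=1`), which makes the filesystem change a visible decision instead of a side effect.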

Missing User Warnings

Medium

Confidence: 86%
Finding: The network benchmark makes an outbound `curl` request to a user-supplied host without clearly disclosing that it will contact external systems. In restricted environments, this can violate policy, leak metadata such as DNS queries and source IP, or be used to probe internal endpoints under the guise of benchmarking.
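
The sketch below shows one possible mitigation, assuming an operator-maintained allowlist. `BENCH_ALLOWED_HOSTS`, `host_allowed`, and `network_benchmark` are hypothetical names, and the choice of a single HTTPS request with a 10-second timeout is an illustration, not the skill's documented behavior. The function refuses hosts that are not on the allowlist and prints a notice before any traffic leaves the machine.

```shell
# Hypothetical network benchmark restricted to an explicit allowlist.
# BENCH_ALLOWED_HOSTS is an assumed, space-separated list of permitted hostnames.
host_allowed() {
    case "$1" in
        # Reject empty names and anything that is not a plain hostname
        # (slashes, colons, spaces, and other URL or shell metacharacters).
        ''|*[!A-Za-z0-9.-]*) return 1 ;;
    esac
    for allowed in ${BENCH_ALLOWED_HOSTS:-}; do
        [ "$1" = "$allowed" ] && return 0
    done
    return 1
}

network_benchmark() {
    host=$1
    if ! host_allowed "$host"; then
        echo "refusing: $host is not in BENCH_ALLOWED_HOSTS" >&2
        return 3
    fi
    echo "NOTICE: sending one HTTPS request to ${host}; this exposes a DNS lookup and your source IP." >&2
    # Discard the response body and print only the total request time.
    curl --silent --show-error --output /dev/null --max-time 10 \
         --write-out 'time_total=%{time_total}s\n' "https://${host}/"
}
```

With `BENCH_ALLOWED_HOSTS="example.com"`, a request for an internal address such as `169.254.169.254` is rejected before `curl` is invoked, which addresses the internal-probing concern raised in the finding.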

VirusTotal

63/63 vendors flagged this skill as clean.
