Hle Benchmark Evolver

Security checks across malware telemetry and agentic risk

Overview

This skill does what it describes, but it gives the benchmark pipeline broad local command-execution and evolver-control authority that users should review carefully before installing.

Install only if you trust the skill author, the local capability-evolver and feishu-evolver-wrapper code, and any evaluator command you pass to `--eval_cmd`. Prefer running result mode first, avoid `--eval_cmd` with untrusted input or paths, use `--skip_evolve=true` unless you intend to mutate evolver state, and run the pipeline in a sandbox or environment without sensitive credentials.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Findings (3)

Lp3

Medium
Category: MCP Least Privilege
Confidence: 78%
Finding
The skill invokes Node.js scripts and explicitly supports passing an external shell evaluation command via `--eval_cmd`, which implies access to environment-dependent execution despite declaring no permissions. This mismatch is dangerous because operators may assume the skill is low-privilege, while in practice it can run commands that inherit secrets and environment variables from the host process.

Context-Inappropriate Capability

High
Confidence: 99%
Finding
The skill accepts a user-controlled `--eval_cmd` string, substitutes `{{report}}`, and executes it via bash without validation, allowlisting, or sandboxing. In a benchmark-ingestion/curriculum pipeline this is unnecessary privilege that enables arbitrary command execution in the agent workspace, including file access, secret exfiltration, persistence, or invoking other tools with attacker-chosen arguments.

Missing User Warnings

Medium
Confidence: 95%
Finding
The pipeline executes attacker-controlled shell content through `--eval_cmd` with no warning, confirmation, or explicit high-risk mode. In an agent skill context, users may trigger this capability indirectly or assume the parameter is a normal evaluator option, increasing the chance of silent remote code execution and destructive actions.

VirusTotal

64/64 vendors flagged this skill as clean.