Hle Benchmark Evolver

Security checks across malware telemetry and agentic risk

Overview

This skill does what it describes, but it gives the benchmark pipeline broad local command-execution and evolver-control authority that users should review carefully before installing.

Install only if you trust the skill author, the local capability-evolver and feishu-evolver-wrapper code, and any evaluator command you pass to --eval_cmd. Prefer running result mode first, avoid --eval_cmd with untrusted input or paths, use --skip_evolve=true unless you intend to mutate evolver state, and run the pipeline in a sandbox or environment without sensitive credentials.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Excessive AgencyUnrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • MCP Least PrivilegeUnderdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • Prompt InjectionInstruction Override, Hidden Instructions, Exfiltration Commands
  • Data ExfiltrationExternal Transmission, Env Variable Harvesting, File System Enumeration
  • Privilege EscalationExcessive Permissions, Sudo/Root Execution, Credential Access
Findings (3)

Lp3

Medium
Category
MCP Least Privilege
Confidence
78% confidence
Finding
The skill invokes Node.js scripts and explicitly supports passing an external shell evaluation command via `--eval_cmd`, which implies access to environment-dependent execution despite declaring no permissions. This mismatch is dangerous because operators may assume the skill is low-privilege, while in practice it can run commands that inherit secrets and environment variables from the host process.

Context-Inappropriate Capability

High
Confidence
99% confidence
Finding
The skill accepts a user-controlled --eval_cmd string, substitutes {{report}}, and executes it via bash without validation, allowlisting, or sandboxing. In a benchmark-ingestion/curriculum pipeline this is unnecessary privilege that enables arbitrary command execution in the agent workspace, including file access, secret exfiltration, persistence, or invoking other tools with attacker-chosen arguments.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The pipeline executes attacker-controlled shell content through --eval_cmd with no warning, confirmation, or explicit high-risk mode. In an agent skill context, users may trigger this capability indirectly or assume the parameter is a normal evaluator option, increasing the chance of silent remote code execution and destructive actions.

VirusTotal

64/64 vendors flagged this skill as clean.

View on VirusTotal