Agent Benchmark

Security checks across malware telemetry and agentic risk

Overview

This appears to be a legitimate benchmark skill, but it runs task code with broad local access and has under-disclosed file, environment, and persistence risks.

Install or run this only if you are comfortable treating its task files as local executable code. Use trusted task sets, run it in a disposable or sandboxed workspace, clear sensitive environment variables first, and review or disable the memory-report behavior and task path handling before using it on a machine with secrets or important files.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Findings (4)

Intent-Code Divergence

Severity: Medium
Confidence: 97%
Finding
The code advertises a memory limit in CONFIG but never enforces it when spawning untrusted programs. Because this framework executes arbitrary task code, an attacker can consume excessive memory or other host resources, causing denial of service or destabilizing the machine running the benchmark.
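
One way to make a configured memory limit take effect is to apply an OS-level cap before the task program starts. The following is a minimal sketch, not the skill's actual code: it assumes a POSIX `sh` whose `ulimit -v` is enforced (typically Linux; macOS generally does not enforce it), and `buildLimitedSpawn`, `runLimited`, and `memoryLimitMb` are hypothetical names.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical helper: wraps a command so the shell caps virtual memory
// (ulimit -v takes kilobytes) before exec'ing the real program.
function buildLimitedSpawn(command, args, memoryLimitMb) {
  if (!Number.isInteger(memoryLimitMb) || memoryLimitMb <= 0) {
    throw new RangeError("memoryLimitMb must be a positive integer");
  }
  const limitKb = String(memoryLimitMb * 1024);
  return {
    command: "sh",
    // $1 is the limit; after `shift`, "$@" is the original command line.
    // Arguments are passed as separate argv entries, not interpolated.
    args: ["-c", 'ulimit -v "$1" && shift && exec "$@"', "sh", limitKb, command, ...args],
  };
}

// Assumed usage: run a task with a memory cap and a wall-clock timeout.
function runLimited(command, args, memoryLimitMb, timeoutMs) {
  const wrapped = buildLimitedSpawn(command, args, memoryLimitMb);
  return spawn(wrapped.command, wrapped.args, { timeout: timeoutMs, stdio: "pipe" });
}
```

A call such as `runLimited("python3", ["task.py"], 256, 10000)` would then start the task under a 256 MB virtual-memory cap. Container or cgroup limits are a sturdier option where they are available.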

Missing User Warnings

Severity: High
Confidence: 99%
Finding
The framework writes attacker-controlled code to disk and executes it directly via Python, Node, or Go. In this benchmarking context that is effectively arbitrary code execution on the host, so malicious tasks can read files, exfiltrate data, modify the system, or pivot using the runner's privileges.
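
A common way to contain task code is to run it in a throwaway container with no network access and a read-only view of the task directory. The sketch below only builds a `docker run` argument list. The image name, mount point, limits, and the `buildSandboxArgs` name are assumptions for illustration, and nothing in the report indicates that the skill offers such a mode.

```javascript
const path = require("node:path");

// Hypothetical helper: argument list for running one task file in a
// locked-down, disposable container. Image and limits are placeholders.
function buildSandboxArgs(taskDir, entryFile, image = "python:3.12-slim") {
  const hostDir = path.resolve(taskDir);
  return [
    "run", "--rm",
    "--network", "none",         // no network access from the task
    "--read-only",               // container root filesystem is read-only
    "--cap-drop", "ALL",         // drop all Linux capabilities
    "--pids-limit", "64",        // bound the number of processes
    "--memory", "256m",          // hard memory cap
    "-v", `${hostDir}:/task:ro`, // task files mounted read-only
    "-w", "/task",
    image,
    "python", entryFile,
  ];
}
```

The result can be passed to `spawn("docker", buildSandboxArgs("./tasks/001", "main.py"))`. The host path is assumed to be POSIX-style, since a drive-letter colon would conflict with the `-v` syntax.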

Missing User Warnings

Severity: High
Confidence: 99%
Finding
The subprocess inherits nearly the full parent environment through `{ ...process.env, NODE_ENV: 'test' }`, exposing secrets such as API keys, tokens, cloud credentials, and internal configuration to untrusted executed code. Since the child process is attacker-controlled, it can simply print or exfiltrate these values.
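
A safer pattern than spreading `process.env` is to build the child environment from an explicit allowlist. This is a minimal sketch under assumptions: `buildChildEnv` and the contents of `DEFAULT_ALLOWED` are hypothetical, and the variables a given task runtime needs may differ.

```javascript
// Assumed allowlist; adjust to what the task runtimes actually need.
const DEFAULT_ALLOWED = ["PATH", "HOME", "LANG", "TMPDIR", "SystemRoot"];

// Copies only allowlisted variables from the parent environment,
// then sets NODE_ENV as the original spawn call does.
function buildChildEnv(parentEnv, allowed = DEFAULT_ALLOWED) {
  const env = {};
  for (const key of allowed) {
    if (parentEnv[key] !== undefined) {
      env[key] = parentEnv[key];
    }
  }
  env.NODE_ENV = "test";
  return env;
}
```

Passing `{ env: buildChildEnv(process.env) }` to the spawn call keeps API keys, tokens, and cloud credentials out of the child process unless they are deliberately allowlisted.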

Missing User Warnings

Severity: Medium
Confidence: 93%
Finding
This task explicitly reads and prints $env:COMPUTERNAME and $env:USERNAME, which exposes host and user identity information to task output. While the benchmark context suggests this is likely intended as a simple system-information exercise rather than data theft, such disclosure can still leak sensitive environment details into logs, transcripts, or remote evaluation systems.
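
If such tasks are kept, one mitigation is to mask host and user names in task output before it reaches logs or remote evaluators. The sketch below is illustrative only: `redactIdentity` and `escapeRegExp` are hypothetical helpers, and plain substring matching can over-redact when a user name is a short, common string.

```javascript
const os = require("node:os");

// Escapes characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: replaces each identity value (case-insensitively)
// with a placeholder named after its key, e.g. <host> or <user>.
function redactIdentity(
  output,
  identity = { host: os.hostname(), user: os.userInfo().username }
) {
  let result = output;
  for (const [label, value] of Object.entries(identity)) {
    if (!value) continue;
    result = result.replace(new RegExp(escapeRegExp(value), "gi"), `<${label}>`);
  }
  return result;
}
```

Called with no second argument, the function uses the current machine's host and user names. Passing an explicit object makes it easy to test or to add further values.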

VirusTotal

VirusTotal findings are pending for this skill version.
