Test-Driven Revolution

Security checks across malware telemetry and agentic risk

Overview

This skill is a disclosed code-automation workflow, but it can run task-provided shell commands automatically and its safety checks do not match the level of authority it requests.

Install only in a disposable or tightly isolated workspace, and do not enable the cron heartbeats until you have reviewed and restricted the executor. Treat task JSON and review output as code: do not include secrets, inspect next_instructions before applying them, and avoid using this on repositories or machines with sensitive credentials.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Findings (6)

Intent-Code Divergence

Severity: Medium
Confidence: 94%
Finding
The lock acquisition uses an atomic mkdir, but the release path blindly executes rm -rf on a path derived from user-controlled task_id without verifying that the caller owns the lock or that the target is a legitimate lock directory. In a multi-user or adversarial environment, this can let one actor delete another actor’s lock or remove an unintended directory if task_id contains path traversal sequences, undermining synchronization and potentially deleting arbitrary files under reachable paths.
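
A safer release path could validate the identifier and check ownership before removing anything. The following is a minimal sketch in Python, used here only for illustration since the skill's own language is not shown in this report. The lock layout (a `<task_id>.lock` directory holding an `owner` file), the `TASK_ID_RE` pattern, and the function names are all assumptions, not the skill's actual code.

```python
import os
import re
from pathlib import Path

# Hypothetical convention: every lock is a directory "<task_id>.lock" under one
# root, containing an "owner" file written when the lock is acquired.
TASK_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def lock_path(lock_root: Path, task_id: str) -> Path:
    """Map a task_id to its lock directory, rejecting anything path-like."""
    if not TASK_ID_RE.fullmatch(task_id):
        raise ValueError(f"invalid task_id: {task_id!r}")
    return lock_root / f"{task_id}.lock"


def acquire(lock_root: Path, task_id: str, owner: str) -> bool:
    path = lock_path(lock_root, task_id)
    try:
        os.mkdir(path)  # atomic: fails if the lock directory already exists
    except FileExistsError:
        return False
    (path / "owner").write_text(owner)
    return True


def release(lock_root: Path, task_id: str, owner: str) -> bool:
    """Release the lock only if `owner` is the recorded holder."""
    path = lock_path(lock_root, task_id)
    owner_file = path / "owner"
    try:
        if owner_file.read_text() != owner:
            return False  # held by someone else: leave it alone
    except FileNotFoundError:
        return False  # no such lock, or not a lock directory we created
    owner_file.unlink()
    os.rmdir(path)  # removes an empty directory only, never a tree
    return True
```

Using `os.rmdir` rather than a recursive delete means that even a mistaken path cannot remove a populated directory, and the strict identifier pattern rejects traversal sequences such as `../` before any path is built.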

Intent-Code Divergence

Severity: Medium
Confidence: 96%
Finding
The script represents itself as an auditor that checks instruction compliance, acceptance criteria, and security issues, but the active fallback only inspects task metadata and whether the last execution record says success or failure. This can allow unsafe, untested, or non-compliant outputs to be marked as passed, creating a false sense of security and enabling bad or malicious changes to advance through the workflow.
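
One way to avoid this false assurance is to make the audit verdict three-valued, so that a check that never ran cannot count as a pass. The sketch below is illustrative Python. The check names in `REQUIRED_CHECKS` mirror the three areas named in the finding, but the `AuditResult` type and the `audit` function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed check names, taken from the areas the auditor claims to cover.
REQUIRED_CHECKS = ("instruction_compliance", "acceptance_criteria", "security")


@dataclass
class AuditResult:
    verdict: str  # "passed", "failed", or "inconclusive"
    reasons: List[str] = field(default_factory=list)


def audit(check_results: Dict[str, Optional[bool]]) -> AuditResult:
    """check_results maps a check name to True/False, or None/missing if not run."""
    verdict = "passed"
    reasons: List[str] = []
    for name in REQUIRED_CHECKS:
        outcome = check_results.get(name)
        if outcome is False:
            verdict = "failed"
            reasons.append(f"{name}: failed")
        elif outcome is None:
            if verdict != "failed":
                verdict = "inconclusive"  # never silently upgrade to "passed"
            reasons.append(f"{name}: not run")
    return AuditResult(verdict, reasons)
```

Under this design a fallback that only reads task metadata would report "inconclusive" with the list of checks it skipped, leaving the decision to promote the task to a human.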

Intent-Code Divergence

Severity: High
Confidence: 99%
Finding
The code claims to execute task instructions 'in a sandbox', but actually passes review.next_instructions directly to execSync on the host shell. Because these instructions come from task review data, any actor able to influence task content can achieve arbitrary command execution on the machine running the heartbeat, with access to the filesystem, network, and any inherited credentials.
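
A restricted executor might accept only exact, pre-approved commands and run them without a shell. The following Python sketch illustrates the idea. The skill itself appears to be Node-based, given the reference to execSync, so this is not its code, and the `ALLOWED_COMMANDS` entries, function names, and environment values are assumptions.

```python
import shlex
import subprocess
from typing import List

# Hypothetical allowlist of complete command lines; anything else is refused.
ALLOWED_COMMANDS = {
    ("npm", "test"),
    ("npm", "ci"),
    ("git", "status"),
}


def parse_instruction(instruction: str) -> List[str]:
    """Split one instruction into argv and refuse anything off the allowlist."""
    try:
        argv = tuple(shlex.split(instruction))
    except ValueError as exc:  # e.g. unbalanced quotes
        raise PermissionError(f"refused (unparseable): {instruction!r}") from exc
    if argv not in ALLOWED_COMMANDS:
        raise PermissionError(f"refused: {instruction!r}")
    return list(argv)


def run_instruction(instruction: str, workdir: str) -> subprocess.CompletedProcess:
    argv = parse_instruction(instruction)
    # A list argv with the default shell=False means no pipes, redirects, or
    # substitutions are interpreted. The minimal env (an assumption; some tools
    # also need HOME) keeps inherited credentials out of the child process.
    return subprocess.run(
        argv,
        cwd=workdir,
        env={"PATH": "/usr/bin:/bin"},
        capture_output=True,
        text=True,
        timeout=300,
    )
```

An allowlist of full command lines is deliberately strict. It does not provide a sandbox on its own, so filesystem and network isolation would still need to come from a container or similar boundary.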

Intent-Code Divergence

Severity: Medium
Confidence: 95%
Finding
The function labeled as acceptance-criteria verification unconditionally sets passed = true for every criterion, so execution results are marked as validated even when no real checks occur. In this workflow, that can let unsafe or incorrect task outputs advance through the pipeline and be audited or promoted under false assumptions of success.
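
A verification step that defaults to failure rather than success could look roughly like the sketch below. It is illustrative Python: the checker registry, the criterion names, and the shape of the execution record are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical registry type: criterion id -> function that inspects the
# execution record and returns True only when the criterion is met.
Checker = Callable[[dict], bool]


def verify_criteria(criteria: List[str], execution: dict,
                    checkers: Dict[str, Checker]) -> Dict[str, bool]:
    """Return a pass/fail flag per criterion; unverifiable means not passed."""
    results: Dict[str, bool] = {}
    for criterion in criteria:
        checker = checkers.get(criterion)
        if checker is None:
            results[criterion] = False  # no checker registered: unverified
            continue
        try:
            results[criterion] = checker(execution) is True
        except Exception:
            results[criterion] = False  # a crashing check is not a pass
    return results
```

For example, with a single hypothetical checker registered as `{"tests_pass": lambda e: e.get("exit_code") == 0}`, a criterion named `has_docs` would be reported as not passed because nothing verifies it.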

Missing User Warnings

Severity: High
Confidence: 99%
Finding
Arbitrary shell instructions from task.review.next_instructions are executed without interactive confirmation, policy enforcement, or meaningful sanitization. In the context of an automated heartbeat processing reviewed tasks every five minutes, this creates a highly dangerous unattended remote-code-execution path if task data is malicious or compromised.
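
For an unattended heartbeat, one possible substitute for interactive confirmation is an approval gate keyed on the exact instruction text: the heartbeat holds anything a human has not approved verbatim. The Python sketch below is illustrative, and the function names and the idea of storing approvals as digests are assumptions.

```python
import hashlib
from typing import Set


def instruction_digest(instruction: str) -> str:
    """Fingerprint the exact instruction text; any edit changes the digest."""
    return hashlib.sha256(instruction.encode("utf-8")).hexdigest()


def gate(instruction: str, approved_digests: Set[str]) -> str:
    """Return "run" only for instructions approved verbatim, otherwise "hold"."""
    if instruction_digest(instruction) in approved_digests:
        return "run"
    return "hold"
```

Because the approval is bound to a digest, an instruction that is altered after review, even by one character, falls back to "hold" on the next heartbeat.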

Missing User Warnings

Severity: Medium
Confidence: 92%
Finding
The script embeds the full task JSON into a prompt intended for an external review session/model, and the surrounding instructions explicitly tell the operator to send it to another model/session. If task files contain secrets, internal code, customer data, or other sensitive context, this creates a data exfiltration path without redaction, consent checks, or even a warning to the user.
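
A redaction pass before the prompt is built would reduce this exposure. The following Python sketch is illustrative only. The key and value patterns are examples and far from exhaustive, and `redact` and `build_review_prompt` are hypothetical names rather than functions from the skill.

```python
import json
import re

# Example patterns only; a real deployment would need a vetted secret scanner.
SENSITIVE_KEY = re.compile(
    r"(secret|token|password|passwd|api[_-]?key|credential)", re.IGNORECASE
)
SENSITIVE_VALUE = re.compile(
    r"(AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----)"
)


def redact(value):
    """Recursively replace suspicious keys and values with a placeholder."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if SENSITIVE_KEY.search(str(k)) else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, str) and SENSITIVE_VALUE.search(value):
        return "[REDACTED]"
    return value


def build_review_prompt(task: dict) -> str:
    """Embed only the redacted task in the prompt sent to the reviewer."""
    return "Review this task:\n" + json.dumps(redact(task), indent=2, sort_keys=True)
```

Pattern-based redaction will miss secrets it does not recognize, so it complements, rather than replaces, keeping secrets out of task files and warning the operator before anything leaves the machine.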

VirusTotal

VirusTotal engine telemetry is currently stale for this artifact.
