Test-Driven Revolution

Security checks across malware telemetry and agentic risk

Overview

This skill is a disclosed code-automation workflow, but it can run task-provided shell commands automatically and its safety checks do not match the level of authority it requests.

Install only in a disposable or tightly isolated workspace, and do not enable the cron heartbeats until you have reviewed and restricted the executor. Treat task JSON and review output as code: do not include secrets, inspect next_instructions before applying them, and avoid using this on repositories or machines with sensitive credentials.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (6)

Intent-Code Divergence

Medium

Confidence: 94%
Finding: The lock acquisition uses an atomic mkdir, but the release path blindly executes rm -rf on a path derived from user-controlled task_id without verifying that the caller owns the lock or that the target is a legitimate lock directory. In a multi-user or adversarial environment, this can let one actor delete another actor’s lock or remove an unintended directory if task_id contains path traversal sequences, undermining synchronization and potentially deleting arbitrary files under reachable paths.
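
A minimal sketch of a safer release path, written in Python for illustration only. The report does not show the skill's own lock code, so `TASK_ID_RE`, `lock_path`, `acquire`, `release`, and the `owner` file are all hypothetical names. The idea is to validate task_id before building any path, record who holds the lock, and remove it with non-recursive calls so that an unexpected directory is never wiped.

```python
import re
from pathlib import Path

# Assumed task ID format: short, no slashes or dots, so no path traversal.
TASK_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def lock_path(lock_root: Path, task_id: str) -> Path:
    """Map a task ID to its lock directory, rejecting malformed IDs."""
    if not TASK_ID_RE.fullmatch(task_id):
        raise ValueError(f"invalid task_id: {task_id!r}")
    return lock_root / f"{task_id}.lock"


def acquire(lock_root: Path, task_id: str, owner: str) -> bool:
    """Atomically create the lock directory and record who holds it."""
    path = lock_path(lock_root, task_id)
    try:
        path.mkdir()  # atomic: raises if the lock already exists
    except FileExistsError:
        return False
    (path / "owner").write_text(owner)
    return True


def release(lock_root: Path, task_id: str, owner: str) -> bool:
    """Remove the lock only if this caller owns it; never recurse."""
    path = lock_path(lock_root, task_id)
    owner_file = path / "owner"
    try:
        if owner_file.read_text() != owner:
            return False  # someone else holds the lock
    except FileNotFoundError:
        return False  # no lock, or not a lock directory we created
    owner_file.unlink()
    path.rmdir()  # raises on unexpected contents instead of deleting them
    return True
```

Using `rmdir` rather than a recursive delete means a mistaken target fails loudly instead of removing files.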

Intent-Code Divergence

Medium

Confidence: 96%
Finding: The script represents itself as an auditor that checks instruction compliance, acceptance criteria, and security issues, but the active fallback only inspects task metadata and whether the last execution record says success or failure. This can allow unsafe, untested, or non-compliant outputs to be marked as passed, creating a false sense of security and enabling bad or malicious changes to advance through the workflow.

Intent-Code Divergence

High

Confidence: 99%
Finding: The code claims to execute task instructions 'in a sandbox', but actually passes review.next_instructions directly to execSync on the host shell. Because these instructions come from task review data, any actor able to influence task content can achieve arbitrary command execution on the machine running the heartbeat, with access to the filesystem, network, and any inherited credentials.
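
One way to narrow this path is sketched below in Python. The skill itself calls execSync, so this only illustrates the pattern and is not its code. `ALLOWED_PROGRAMS`, `parse_instruction`, and `run_instruction` are hypothetical names, and the allowlist contents are placeholders. Note that an allowlist without a shell reduces the attack surface but is not a sandbox: an allowed program can still be misused through its arguments.

```python
import shlex
import subprocess

# Hypothetical allowlist: the only programs the executor may start.
ALLOWED_PROGRAMS = {"npm", "pytest", "git"}


def parse_instruction(instruction: str) -> list[str]:
    """Split one instruction into argv and reject programs off the allowlist."""
    argv = shlex.split(instruction)
    if not argv:
        raise ValueError("empty instruction")
    if argv[0] not in ALLOWED_PROGRAMS:
        raise PermissionError(f"program not allowed: {argv[0]!r}")
    return argv


def run_instruction(instruction: str, workdir: str, timeout: int = 300):
    """Run a vetted instruction without a shell and with a minimal environment."""
    argv = parse_instruction(instruction)
    return subprocess.run(
        argv,
        cwd=workdir,
        env={"PATH": "/usr/bin:/bin"},  # do not inherit host credentials
        shell=False,  # metacharacters such as ; and | stay literal arguments
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
```

Because no shell is involved, an instruction such as `git status; rm -rf /` is passed to git as literal arguments rather than being run as two commands.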

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The function labeled as acceptance-criteria verification unconditionally sets passed = true for every criterion, so execution results are marked as validated even when no real checks occur. In this workflow, that can let unsafe or incorrect task outputs advance through the pipeline and be audited or promoted under false assumptions of success.
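
A fail-closed alternative could look like the following Python sketch. The `Criterion` and `CriterionResult` types and the `check` callable are assumptions made for illustration, since the report does not show how the skill represents criteria. The point is that a criterion with no real check, or a check that raises, counts as a failure rather than a pass.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Criterion:
    name: str
    # Hypothetical: a callable that performs the actual verification.
    check: Optional[Callable[[], bool]] = None


@dataclass
class CriterionResult:
    name: str
    passed: bool
    detail: str


def verify_criteria(criteria: list[Criterion]) -> list[CriterionResult]:
    """Run every check; missing or crashing checks are recorded as failures."""
    results = []
    for criterion in criteria:
        if criterion.check is None:
            results.append(CriterionResult(criterion.name, False, "no check defined"))
            continue
        try:
            ok = bool(criterion.check())
            results.append(CriterionResult(criterion.name, ok, "check ran"))
        except Exception as exc:
            results.append(CriterionResult(criterion.name, False, f"check raised: {exc}"))
    return results


def all_passed(results: list[CriterionResult]) -> bool:
    """An empty result list is not a pass: nothing was verified."""
    return bool(results) and all(r.passed for r in results)
```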

Missing User Warnings

High

Confidence: 99%
Finding: Arbitrary shell instructions from task.review.next_instructions are executed without interactive confirmation, policy enforcement, or meaningful sanitization. In the context of an automated heartbeat processing reviewed tasks every five minutes, this creates a highly dangerous unattended remote-code-execution path if task data is malicious or compromised.
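
As a rough illustration of a confirmation gate, the Python sketch below binds a human approval to the exact instruction text by comparing SHA-256 digests. `instruction_digest`, `is_approved`, and the idea of storing an approved digest alongside the task are assumptions, not features of the skill. If the instructions change after approval, the digest no longer matches and the heartbeat would skip execution.

```python
import hashlib
import hmac
from typing import Optional


def instruction_digest(instructions: str) -> str:
    """Return the SHA-256 hex digest of the exact instruction text."""
    return hashlib.sha256(instructions.encode("utf-8")).hexdigest()


def is_approved(instructions: str, approved_digest: Optional[str]) -> bool:
    """True only if a recorded approval matches these exact instructions."""
    if not approved_digest:
        return False  # no approval on record: fail closed
    current = instruction_digest(instructions)
    return hmac.compare_digest(current.encode("utf-8"), approved_digest.encode("utf-8"))
```

The approved digest would need to be stored somewhere the task author cannot write to; otherwise the gate offers no protection.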

Missing User Warnings

Medium

Confidence: 92%
Finding: The script embeds the full task JSON into a prompt intended for an external review session/model, and the surrounding instructions explicitly tell the operator to send it to another model/session. If task files contain secrets, internal code, customer data, or other sensitive context, this creates a data exfiltration path without redaction, consent checks, or even a warning to the user.
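
A minimal redaction pass before the prompt is built might look like this Python sketch. The key pattern in `SENSITIVE_KEY_RE`, the `redact` helper, and `build_review_prompt` are hypothetical. Matching on key names is only a first line of defense: secrets embedded in free-text values would not be caught.

```python
import json
import re

# Assumed naming convention for sensitive fields; extend as needed.
SENSITIVE_KEY_RE = re.compile(
    r"(secret|token|password|api[_-]?key|credential)", re.IGNORECASE
)
REDACTED = "[REDACTED]"


def redact(value):
    """Return a copy of value with sensitive-looking keys masked."""
    if isinstance(value, dict):
        return {
            key: REDACTED if SENSITIVE_KEY_RE.search(str(key)) else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def build_review_prompt(task: dict) -> str:
    """Embed only the redacted task JSON in the review prompt."""
    body = json.dumps(redact(task), indent=2, sort_keys=True)
    return "Review the following task:\n" + body
```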

VirusTotal

VirusTotal engine telemetry is currently stale for this artifact.
