Audio Command Handler

Security checks across malware telemetry and agentic risk

Overview

This skill openly turns audio into agent commands, but it gives spoken/transcribed content too much authority and can upload long results without enough user control.

Review before installing. Use this only if you intentionally want voice messages to become actionable agent instructions, and configure the agent to show the transcript and ask before executing commands, writing files, or uploading generated content. Avoid sending private, regulated, or secret-bearing audio unless you trust the transcription and upload services.
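The recommendation to show the transcript and ask before acting can be sketched as a confirmation gate. This is a minimal illustration, not code from the reviewed skill. `ProposedAction`, `gate_action`, and the set of sensitive action kinds are hypothetical names, and `ask_user` stands in for whatever confirmation prompt the host agent actually provides.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action kinds that should never run without explicit approval.
SENSITIVE_ACTIONS = {"execute", "write_file", "upload"}


@dataclass
class ProposedAction:
    kind: str        # hypothetical kind, e.g. "execute", "write_file", "upload", "reply"
    transcript: str  # raw transcription the action was derived from
    detail: str      # command, file path, or destination the agent intends to use


def gate_action(action: ProposedAction, ask_user: Callable[[str], bool]) -> bool:
    """Return True only if the action may proceed.

    Non-sensitive actions pass straight through. Sensitive ones show the user
    the transcript and the concrete action, then require an explicit yes.
    """
    if action.kind not in SENSITIVE_ACTIONS:
        return True
    prompt = (
        f"Transcript: {action.transcript!r}\n"
        f"Proposed {action.kind}: {action.detail}\n"
        "Proceed? [y/N] "
    )
    # Only a truthy answer from the user lets the action run.
    return bool(ask_user(prompt))
```

For example, `gate_action(ProposedAction("execute", "delete old logs", "rm -rf logs/"), ask_user=lambda prompt: False)` returns `False`, so the command is never run. Showing the transcript alongside the concrete action lets the user catch transcription errors before anything happens.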

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (6)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 92%
Finding: The skill instructs the agent to run shell commands and write files but declares no permissions, creating a capability/permission mismatch that weakens review and policy enforcement. Because the skill handles untrusted, audio-derived content and can save artifacts to disk, these undeclared shell and file-write capabilities increase the chance that unsafe execution or data handling escapes explicit scrutiny.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 95%
Finding: The documentation says the skill executes transcribed audio as commands and automatically uploads long results, but the analyzed implementation instead appears to prepare metadata, expose writing to arbitrary output files, and never perform the promised upload. This mismatch is dangerous because operators and downstream agents may trust the documentation and invoke the skill in ways that bypass expected safeguards, while hidden or undocumented file-output features expand the attack surface.

Vague Triggers

Medium

Confidence: 90%
Finding: The activation description is broad enough to match any incoming audio message, including casual voice notes or content not intended as an instruction. In this skill, broad triggering is especially risky because the workflow can convert speech into executable commands or uploaded output, so accidental activation can cause unintended actions or disclosure.
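One way to narrow the trigger is to act only on audio whose transcript opens with an explicit phrase and to treat everything else as an ordinary message. The phrase "agent command" and the `extract_command` helper below are assumptions for illustration; the reviewed skill defines no such phrase.

```python
import re
from typing import Optional

# Hypothetical trigger phrase: "agent command", where each word may be followed
# by , . or : (speech-to-text engines often insert punctuation after a phrase).
TRIGGER = re.compile(
    r"^\s*agent[,.:]?\s+command[,.:]?\s+(?P<body>.+)$",
    re.IGNORECASE | re.DOTALL,
)


def extract_command(transcript: str) -> Optional[str]:
    """Return the instruction after the trigger phrase, or None when the
    transcript does not start with it (treat it as an ordinary message)."""
    match = TRIGGER.match(transcript)
    if match is None:
        return None
    return match.group("body").strip()
```

A casual note such as "running late, see you at six" returns `None` and is never interpreted as an instruction. A fixed phrase does not stop someone who deliberately speaks it, so it complements rather than replaces confirmation before sensitive actions.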

Vague Triggers

Medium

Confidence: 97%
Finding: The instruction to use the transcription as the command lacks any trust boundary, sanitization, or exclusion criteria, meaning arbitrary spoken content is elevated directly into executable intent. Because audio is an untrusted medium and transcription may be inaccurate or manipulated, this creates a command-injection-by-design pattern at the agent level, enabling unintended tool use, destructive actions, or data access.
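A trust boundary can be sketched by treating the transcript purely as data that selects from a fixed allowlist, rather than as text to execute. The allowlist contents and the `to_argv` helper are hypothetical; a real deployment would define its own permitted commands.

```python
from typing import Optional

# Hypothetical allowlist: spoken keyword -> fixed argument vector.
ALLOWED = {
    "status": ["git", "status"],
    "list": ["ls", "-l"],
}


def to_argv(command_text: str) -> Optional[list[str]]:
    """Map transcribed text onto an allowlisted argv, or return None.

    Only the first word is consulted, after lowercasing and stripping
    surrounding . , ! ? characters; the rest of the transcript is ignored
    and nothing from it is copied into the returned argv.
    """
    words = command_text.lower().split()
    if not words:
        return None
    keyword = words[0].strip(".,!?")
    if keyword not in ALLOWED:
        return None
    # Return a copy so callers cannot mutate the allowlist entry.
    return list(ALLOWED[keyword])
```

The returned list would be passed to something like `subprocess.run(argv)` without `shell=True`, so spoken text never reaches a shell. Input such as "status; rm -rf /" yields `None` because its first word, `status;`, is not an allowlisted keyword.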

Missing User Warnings

High

Confidence: 96%
Finding: The skill omits a clear warning that user audio is sent to an external transcription service and that outputs may later be saved and uploaded. This is dangerous because users may speak sensitive information assuming local handling, but the workflow can transmit and publish that content through third-party services or URLs without informed consent.

Ssd 3

Medium

Confidence: 94%
Finding: The workflow automatically treats transcribed speech as actionable input and may save or upload resulting content, which can expose spoken secrets, personal data, or confidential business information in plain text. The risk is heightened here because audio commonly contains sensitive content, transcription normalizes it into searchable text, and the upload step can further widen access via shareable links.
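To reduce exposure before a transcript is saved or uploaded, the text can be passed through a redaction step. The patterns below are illustrative assumptions covering a few obvious cases (spoken passwords or PINs, card-like digit runs, email addresses); they will miss many kinds of secrets and are no substitute for asking the user before any upload.

```python
import re

# Illustrative patterns only; each pairs a regex with its replacement text.
REDACTIONS = [
    # 13 to 16 digits, optionally separated by spaces or hyphens (card-like numbers).
    (re.compile(r"\b\d(?:[ -]?\d){12,15}\b"), "[CARD]"),
    # A credential keyword followed by its value, e.g. "password is hunter2".
    (re.compile(r"(?i)\b(password|passcode|pin)\b\s*(?:is|:)?\s*\S+"), r"\1 [REDACTED]"),
    # A simple email address shape.
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
]


def redact(text: str) -> str:
    """Apply each pattern in order and return the redacted text."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text
```

Running redaction before both the save and the upload step keeps the plain-text artifact and any shareable link from carrying the most obvious secrets, though anything the patterns miss is still exposed.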

VirusTotal

65 of 65 vendors reported this skill as clean.
