Audio Command Handler

Security checks across malware telemetry and agentic risk

Overview

This skill openly turns audio into agent commands, but it gives spoken/transcribed content too much authority and can upload long results without enough user control.

Review before installing. Use this only if you intentionally want voice messages to become actionable agent instructions, and configure the agent to show the transcript and ask before executing commands, writing files, or uploading generated content. Avoid sending private, regulated, or secret-bearing audio unless you trust the transcription and upload services.
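
The "show the transcript and ask before executing" recommendation can be sketched as a small confirmation gate. This is a minimal illustration, not the skill's actual code: `PendingAction`, `confirm_then_run`, the `kind` labels, and the `ask_user` / `run` callbacks are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PendingAction:
    transcript: str  # raw text returned by the transcription step
    kind: str        # hypothetical labels: "execute", "write_file", "upload"


def confirm_then_run(
    action: PendingAction,
    ask_user: Callable[[str], str],
    run: Callable[[PendingAction], None],
) -> bool:
    """Show the transcript and run the action only on explicit approval."""
    prompt = f'Heard: "{action.transcript}"\nAllow {action.kind}? [y/N] '
    answer = ask_user(prompt).strip().lower()
    if answer in ("y", "yes"):
        run(action)
        return True
    # Anything else, including an empty reply, is treated as a refusal.
    return False
```

In an interactive setting `ask_user` could simply be `input`; the default-deny behavior means a misheard or unanswered prompt never triggers the action.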

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (6)

Lp3

Medium
Category
MCP Least Privilege
Confidence
92% confidence
Finding
The skill instructs use of shell commands and file-writing behavior but declares no permissions, creating a capability/permission mismatch that weakens review and policy enforcement. In this context, the skill handles untrusted audio-derived content and can save artifacts to disk, so undeclared shell and file-write capabilities increase the chance of unsafe execution or data handling without explicit scrutiny.
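
A reviewer could surface this kind of mismatch mechanically by comparing what a skill declares with what its instructions actually use. The sketch below assumes both sides are already available as plain lists of capability names; the names `shell` and `file_write` and the function `undeclared_capabilities` are hypothetical and do not reflect any particular manifest format.

```python
from typing import Iterable, List


def undeclared_capabilities(declared: Iterable[str], used: Iterable[str]) -> List[str]:
    """Return capabilities the skill uses but never declared, sorted for stable output."""
    return sorted(set(used) - set(declared))


# Hypothetical input mirroring this finding: nothing declared, two capabilities used.
missing = undeclared_capabilities(declared=[], used=["shell", "file_write"])
```

A non-empty result is a signal to request a permission declaration before the skill is approved.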

Tp4

High
Category
MCP Tool Poisoning
Confidence
95% confidence
Finding
The documentation says the skill executes transcribed audio as commands and automatically uploads long results, but the analyzed implementation instead appears to prepare metadata and expose arbitrary output-file writing, and it does not appear to perform the promised upload flow. This mismatch is dangerous because operators and downstream agents may trust the documentation and invoke the skill in ways that bypass expected safeguards, while hidden or undocumented file-output features expand the attack surface.

Vague Triggers

Medium
Confidence
90% confidence
Finding
The activation description is broad enough to match any incoming audio message, including casual voice notes or content not intended as an instruction. In this skill, broad triggering is especially risky because the workflow can convert speech into executable commands or uploaded output, so accidental activation can cause unintended actions or disclosure.
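
One way to narrow the trigger is to act only on audio whose transcript opens with an explicit wake phrase, so casual voice notes fall through untouched. This is a sketch under that assumption: the phrase `agent command` and the function `extract_command` are hypothetical and are not part of the reviewed skill.

```python
import re
from typing import Optional

WAKE_PHRASE = "agent command"  # hypothetical phrase; choose something unlikely in casual speech

_WAKE_RE = re.compile(
    rf"\s*{re.escape(WAKE_PHRASE)}\b[\s,:.-]*(.+)",
    re.IGNORECASE | re.DOTALL,
)


def extract_command(transcript: str) -> Optional[str]:
    """Return the text after the wake phrase, or None if the phrase does not open the message."""
    match = _WAKE_RE.match(transcript)
    if match is None:
        return None
    # A wake phrase followed only by whitespace carries no command.
    return match.group(1).strip() or None
```

Messages that return `None` would be handled as ordinary audio (for example, transcribed and shown) rather than as instructions.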

Vague Triggers

Medium
Confidence
97% confidence
Finding
The instruction to use the transcription as the command lacks any trust boundary, sanitization, or exclusion criteria, meaning arbitrary spoken content is elevated directly into executable intent. Because audio is an untrusted medium and transcription may be inaccurate or manipulated, this creates a command-injection-by-design pattern at the agent level, enabling unintended tool use, destructive actions, or data access.
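
A trust boundary can be approximated by treating the transcript as data and mapping it onto a small allowlist of intents instead of passing it through as a command. The sketch below is illustrative only; `ALLOWED_INTENTS`, its entries, and `resolve_intent` are hypothetical names, and exact-match lookup is a deliberately strict choice.

```python
from typing import Optional

# Hypothetical allowlist: normalized spoken phrase -> internal action name.
ALLOWED_INTENTS = {
    "summarize": "summarize_notes",
    "list reminders": "list_reminders",
}


def resolve_intent(transcript: str) -> Optional[str]:
    """Map a transcript to an allowlisted action, or None if it is not recognized."""
    # Lowercase, collapse whitespace, and drop trailing sentence punctuation.
    normalized = " ".join(transcript.lower().split()).rstrip(".!?")
    return ALLOWED_INTENTS.get(normalized)
```

Because anything outside the allowlist resolves to `None`, a transcript containing extra instructions or shell fragments is rejected rather than executed.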

Missing User Warnings

High
Confidence
96% confidence
Finding
The skill omits a clear warning that user audio is sent to an external transcription service and that outputs may later be saved and uploaded. This is dangerous because users may speak sensitive information assuming local handling, but the workflow can transmit and publish that content through third-party services or URLs without informed consent.

Ssd 3

Medium
Confidence
94% confidence
Finding
The workflow automatically treats transcribed speech as actionable input and may save or upload resulting content, which can expose spoken secrets, personal data, or confidential business information in plain text. The risk is heightened here because audio commonly contains sensitive content, transcription normalizes it into searchable text, and the upload step can further widen access via shareable links.
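
Before a transcript or its derived output is saved or uploaded, a redaction pass can reduce what ends up in plain text. The patterns below are a minimal, assumption-laden sketch (card-like digit runs and email addresses only); they are hypothetical examples, not a complete detector, and `redact` is a name introduced here.

```python
import re

# Hypothetical, intentionally small pattern set; real deployments need broader coverage.
_PATTERNS = [
    re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),           # 13 to 16 digits, optionally spaced or dashed
    re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),   # email-like strings
]


def redact(text: str) -> str:
    """Replace matches of each pattern with a fixed placeholder."""
    for pattern in _PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Redaction of this kind lowers exposure but does not replace user consent for the save or upload step itself.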

VirusTotal

65/65 vendors flagged this skill as clean.
