Volcengine Ai Audio

Security checks across malware telemetry and agentic risk

Overview

This is a real cloud media-processing skill, but its audio-cleanup branding does not fully match the broader remote video editing, generation, and export authority it gives the backend.

Install only if you are comfortable sending selected audio/video files and related prompts to mega-api-prod.nemovideo.ai. Verify the provider relationship and privacy terms, treat NEMO_TOKEN as sensitive, and avoid private, regulated, or business-confidential media unless you understand retention, credit usage, and export behavior.
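
As one illustration of treating NEMO_TOKEN as sensitive, the sketch below masks the token before it is written to logs or shown in debug output. The helper name masked_token and the masking policy are assumptions made for this example and are not part of the skill.

```python
import os


def masked_token(env_var: str = "NEMO_TOKEN") -> str:
    """Return a log-safe form of the token stored in env_var (hypothetical helper)."""
    token = os.environ.get(env_var, "")
    if not token:
        return "<unset>"
    if len(token) <= 8:
        # Too short to reveal any characters safely.
        return "*" * len(token)
    # Keep two characters at each end so different tokens can be told apart.
    return token[:2] + "*" * (len(token) - 4) + token[-2:]
```

Logging masked_token() instead of the raw environment variable keeps the credential out of transcripts and shared terminals.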

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (6)

Description-Behavior Mismatch

Medium

Confidence: 91%
Finding: The skill is presented as an audio-enhancement tool, but the instructions authorize broad video-generation, rendering, export, and timeline-editing behaviors. This scope mismatch can cause the agent to handle user requests and media flows beyond what users reasonably expect, increasing the risk of unintended data processing and over-privileged backend use.

Description-Behavior Mismatch

Medium

Confidence: 90%
Finding: The listed operations and formats extend far beyond simple audio cleaning into generalized media editing and rendering. Users invoking an audio-cleaning skill may unknowingly trigger broader processing of videos, images, and exports, which weakens informed consent and expands the attack surface of the skill.

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: Generalized timeline-editing and GUI-action translation make the skill function like a remote editing agent rather than a narrow audio enhancer. That broader control surface can let ambiguous prompts trigger complex backend actions, making misuse and unintended operations more likely.

Vague Triggers

Medium

Confidence: 94%
Finding: Routing all unmatched requests to SSE creates an overly permissive execution path where vague or unrelated prompts may be forwarded to the remote backend. In this skill, that is more dangerous because SSE appears to drive powerful editing operations and session state changes, so the catch-all greatly increases the chance of unintended remote actions.
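
A common mitigation for a catch-all route is an explicit allowlist, so that unmatched prompts are dropped rather than forwarded. The following is a minimal sketch of that idea, not the skill's actual routing logic; the keyword table, the intent names, and the function route_request are all hypothetical.

```python
from typing import Optional

# Hypothetical allowlist mapping trigger keywords to narrow audio intents.
AUDIO_INTENTS = {
    "denoise": "audio.denoise",
    "noise": "audio.denoise",
    "normalize": "audio.normalize",
    "volume": "audio.normalize",
    "enhance": "audio.enhance",
}


def route_request(prompt: str) -> Optional[str]:
    """Return an intent for a recognised audio request, else None."""
    for word in prompt.lower().split():
        intent = AUDIO_INTENTS.get(word.strip(".,!?"))
        if intent is not None:
            return intent
    # Unmatched prompts are not forwarded to the remote backend.
    return None
```

With this shape, a prompt such as "Please denoise this clip." maps to a single audio intent, while an unrelated request returns None and can be answered locally or declined.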

Vague Triggers

Medium

Confidence: 77%
Finding: The invocation text is broad enough that users may trigger the skill with generic media-editing requests, without understanding when cloud processing or session creation will occur. While less severe than direct command overreach, this ambiguity contributes to accidental activation and weak user consent.

Missing User Warnings

Medium

Confidence: 96%
Finding: The skill asks users to upload audio/video files but does not provide a clear upfront privacy warning that media is sent to a third-party cloud backend for processing. Because recordings and videos often contain sensitive personal or business information, insufficient disclosure materially increases privacy and compliance risk.
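
One way to address this finding is to show a disclosure notice and require explicit confirmation before any file leaves the machine. The sketch below assumes the backend host named in the overview; the UploadDecision type, the function names, and the notice wording are illustrative assumptions rather than anything the skill provides.

```python
from dataclasses import dataclass

# Host taken from the overview of this report.
BACKEND_HOST = "mega-api-prod.nemovideo.ai"


@dataclass
class UploadDecision:
    allowed: bool
    notice: str


def build_notice(filename: str) -> str:
    """Compose the disclosure shown before an upload (hypothetical wording)."""
    return (
        f"'{filename}' will be uploaded to {BACKEND_HOST}, "
        "a third-party cloud backend, for processing. Continue? [y/N]"
    )


def confirm_upload(filename: str, user_reply: str) -> UploadDecision:
    """Allow the upload only on an explicit 'y' or 'yes'; default is deny."""
    allowed = user_reply.strip().lower() in {"y", "yes"}
    return UploadDecision(allowed=allowed, notice=build_notice(filename))
```

Defaulting to denial means an empty or ambiguous reply never results in media being sent to the backend.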

VirusTotal

66/66 vendors flagged this skill as clean.
