Ai Audio Editor

Security checks across malware telemetry and agentic risk

Overview

This is a coherent cloud-backed media editing skill, but users should know it uploads provided media and editing prompts to an external NemoVideo service.

Install only if you are comfortable sending selected audio/video files, URLs, edit prompts, and render state to mega-api-prod.nemovideo.ai. Avoid sensitive or confidential media unless you trust that service's privacy and retention practices, and use clear editing instructions so unrelated text is not routed into the cloud editing session.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (6)

Description-Behavior Mismatch

Medium
87% confidence
Finding
The skill is presented as an audio-cleaning/editor tool, but its documented behavior includes broader video-editing and rendering capabilities. This scope mismatch can mislead users and calling agents into granting broader access or invoking actions they would not reasonably expect from an audio-focused skill, increasing the chance of unintended processing or data exposure.

Context-Inappropriate Capability

Medium
93% confidence
Finding
Allowing arbitrary remote URL uploads expands the trust boundary beyond user-supplied local media and can be abused to fetch unexpected third-party content. In agent ecosystems, this can enable indirect access to internal or sensitive URLs, unauthorized processing of remote data, or ingestion of content the user did not intentionally provide.
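One way to narrow this trust boundary is to validate a remote URL before it is handed to the upload step. The following is a minimal sketch, not the skill's actual code: the function name `is_safe_media_url`, the https-only policy, and the blocked hostname suffixes are all assumptions made for illustration. It only inspects the URL text; a fuller check would also resolve DNS and re-test every resolved address.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical policy for illustration; the skill's real policy is not documented.
ALLOWED_SCHEMES = {"https"}
BLOCKED_HOSTNAMES = {"localhost"}
BLOCKED_SUFFIXES = (".local", ".internal")


def is_safe_media_url(url: str) -> bool:
    """Return True only for https URLs whose host is not obviously internal."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit raises ValueError on malformed hosts such as "https://[bad".
        return False
    if parts.scheme not in ALLOWED_SCHEMES:
        return False
    host = parts.hostname  # lowercased, brackets stripped for IPv6 literals
    if not host:
        return False
    if host in BLOCKED_HOSTNAMES or host.endswith(BLOCKED_SUFFIXES):
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. This sketch accepts it without resolving DNS.
        return True
    # Reject loopback, private, link-local, and other non-public ranges.
    return addr.is_global
```

With this policy, a public https link passes, while plain http links, `localhost`, and literal addresses such as `10.0.0.5` or `169.254.169.254` are rejected before any upload is attempted.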

Intent-Code Divergence

Medium
89% confidence
Finding
The public description frames the skill as an audio editor, while the internal routing expands it to general video-editing actions such as overlays and aspect-ratio changes. This inconsistency can cause users or orchestrators to invoke the skill under a narrower trust assumption than its real behavior, making overbroad actions more likely.

Vague Triggers

Medium
78% confidence
Finding
The invocation phrase is broad enough to overlap with ordinary conversation, which raises the risk of accidental activation. Because the skill can create sessions, upload media, and call cloud APIs, unintended invocation could trigger external processing or requests without sufficiently clear user intent.

Vague Triggers

Medium
90% confidence
Finding
The catch-all rule routes nearly any non-matched request into the SSE editing path, which is overly permissive for a skill that can manipulate media and communicate with a cloud backend. Ambiguous routing increases the likelihood that unrelated user text is treated as an editing command, causing unintended actions or disclosure of uploaded content to the backend.
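The alternative to a catch-all rule can be sketched as an explicit intent table with no default route. This is an illustrative sketch only: the intent names, the regular expressions, and the `route` function are hypothetical and do not come from the skill. The point is that a message matching no known intent returns `None`, so the caller can ask the user for clarification instead of forwarding the text to the backend.

```python
import re
from typing import Optional

# Hypothetical intents for illustration; the skill's real routing table is not shown in this report.
EDIT_INTENTS = {
    "denoise": re.compile(r"\b(remove|reduce|clean)\b.*\b(noise|hiss|hum)\b", re.I),
    "trim": re.compile(r"\b(trim|cut|crop)\b", re.I),
    "normalize": re.compile(r"\b(normali[sz]e|level)\b.*\b(volume|audio|loudness)\b", re.I),
}


def route(message: str) -> Optional[str]:
    """Return the first matching edit intent, or None when nothing matches."""
    for intent, pattern in EDIT_INTENTS.items():
        if pattern.search(message):
            return intent
    # No fallback: unmatched text is never treated as an editing command.
    return None
```

Here `route("Please remove the background noise")` returns `"denoise"`, while an unrelated message such as `"What's the weather tomorrow?"` returns `None` and is never sent to the editing session.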

Missing User Warnings

Medium
95% confidence
Finding
The skill sends user media to a cloud processing backend but does not clearly warn users up front that files and content will leave the local environment. This weakens informed consent and can lead to accidental disclosure of sensitive audio/video data, especially given support for large files and persistent session handling.
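An up-front warning of this kind could be implemented as a confirmation gate that runs before any file leaves the local environment. The sketch below is an assumption about how such a gate might look, not a description of the skill: `build_upload_notice` and `confirm_upload` are hypothetical names, and the `ask` callback stands in for whatever prompt mechanism the host agent provides. The backend host is the one named in the overview.

```python
from typing import Callable, Iterable

# Host named in the overview of this report.
BACKEND_HOST = "mega-api-prod.nemovideo.ai"


def build_upload_notice(filenames: Iterable[str], host: str = BACKEND_HOST) -> str:
    """Compose a notice naming the files and the external destination."""
    names = ", ".join(filenames)
    return (
        f"The following files will be uploaded to {host} for processing: "
        f"{names}. Continue? [y/N]"
    )


def confirm_upload(
    filenames: Iterable[str],
    ask: Callable[[str], str],
    host: str = BACKEND_HOST,
) -> bool:
    """Return True only if the user explicitly answers yes to the notice."""
    files = list(filenames)
    if not files:
        return False
    answer = ask(build_upload_notice(files, host))
    # Anything other than an explicit "y" or "yes" counts as a refusal.
    return answer.strip().lower() in {"y", "yes"}
```

In an interactive shell, `confirm_upload(["interview.wav"], input)` would print the notice and proceed only on an explicit yes; an empty answer defaults to no upload.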

VirusTotal

66/66 vendors flagged this skill as clean.
