Voice Over Ai

Security checks across malware telemetry and agentic risk

Overview

This appears to be a real cloud video narration skill, but it gives the agent broad authority to send media and open-ended prompts to a third-party backend with unclear scoping.

Review before installing. Use it only with media you are comfortable sending to NemoVideo, prefer a dedicated or low-privilege token, and ask the agent to confirm before uploading files or forwarding broad edit instructions.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (6)

Description-Behavior Mismatch

Medium

Confidence: 89%
Finding: The skill is presented as a narrow AI voice-over tool, but the routing and workflow instructions expand it into a general video-editing and media-processing interface. This scope mismatch can cause the agent to invoke the skill for user requests beyond what was disclosed, increasing the chance of unexpected data handling and user confusion about what actions will be performed remotely.

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: The supported formats list includes many media types and transformations far beyond the stated voice-over workflow, indicating materially broader capability than advertised. Hidden expansion of input/output scope can lead users to submit content under false assumptions and enables the skill to function as a more general media-processing conduit than expected.

Vague Triggers

High

Confidence: 97%
Finding: Routing 'Everything else' to the SSE action is an overly broad catch-all that can capture unrelated user prompts and send them to a remote backend. In an agent setting, this creates a prompt-hijacking surface where ambiguous or unmatched requests are still processed externally, potentially causing unintended actions or unnecessary disclosure of user content.
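One way to narrow this surface is to replace the catch-all with an explicit allowlist, so that unmatched requests are declined locally instead of being forwarded. The sketch below is hypothetical: the keyword list, the `route` function, and the `"sse"` return value are assumptions made for illustration and are not taken from the skill's actual routing table.

```python
from typing import Optional

# Hypothetical allowlist of phrases that plausibly signal a voice-over request.
# These keywords are illustrative assumptions, not the skill's real triggers.
VOICE_OVER_KEYWORDS = ("voice over", "voice-over", "voiceover", "narrate", "narration")


def route(prompt: str) -> Optional[str]:
    """Return the name of a remote action, or None if the prompt should stay local."""
    text = prompt.lower()
    if any(keyword in text for keyword in VOICE_OVER_KEYWORDS):
        return "sse"  # forward to the remote backend only on an explicit match
    return None  # no catch-all: unmatched prompts are never sent externally
```

With this shape, an unrelated prompt yields `None` and the agent can handle it locally or ask the user what they meant, instead of sending it to the backend by default.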

Vague Triggers

Medium

Confidence: 79%
Finding: The invocation guidance mixes voice-over, aspect ratio, text overlays, and audio tracks, making activation boundaries unclear. Ambiguous triggering increases the likelihood that the skill will be invoked for tasks outside the user's understanding of the skill's purpose, especially because the service performs remote processing.

Missing User Warnings

Medium

Confidence: 95%
Finding: The skill asks users to upload media and immediately establishes a backend session, but it does not prominently warn users that their files and prompts are transmitted to third-party remote services. For a media tool handling potentially sensitive audio/video, lack of upfront disclosure undermines informed consent and can expose private content to external processing without clear notice.
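A confirmation step before any upload is one possible mitigation. The following is a minimal sketch, assuming a hypothetical `confirm_upload` helper that the agent would call before transmitting a file; the function name, the prompt wording, and the injectable `ask` callback are all assumptions and not part of the skill.

```python
from typing import Callable


def confirm_upload(path: str, destination: str, ask: Callable[[str], str] = input) -> bool:
    """Ask the user to approve sending a file to a remote service.

    Anything other than an explicit "y" or "yes" counts as a refusal.
    """
    question = (
        f"This will upload '{path}' to {destination}, a third-party service. "
        "Continue? [y/N] "
    )
    answer = ask(question).strip().lower()
    return answer in ("y", "yes")
```

Defaulting to refusal means an empty or ambiguous reply never results in a transfer, which matches the report's advice to have the agent confirm before uploading files.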

Natural-Language Policy Violations

Medium

Confidence: 88%
Finding: The session creation request hardcodes `"language":"en"` without consulting user preference or detected locale. This can mishandle multilingual content, reduce the reliability of generated narration, and silently send user interactions under an unintended language setting.
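As an illustration of the alternative, the language could be derived from an explicit user preference first, then a detected locale, with `"en"` kept only as a last resort. This is a sketch under assumptions: `session_payload` and its parameters are hypothetical, and only the `"language"` field is known from the finding; any other fields of the real request are omitted.

```python
from typing import Optional


def session_payload(
    user_language: Optional[str] = None,
    detected_locale: Optional[str] = None,
    default: str = "en",
) -> dict:
    """Build the language part of a session request without hardcoding it."""
    for candidate in (user_language, detected_locale):
        if candidate:
            # Normalise "pt_BR" or "pt-BR" to the primary subtag "pt".
            primary = candidate.replace("_", "-").split("-")[0].lower()
            return {"language": primary}
    return {"language": default}  # fallback only when nothing else is known
```

For example, `session_payload(None, "pt_BR")` produces a Portuguese session, while an explicit `user_language` always takes precedence over the detected locale.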

VirusTotal

VirusTotal findings are pending for this skill version.
