Ai Voiceover For Video

Security checks across malware telemetry and agentic risk

Overview

This skill is a disclosed cloud video voiceover tool that uploads selected media and prompts to NemoVideo for processing, with some consent and clarity caveats but no evidence of deception or harmful local behavior.

Install only if you trust NemoVideo with the videos, audio, images, URLs, prompts, and generated timeline data you provide. Use a dedicated or revocable NEMO_TOKEN when possible, avoid confidential media unless you are comfortable uploading it to the service, and ask the agent to confirm before uploading or exporting ambiguous requests.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (7)

Description-Behavior Mismatch

Medium

Confidence: 87%
Finding: The skill materially expands from narrow voiceover generation into a broad remote video-editing and rendering controller with upload, state inspection, SSE-driven edits, and export orchestration. That broader capability increases the chance of unintended data transfer and over-privileged agent behavior beyond what a user may reasonably infer from the skill name and description.

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: The skill instructs the agent to autonomously obtain an anonymous token and create backend sessions, which is a form of credential acquisition and account/session bootstrapping. Even if intended for convenience, this enables networked account-like access without clear user consent and normalizes hidden authentication flows to a third-party service.

Vague Triggers

Medium

Confidence: 84%
Finding: The phrase 'Or just tell me what you're thinking' is overly broad and can cause accidental invocation from ordinary conversation unrelated to this skill. Broad triggers are dangerous because they can route generic user messages and attached files into remote processing flows without a crisp activation boundary.

Vague Triggers

Medium

Confidence: 89%
Finding: The example trigger 'add my video files' overlaps with common file-handling language and is not specific to voiceover generation. This makes accidental activation more likely when a user merely wants to share or organize files, potentially causing uploads or backend actions they did not intend.

Vague Triggers

Medium

Confidence: 91%
Finding: A catch-all rule routing 'Everything else' to the SSE edit path creates an ambiguous and effectively unbounded activation surface. In a skill that can upload media, maintain session state, and issue remote editing commands, this significantly increases the risk of unintended backend actions from loosely related prompts.
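To make the catch-all concern concrete, here is a minimal sketch contrasting an "Everything else" fallback with a router that declines unmatched messages. It assumes a simple keyword router; the route names (`upload`, `export`, `sse_edit`) and keyword lists are hypothetical and are not taken from the skill's actual implementation.

```python
# Hypothetical intent router contrasting a catch-all fallback with an explicit one.
# Route names and keywords are illustrative assumptions, not the skill's real code.
from typing import Optional

ROUTES = {
    "upload": ("upload", "attach video"),
    "export": ("export", "render"),
}


def route_catch_all(message: str) -> str:
    """Mimics an 'Everything else' rule: any unmatched text goes to the edit path."""
    text = message.lower()
    for route, keywords in ROUTES.items():
        if any(k in text for k in keywords):
            return route
    return "sse_edit"  # unbounded: every other message triggers a backend edit


def route_explicit(message: str) -> Optional[str]:
    """Routes to the edit path only on an explicit voiceover request; otherwise declines."""
    text = message.lower()
    for route, keywords in ROUTES.items():
        if any(k in text for k in keywords):
            return route
    if "voiceover" in text or "narration" in text:
        return "sse_edit"
    return None  # not for this skill; leave the message to the general assistant


if __name__ == "__main__":
    msg = "What's the weather like tomorrow?"
    print(route_catch_all(msg))  # sse_edit
    print(route_explicit(msg))   # None
```

With the catch-all rule, an unrelated question still reaches the remote edit path; the explicit router returns `None` and leaves it alone.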

Missing User Warnings

Medium

Confidence: 95%
Finding: The skill does not clearly warn users that uploaded videos, prompts, and timeline state are transmitted to a cloud backend for processing. Because videos may contain sensitive personal, business, or copyrighted content, failing to provide upfront disclosure undermines informed consent and can lead to unintended data exposure to a third party.
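The upfront disclosure this finding asks for can be approximated with a consent gate that runs before any upload. The sketch below is an assumption, not part of the skill: `upload_fn` stands in for whatever call actually sends data to the backend, and `ask` is any prompt function (for example, `input`).

```python
# Hypothetical consent gate placed in front of any cloud upload.
# upload_fn is a placeholder for the real transmission call (an assumption).
from pathlib import Path
from typing import Callable, Iterable


def confirm_upload(
    files: Iterable[Path],
    service: str,
    ask: Callable[[str], str],
    upload_fn: Callable[[Path], None],
) -> list[Path]:
    """List exactly what will leave the machine and upload only on an explicit yes."""
    files = list(files)
    listing = "\n".join(f"  - {f.name}" for f in files)
    prompt = (
        f"The following files will be uploaded to {service} for cloud processing:\n"
        f"{listing}\nProceed? [y/N] "
    )
    if ask(prompt).strip().lower() not in {"y", "yes"}:
        return []  # default is to send nothing
    for f in files:
        upload_fn(f)
    return files
```

Defaulting to "no" means an ambiguous or empty reply results in no transmission, which matches the advice to confirm before uploading.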

Natural-Language Policy Violations

Medium

Confidence: 74%
Finding: Hard-coding the session language to English without offering a choice can mishandle prompts, narration, or metadata for non-English users and may produce unintended processing results. This is less severe than the consent and scope issues, but it is still a security-adjacent trust and integrity problem because the service's behavior is silently constrained in a way the user did not request.
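One way to avoid a silent English-only session is to take the language from the user and fall back to English only when none is given. This is a minimal sketch; the `language` config field is a hypothetical name, not the skill's actual session schema.

```python
# Hypothetical session-config builder; the "language" field name is an assumption.
from typing import Optional


def build_session_config(user_language: Optional[str] = None, fallback: str = "en") -> dict:
    """Use the user's stated language when provided instead of forcing English."""
    lang = (user_language or fallback).strip().lower()
    return {"language": lang}
```

For example, `build_session_config("de")` yields `{"language": "de"}`, while calling it with no argument keeps the English default.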

VirusTotal

66/66 vendors flagged this skill as clean.
