Subtitle Generator From Audio

Security checks across malware telemetry and agentic risk

Overview

The skill is presented narrowly as an SRT generator, but it appears to be a cloud media-processing tool that also allows broader editing, rendering, URL import, and session actions.

Install only if you intend to use NemoVideo as a cloud media editing and rendering service, not just a simple SRT generator. Avoid confidential recordings unless you trust the backend and understand its retention policy, and require explicit confirmation before uploads, URL imports, exports, or generalized edit requests.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (6)

Description-Behavior Mismatch

Medium

Confidence: 91%
Finding: The manifest markets a narrow audio-to-subtitle capability, but the body exposes a substantially broader remote media-editing and rendering pipeline, including general edit commands, timeline manipulation, export formats, and state inspection. This scope mismatch can cause users or host systems to grant trust, permissions, or routing based on an understated capability set, increasing the chance of unintended data processing or misuse.

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: Allowing import from arbitrary URLs expands the attack surface beyond user-uploaded audio and is not necessary for the stated subtitle-generation purpose. URL fetch features can enable server-side request forgery, retrieval of internal resources, or ingestion of unexpected sensitive content if the backend dereferences attacker-controlled URLs.
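A backend that keeps URL import could reduce the SSRF exposure described above by validating each URL before fetching it. The following is a minimal sketch, not the skill's actual implementation: the function names, the HTTPS-only policy, and the injectable `resolver` parameter are all assumptions made for illustration. It rejects any URL whose host resolves to a non-public address (loopback, private ranges, link-local metadata endpoints).

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Assumed policy for this sketch: only HTTPS imports are permitted.
ALLOWED_SCHEMES = {"https"}


def resolve_host(host):
    """Return every IP address the hostname resolves to."""
    infos = socket.getaddrinfo(host, None)
    return [info[4][0] for info in infos]


def is_import_url_allowed(url, resolver=resolve_host):
    """Return True only if the URL uses an allowed scheme and every
    address its host resolves to is globally routable."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    try:
        addresses = resolver(parts.hostname)
    except OSError:
        return False  # unresolvable hosts are rejected
    if not addresses:
        return False
    for addr in addresses:
        try:
            # Drop any IPv6 zone identifier such as "%eth0" before parsing.
            ip = ipaddress.ip_address(addr.split("%")[0])
        except ValueError:
            return False
        if not ip.is_global:
            return False  # loopback, private, link-local, reserved, etc.
    return True
```

A check like this is only effective if the fetch then connects to the same address that was validated; resolving the name a second time at fetch time would leave room for DNS rebinding.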

Vague Triggers

Medium

Confidence: 88%
Finding: The invocation guidance is broad enough that ordinary user conversation could unintentionally activate the skill. In a skill that uploads media and interacts with a remote backend, accidental activation can lead to unintended cloud processing, token use, or data transfer without sufficiently deliberate user intent.

Vague Triggers

Medium

Confidence: 90%
Finding: The sample trigger phrase is generic and incomplete, making it more likely to match benign conversation rather than an intentional request to process media. Because this skill can create sessions and send data to a cloud service, weak trigger specificity increases the risk of unintended execution and privacy-impacting behavior.

Vague Triggers

Medium

Confidence: 95%
Finding: The catch-all routing rule sends essentially all non-matching requests to the SSE editing backend, creating an overly permissive activation path. This is particularly risky because the backend appears capable of general media manipulation and stateful remote actions, so ambiguous prompts may trigger unexpected operations or broaden processing beyond the user's intent.
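The alternative to a catch-all rule is deny-by-default routing: a request reaches the backend only when it matches an explicit subtitle intent and names an audio file, and everything else is declined. The sketch below illustrates the idea under stated assumptions. The trigger phrases, the file-extension list, and the `route_request` function are hypothetical, since the report does not reproduce the skill's actual manifest or routing table.

```python
import re

# Hypothetical trigger phrases; the skill's real triggers are not shown in the report.
SUBTITLE_PHRASES = ("generate subtitles", "create srt", "make an srt")

# Hypothetical list of accepted audio extensions.
AUDIO_FILE = re.compile(r"\b[\w./-]+\.(mp3|wav|m4a|flac)\b", re.IGNORECASE)


def route_request(message: str) -> str:
    """Route to the subtitle backend only on an explicit, specific match.

    Anything that does not match is declined instead of falling through
    to a general editing backend.
    """
    text = message.lower()
    has_phrase = any(phrase in text for phrase in SUBTITLE_PHRASES)
    has_audio = AUDIO_FILE.search(message) is not None
    if has_phrase and has_audio:
        return "subtitle"
    return "decline"
```

With this shape, an ambiguous prompt such as "can you help me with this?" produces no remote session at all, and a general edit request has to be routed by a separate, equally explicit rule.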

Missing User Warnings

Medium

Confidence: 93%
Finding: The documentation states that files are uploaded and processed on remote GPU nodes but does not present a clear upfront warning that user media leaves the local system. For audio content, this can expose sensitive recordings, voices, or confidential material to third-party infrastructure without sufficiently explicit informed consent.
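A host or wrapper could address this gap by gating every remote action behind an explicit, default-deny confirmation that states plainly that the file leaves the local system. The following is a rough sketch only: the action names, the `PendingAction` type, and the prompt wording are assumptions for illustration and do not describe the skill's real interface.

```python
from dataclasses import dataclass

# Hypothetical set of actions that send user data to the remote backend.
REMOTE_ACTIONS = {"upload", "url_import", "export", "edit"}


@dataclass
class PendingAction:
    kind: str    # e.g. "upload"
    target: str  # file path or URL the action applies to


def consent_prompt(action: PendingAction) -> str:
    """Build a warning that makes the off-device transfer explicit."""
    return (
        f"This will send '{action.target}' to a remote service for "
        f"'{action.kind}'. The file leaves this machine. Proceed? [y/N]"
    )


def is_confirmed(action: PendingAction, reply: str) -> bool:
    """Local actions pass through; remote actions need an explicit yes."""
    if action.kind not in REMOTE_ACTIONS:
        return True
    return reply.strip().lower() in {"y", "yes"}
```

An empty or unrecognized reply counts as a refusal, so pressing Enter at the prompt never triggers an upload.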

VirusTotal

66/66 vendors flagged this skill as clean.
