Audio To Subtitle Generator

Security checks across malware telemetry and agentic risk

Overview

This skill implements a genuine cloud-based subtitle workflow, but it also grants the agent broader authority than a subtitle generator implies: general NemoVideo editing and full video export.

Install it only if you are comfortable sending audio or video to NemoVideo's cloud service and letting the skill create anonymous tokens and sessions. Limit its use to subtitle and transcription tasks, avoid confidential or regulated media unless you trust NemoVideo's data-handling practices, and treat workspace claim and session links as sensitive.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Findings (5)

Description-Behavior Mismatch

Severity: High
Confidence: 97%
The skill is presented as a subtitle-generation tool, but its instructions expand its authority into general video editing, session management, and export workflows. This scope mismatch can mislead users and hosting platforms about what data the skill may access and what actions it may perform, increasing the risk of unauthorized media modification or exfiltration beyond the stated purpose.

Context-Inappropriate Capability

Severity: High
Confidence: 98%
Rendered video export is a materially broader capability than subtitle generation because it enables producing and downloading full media outputs, not just caption files. In a skill marketed for subtitles, this can be abused to transform, package, or retrieve user media in ways the user did not explicitly authorize.

Context-Inappropriate Capability

Severity: High
Confidence: 99%
Allowing commands such as adding background music (BGM) and drag-and-drop timeline edits turns a transcription skill into a general editing agent. This scope expansion is dangerous because users may provide sensitive source media expecting transcription only, while the skill is empowered to alter project state and perform unrelated media operations.

Intent-Code Divergence

Severity: Medium
Confidence: 88%
The documentation first frames the backend as a speech-to-text subtitle service, then instructs the skill to translate GUI-style editing actions into API operations. This inconsistency obscures the skill's true operational scope and can keep users from realizing that their requests may trigger broader project manipulation rather than simple subtitle generation.

Missing User Warnings

Severity: Medium
Confidence: 95%
The skill encourages users to upload video files but does not clearly warn them that the media and related request data are sent to NemoVideo's cloud backend for processing. This is a privacy and transparency issue, especially because uploaded media may contain sensitive audio, faces, business information, or regulated content.

VirusTotal

65/65 vendors flagged this skill as clean.
