Video Transcript Generator Free Online

Security checks across malware telemetry and agentic risk

Overview

The skill appears to send media and broad user instructions to a cloud video backend despite being described much more narrowly as transcript generation.

Review before installing. Only use this if you are comfortable sending selected videos, prompts, and editing instructions to the NemoVideo cloud backend, and treat the NEMO_TOKEN as a credential. Avoid sensitive personal, business, or copyrighted media unless the publisher clarifies data handling, retention, and the exact actions the skill may trigger.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (6)

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The skill is presented as a transcript generator, but the instructions implement a much broader cloud video editing and rendering workflow, including session management, uploads, state inspection, and export of rendered MP4s. This mismatch is dangerous because it can cause users and the host agent to grant access to media and invoke capabilities far beyond the advertised purpose, increasing the chance of unintended data transfer and abuse.

Context-Inappropriate Capability

High

Confidence: 97%
Finding: The documented endpoints and workflows support generalized media editing, timeline manipulation, and video rendering rather than narrowly scoped transcription. For a skill marketed as transcription, these extra capabilities materially expand the attack surface and enable processing or exporting user media in ways the user may not expect.

Intent-Code Divergence

Medium

Confidence: 93%
Finding: The documentation inconsistently claims the skill generates a text transcript while also promising downloadable 1080p MP4 output. This inconsistency can mislead users about what data is processed and what artifacts are produced, undermining informed consent and making it easier to hide broader media-processing behavior.

Vague Triggers

High

Confidence: 96%
Finding: Routing 'Everything else' to the SSE action creates an extremely broad trigger that can capture unrelated user requests and send them to the external backend. In this context, that is especially risky because the backend is capable of acting on media sessions and cloud processing, so accidental invocation can expose user content or instructions off-platform.
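The difference between a catch-all route and a scoped one can be sketched in a few lines. This is a minimal illustration, not the skill's actual code: the keyword list and the route names `transcription_flow` and `sse_backend` are hypothetical stand-ins. The catch-all version mirrors the flagged pattern, where any unmatched message is forwarded to the external backend; the scoped version returns `None` so unmatched messages stay with the host agent.

```python
from typing import Optional

# Hypothetical keywords signalling a transcription request (not taken from the skill).
TRANSCRIPT_KEYWORDS = ("transcript", "transcribe", "captions", "subtitles")


def _is_transcript_request(message: str) -> bool:
    """Return True if the message contains any transcription keyword."""
    text = message.lower()
    return any(keyword in text for keyword in TRANSCRIPT_KEYWORDS)


def route_catch_all(message: str) -> str:
    """Flagged pattern: 'Everything else' falls through to the external backend."""
    if _is_transcript_request(message):
        return "transcription_flow"  # hypothetical route name
    return "sse_backend"  # unrelated requests leave the platform


def route_scoped(message: str) -> Optional[str]:
    """Scoped alternative: unmatched messages are not routed to the skill at all."""
    if _is_transcript_request(message):
        return "transcription_flow"  # hypothetical route name
    return None  # the host agent handles it; nothing is sent off-platform
```

With the catch-all router, an unrelated message such as "What's the weather tomorrow?" is sent to `sse_backend`, while the scoped router returns `None` for it. Both route "Please transcribe this clip" to `transcription_flow`.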

Vague Triggers

Medium

Confidence: 91%
Finding: The suggested invocation phrase 'Or just tell me what you're thinking' is overly generic and likely to overlap with ordinary conversation, increasing the chance that the skill activates unintentionally. Because activation leads to backend connection and possible media-processing flows, accidental triggering carries real privacy and integrity risk.

Missing User Warnings

High

Confidence: 95%
Finding: The skill does not clearly warn users up front that uploaded video files and prompts are sent to a third-party cloud backend for processing. Given that videos may contain sensitive personal, corporate, or copyrighted material, lack of explicit disclosure meaningfully increases the privacy and data-handling risk.

VirusTotal

66 of 66 vendors rated this skill as clean.
