Video To Text Gratis

Security checks across malware telemetry and agentic risk

Overview

This skill is marketed as video transcription, but its instructions go much further: they enable uploading media to a third-party cloud service, editing, rendering, and exporting it there, and creating access tokens and sessions on that service.

Review before installing. Use this only if you are comfortable sending media files, prompts, session state, and render/export requests to mega-api-prod.nemovideo.ai. Avoid confidential, private, or regulated media unless you independently trust that service and understand that this skill may perform broader cloud media editing and MP4 export, not just text transcription.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (6)

Description-Behavior Mismatch

High

Confidence: 97%
Finding: The skill presents itself as a simple video-to-text transcription tool, but the documented behavior is a much broader remote media editing and rendering workflow whose primary output is an MP4 video rather than text. This mismatch is dangerous because it can cause users and host systems to grant access under false assumptions, enabling unexpected remote processing, file handling, and downstream actions outside the declared scope.

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: The skill automatically acquires anonymous tokens and creates remote sessions even though the advertised use case is simple transcription. This expands the trust boundary, creates hidden external accounts/sessions, and enables remote service access that users may not expect or consent to, increasing privacy and abuse risk.

Description-Behavior Mismatch

Medium

Confidence: 94%
Finding: The documented supported formats include images and standalone audio in addition to video, which exceeds the declared scope of transcribing video files. Scope expansion like this increases the chance of unintended invocation, over-collection of user data, and use of the skill for broader media processing than users were told to expect.

Intent-Code Divergence

High

Confidence: 98%
Finding: The description and examples promise transcript generation, but the operational instructions emphasize returning a 1080p MP4 export. This is a substantive output mismatch that can mislead users about what processing occurs and what data leaves the environment, especially when video rendering implies richer cloud-side manipulation than plain text extraction.

Vague Triggers

Medium

Confidence: 84%
Finding: The routing rules are broad enough to capture generic video-related requests such as export, upload, status, or editing intents, not just transcription requests. Overbroad invocation can cause the skill to activate in contexts users did not intend, leading to unintended file uploads, remote processing, or execution of unrelated editing actions.

Missing User Warnings

Medium

Confidence: 96%
Finding: The skill sends user video content to a remote cloud backend but does not prominently warn users of this in the main description or obtain explicit consent at the point of upload. This is dangerous because users may expose sensitive or personal media under the mistaken belief that processing is local or limited to simple transcription.

VirusTotal

All 66 of 66 vendors reported this skill as clean.
