Video To Text Gratis

Security checks across malware telemetry and agentic risk

Overview

This skill is marketed as video transcription, but its instructions enable broad third-party cloud media upload, editing, rendering, export, token creation, and session use.

Review before installing. Use this only if you are comfortable sending media files, prompts, session state, and render/export requests to mega-api-prod.nemovideo.ai. Avoid confidential, private, or regulated media unless you independently trust that service and understand that this skill may perform broader cloud media editing and MP4 export, not just text transcription.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (6)

Description-Behavior Mismatch

High
Confidence: 97%
Finding
The skill presents itself as a simple video-to-text transcription tool, but the documented behavior is a much broader remote media editing and rendering workflow that returns MP4 video output rather than primarily text. This mismatch is dangerous because it can cause users and host systems to grant access under false assumptions, enabling unexpected remote processing, file handling, and downstream actions outside the declared scope.

Context-Inappropriate Capability

Medium
Confidence: 93%
Finding
The skill automatically acquires anonymous tokens and creates remote sessions even though the advertised use case is simple transcription. This expands the trust boundary, creates hidden external accounts/sessions, and enables remote service access that users may not expect or consent to, increasing privacy and abuse risk.

Description-Behavior Mismatch

Medium
Confidence: 94%
Finding
The documented supported formats include images and standalone audio in addition to video, which exceeds the declared scope of transcribing video files. Scope expansion like this increases the chance of unintended invocation, over-collection of user data, and use of the skill for broader media processing than users were told to expect.
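
One way a host could hold the skill to its declared scope is to check the media type before handing a file over. The following is a minimal sketch, not the skill's actual behavior: `is_video_file` is a hypothetical helper, and it assumes that guessing the MIME type from the file extension is good enough for a first-pass filter.

```python
import mimetypes


def is_video_file(path: str) -> bool:
    """Return True only when the extension maps to a video/* MIME type.

    Hypothetical helper for illustration; it inspects the file name only,
    not the file contents.
    """
    mime, _encoding = mimetypes.guess_type(path)
    return mime is not None and mime.startswith("video/")


# Images and standalone audio fall outside the declared "video to text" scope.
for name in ("clip.mp4", "photo.png", "song.mp3"):
    print(name, is_video_file(name))
```

An extension-based check is easy to bypass by renaming a file, so it is a guard against accidental over-collection rather than a security control.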

Intent-Code Divergence

High
Confidence: 98%
Finding
The description and examples promise transcript generation, but the operational instructions emphasize returning a 1080p MP4 export. This is a substantive output mismatch that can mislead users about what processing occurs and what data leaves the environment, especially when video rendering implies richer cloud-side manipulation than plain text extraction.

Vague Triggers

Medium
Confidence: 84%
Finding
The routing rules are broad enough to capture generic video-related requests such as export, upload, status, or editing intents, not just transcription requests. Overbroad invocation can cause the skill to activate in contexts users did not intend, leading to unintended file uploads, remote processing, or execution of unrelated editing actions.
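
To illustrate what a narrower trigger could look like, the sketch below activates only on requests that explicitly mention transcription. The patterns and the `should_activate` function are assumptions made for this example; the skill's real routing rules are not reproduced here.

```python
import re

# Hypothetical patterns: phrases that clearly ask for a transcript.
TRANSCRIPTION_PATTERNS = [
    re.compile(r"\btranscri(b|pt)\w*", re.IGNORECASE),
    re.compile(r"\b(video|audio)[- ]to[- ]text\b", re.IGNORECASE),
]


def should_activate(request: str) -> bool:
    """Return True only if the request explicitly asks for transcription."""
    return any(p.search(request) for p in TRANSCRIPTION_PATTERNS)


# Generic export, upload, or status requests no longer match.
for text in ("Please transcribe this video", "Export my video as MP4"):
    print(text, "->", should_activate(text))
```

Keyword matching of this kind is deliberately conservative: it may miss some legitimate phrasings, which is usually a safer failure mode than activating on unrelated editing or export requests.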

Missing User Warnings

Medium
Confidence: 96%
Finding
The skill sends user video content to a remote cloud backend but does not prominently warn users of this in the main description or obtain explicit consent at the point of upload. This is dangerous because users may expose sensitive or personal media under the mistaken belief that processing is local or limited to simple transcription.
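
A consent step at the point of upload might look like the sketch below. It is an illustration under stated assumptions: `build_consent_prompt` and `confirm_upload` are hypothetical names, the only detail taken from this report is the host name, and the `ask` parameter exists so the prompt can be swapped out in non-interactive contexts.

```python
from typing import Callable

# Remote host named in the overview of this report.
REMOTE_HOST = "mega-api-prod.nemovideo.ai"


def build_consent_prompt(filename: str, host: str = REMOTE_HOST) -> str:
    """Build a message that names the file and the remote destination."""
    return (
        f"'{filename}' will be uploaded to {host} for cloud processing. "
        "Type 'yes' to continue: "
    )


def confirm_upload(filename: str, ask: Callable[[str], str] = input) -> bool:
    """Return True only if the user explicitly answers 'yes'."""
    answer = ask(build_consent_prompt(filename))
    return answer.strip().lower() == "yes"
```

Defaulting to refusal means that an empty or ambiguous answer never results in an upload.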

VirusTotal

66/66 vendors flagged this skill as clean.
