Video To Text Converter Free

Security checks across malware telemetry and agentic risk

Overview

This skill is marketed as video-to-text transcription, but its instructions also drive a broad remote video editing and rendering service that may upload media and prompts to a third-party backend.

Install only if you intend to use a cloud NemoVideo-style media service, trust that provider with your videos and prompts, and are comfortable with broader editing/rendering actions beyond transcription. Avoid using it for private, regulated, or confidential recordings unless the publisher clarifies output type, consent for non-transcription actions, and data retention/deletion practices.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (6)

Description-Behavior Mismatch

High
97% confidence
Finding
The manifest presents this as a simple video-to-text transcription skill, but the body documents a much broader remote video editing, session management, upload, SSE command, and rendering pipeline. This mismatch is dangerous because users and host platforms may grant trust, files, and tokens under a narrow transcription expectation while the skill can route arbitrary editing/export actions to a third-party backend.

Context-Inappropriate Capability

Medium
94% confidence
Finding
The documented actions include export, state inspection, SSE-driven edits, and remote render workflows that exceed what is necessary to transcribe speech from uploaded videos. Excess capability increases attack surface and creates a confused-deputy risk where a user invoking transcription may unintentionally trigger remote media processing and broader data disclosure to the backend.

Intent-Code Divergence

Medium
92% confidence
Finding
The documentation promises transcribed text but repeatedly describes the output as a 1080p MP4 video, creating ambiguity about what processing actually occurs and what artifacts are produced. This can mislead users into sharing sensitive recordings under false assumptions and may cause them to receive transformed media rather than the expected transcript.

Vague Triggers

Medium
88% confidence
Finding
The activation example 'Or just tell me what you're thinking' is so broad that ordinary conversation could be interpreted as a request to invoke the skill. In the context of a skill that automatically connects to a remote backend and may upload/process user media, overbroad triggering raises the risk of accidental activation and unintended data transfer.
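
The difference between a broad and a narrow trigger can be sketched in Python. This is only an illustration under stated assumptions: the keyword list, the `should_activate` function, and the attachment flag are hypothetical and are not taken from the skill's manifest.

```python
import re

# Hypothetical trigger phrases; the skill's actual trigger list is not shown in this report.
TRANSCRIPTION_PATTERN = re.compile(
    r"\b(transcribe|transcription|video to text|subtitles?|captions?)\b",
    re.IGNORECASE,
)


def should_activate(message: str, has_video_attachment: bool) -> bool:
    """Activate only on an explicit transcription request that comes with a video."""
    if not has_video_attachment:
        return False
    return bool(TRANSCRIPTION_PATTERN.search(message))


# Ordinary conversation does not activate the skill, even with a video attached.
print(should_activate("Or just tell me what you're thinking", True))
# An explicit request with a video attached does.
print(should_activate("Please transcribe this clip", True))
```

Requiring both an explicit keyword and an attached video is one possible way to keep casual chat from opening a session with the remote backend.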

Vague Triggers

Medium
95% confidence
Finding
The routing table sends 'Everything else' to the SSE action, effectively creating an unbounded catch-all path for arbitrary prompts. In a skill backed by a remote service that can perform edits, stateful actions, and exports, this makes behavior hard to predict and can cause unrelated user input to be transmitted to a third-party backend for unintended processing.
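
A minimal Python sketch of why the catch-all matters, assuming a simplified routing table. The intent and action names other than the SSE fallback are hypothetical; the skill's real routing table is not reproduced in this report.

```python
from typing import Optional

# Hypothetical intents and action names, for illustration only.
EXPLICIT_ROUTES = {
    "transcribe": "transcribe",
    "export": "export",
    "status": "state",
}


def route_with_catch_all(intent: str) -> str:
    """Mimics the flagged pattern: anything unrecognized falls through to the SSE action."""
    return EXPLICIT_ROUTES.get(intent, "sse")


def route_bounded(intent: str) -> Optional[str]:
    """Bounded alternative: unrecognized input is not routed, so nothing is sent."""
    return EXPLICIT_ROUTES.get(intent)


# Unrelated input still reaches the backend under the catch-all design.
print(route_with_catch_all("what's the weather like"))
# The bounded router returns None, leaving the caller to ask the user what they want.
print(route_bounded("what's the weather like"))
```

With the bounded variant, the set of prompts that can leave the host is limited to the listed intents, which makes the skill's behavior easier to audit.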

Missing User Warnings

Medium
96% confidence
Finding
The skill description does not clearly warn users up front that uploaded videos, prompts, and session data are sent to a remote backend for cloud processing. For a tool likely to handle sensitive recordings, insufficient disclosure undermines informed consent and increases privacy and compliance risk.

VirusTotal

63/63 vendors flagged this skill as clean.
