Video To Text Converter Free

Security checks across malware telemetry and agentic risk

Overview

This skill is marketed as video-to-text transcription, but its instructions also drive a broad remote video editing and rendering service that may upload media and prompts to a third-party backend.

Install only if you intend to use a cloud NemoVideo-style media service, trust that provider with your videos and prompts, and are comfortable with broader editing/rendering actions beyond transcription. Avoid using it for private, regulated, or confidential recordings unless the publisher clarifies output type, consent for non-transcription actions, and data retention/deletion practices.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (6)

Description-Behavior Mismatch

High

Confidence: 97%
Finding: The manifest presents this as a simple video-to-text transcription skill, but the body documents a much broader remote video editing, session management, upload, SSE command, and rendering pipeline. This mismatch is dangerous because users and host platforms may grant trust, files, and tokens under a narrow transcription expectation while the skill can route arbitrary editing/export actions to a third-party backend.

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: The documented actions include export, state inspection, SSE-driven edits, and remote render workflows that exceed what is necessary to transcribe speech from uploaded videos. Excess capability increases attack surface and creates a confused-deputy risk where a user invoking transcription may unintentionally trigger remote media processing and broader data disclosure to the backend.

Intent-Code Divergence

Medium

Confidence: 92%
Finding: The documentation promises transcribed text but repeatedly describes the output as a 1080p MP4 video, creating ambiguity about what processing actually occurs and what artifacts are produced. This can mislead users into sharing sensitive recordings under false assumptions and may cause them to receive transformed media rather than the expected transcript.

Vague Triggers

Medium

Confidence: 88%
Finding: The activation example 'Or just tell me what you're thinking' is so broad that ordinary conversation could be interpreted as a request to invoke the skill. In the context of a skill that automatically connects to a remote backend and may upload/process user media, overbroad triggering raises the risk of accidental activation and unintended data transfer.

Vague Triggers

Medium

Confidence: 95%
Finding: The routing table sends 'Everything else' to the SSE action, effectively creating an unbounded catch-all path for arbitrary prompts. In a skill backed by a remote service that can perform edits, stateful actions, and exports, this makes behavior hard to predict and can cause unrelated user input to be transmitted to a third-party backend for unintended processing.
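The difference between a catch-all route and a bounded one can be sketched as follows. This is illustrative only: the action names, keyword lists, and the `route` function are hypothetical and are not taken from the skill. The point is that unmatched input yields `None` instead of being forwarded to a remote action.

```python
from typing import Optional

# Hypothetical allowlist: each action is reachable only through explicit keywords.
# These names are assumptions for illustration, not the skill's real routing table.
ALLOWED_ROUTES = {
    "transcribe": ("transcribe", "transcript", "captions"),
    "export": ("export", "download"),
}


def route(prompt: str) -> Optional[str]:
    """Return the matching action name, or None when nothing matches.

    Returning None (rather than a default remote action) means unrelated
    conversation is never forwarded to a backend.
    """
    text = prompt.lower()
    for action, keywords in ALLOWED_ROUTES.items():
        if any(keyword in text for keyword in keywords):
            return action
    return None
```

With a table like this, the open-ended phrase quoted in the earlier finding would match no route, so the host could ask the user what they want rather than transmitting the prompt.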

Missing User Warnings

Medium

Confidence: 96%
Finding: The skill description does not clearly warn users up front that uploaded videos, prompts, and session data are sent to a remote backend for cloud processing. For a tool likely to handle sensitive recordings, insufficient disclosure undermines informed consent and increases privacy and compliance risk.
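A minimal sketch of what an up-front disclosure gate could look like is shown below. Everything in it is an assumption for illustration: the `UploadRequest` type, the `disclosure_text` and `may_upload` helpers, and the example host name do not come from the skill or its backend.

```python
from dataclasses import dataclass


@dataclass
class UploadRequest:
    """Hypothetical description of a pending media upload."""
    filename: str
    destination: str              # assumed backend host, for display only
    user_confirmed: bool = False  # set True only after explicit user consent


def disclosure_text(request: UploadRequest) -> str:
    """Build the warning shown to the user before any data leaves the machine."""
    return (
        f"'{request.filename}' and your prompts will be sent to "
        f"{request.destination} for cloud processing. Continue?"
    )


def may_upload(request: UploadRequest) -> bool:
    """Allow the upload only when the user has explicitly confirmed."""
    return request.user_confirmed


# Usage: the upload is blocked until consent is recorded.
pending = UploadRequest(filename="meeting.mp4", destination="backend.example")
print(disclosure_text(pending))
print(may_upload(pending))
```

The design choice here is that consent defaults to `False`, so forgetting to ask the user fails closed rather than open.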

VirusTotal

63/63 vendors flagged this skill as clean.
