Video To Text Ai Free

Security checks across malware telemetry and agentic risk

Overview

This skill appears to be a real cloud video-processing integration, but it does more than simple transcription and may send media or broad edit commands to a third-party backend.

Install only if you are comfortable with a third-party cloud video editor, not just a local transcript extractor. Avoid confidential recordings unless you trust NemoVideo, protect the NEMO_TOKEN, and explicitly confirm uploads, URL ingestion, edits, and exports before proceeding.
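One way to make the "explicitly confirm" advice concrete is a small gate that sits in front of any action that leaves the machine. The sketch below is illustrative only: the action names, `run_action`, and the `confirm` callback are hypothetical and are not part of the skill or of any NemoVideo API.

```python
from typing import Callable

# Hypothetical action names for operations that send data to the cloud backend.
CONFIRM_REQUIRED = frozenset({"upload", "url_ingest", "edit", "export"})


def run_action(
    action: str,
    perform: Callable[[], str],
    confirm: Callable[[str], bool],
) -> str:
    """Run `perform` only if the action is harmless or the user confirmed it."""
    if action in CONFIRM_REQUIRED and not confirm(action):
        # The user declined, so nothing is sent anywhere.
        return "skipped"
    return perform()
```

In an interactive session, `confirm` could wrap a yes/no prompt that names the action and the destination before anything is transmitted.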

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (7)

Description-Behavior Mismatch

High

Confidence: 96%
Finding: The skill is presented as a simple video-to-text transcription tool, but the instructions expose a much broader remote video editing and rendering pipeline, including timeline manipulation, overlays, audio/BGM changes, and export operations. This scope mismatch is dangerous because it expands the agent's effective permissions and data flows beyond what a user would reasonably expect, increasing the chance of unauthorized remote processing or misuse of uploaded media.

Description-Behavior Mismatch

Medium

Confidence: 93%
Finding: The skill promises transcript export, but the documented result is a rendered 1080p MP4 video, which is materially different from a text transcript. This can mislead users about what data will be produced, stored, and transmitted, and may cause them to send media under false assumptions about the processing purpose.

Context-Inappropriate Capability

Medium

Confidence: 89%
Finding: Allowing URL-based ingestion introduces a broader attack and privacy surface than user-uploaded local files alone, including server-side fetching of arbitrary remote content. For a transcription skill, this capability is not clearly necessary and could be abused to pull in unintended or sensitive resources, depending on backend protections.
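As a rough illustration of the kind of backend protection this finding refers to, a URL can be checked before any server-side fetch. This is a minimal sketch under stated assumptions: `ALLOWED_HOSTS` and `is_safe_ingest_url` are hypothetical names, and nothing here describes how the skill's backend actually validates URLs.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical allowlist; the skill's real ingestion policy is not documented.
ALLOWED_HOSTS = {"media.example.com"}


def is_safe_ingest_url(url: str) -> bool:
    """Return True only for https URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    host = parsed.hostname
    if host is None:
        return False
    try:
        # Literal IP addresses are rejected outright, which covers loopback,
        # private, and link-local targets such as cloud metadata endpoints.
        ipaddress.ip_address(host)
        return False
    except ValueError:
        pass  # Not an IP literal; fall through to the allowlist check.
    return host in ALLOWED_HOSTS
```

A check like this does not address DNS rebinding or redirects on its own; a real fetcher would also need to validate the resolved address and each redirect hop.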

Intent-Code Divergence

Medium

Confidence: 91%
Finding: The documentation frames the skill as transcription-focused, but later authorizes general editing behaviors such as adding BGM and manipulating a timeline. This inconsistency weakens user consent and makes it easier for the agent to perform broader cloud actions than the user intended when invoking a seemingly narrow transcription tool.

Vague Triggers

Medium

Confidence: 88%
Finding: Routing 'everything else' into the SSE edit path creates an overly broad catch-all trigger that can capture unrelated user requests and send them to a remote backend. In the context of a skill with hidden broader capabilities, this increases the likelihood of unintended activation, data disclosure, and actions outside the user's expectations.
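The alternative to a catch-all is a router that forwards only requests it recognizes and returns nothing otherwise. The sketch below assumes a hypothetical `ROUTES` table and `route_request` function; the route names and keywords are illustrative and do not come from the skill's instructions.

```python
from typing import Optional

# Hypothetical intent table; the keyword lists are illustrative only.
ROUTES = {
    "transcribe": ("transcribe", "transcript", "subtitles", "captions"),
    "export": ("export", "download"),
}


def route_request(message: str) -> Optional[str]:
    """Return the matching route name, or None when nothing matches."""
    text = message.lower()
    for route, keywords in ROUTES.items():
        if any(keyword in text for keyword in keywords):
            return route
    # No match: the request is not forwarded to any remote endpoint.
    return None
```

Simple keyword matching like this can still misfire on everyday phrasing, so it narrows the problem and does not remove it; the point of the sketch is only that an unmatched request falls through to `None` instead of a remote edit path.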

Vague Triggers

Medium

Confidence: 86%
Finding: Generic trigger phrases such as 'convert my video files' or partial everyday language can collide with normal conversation and accidentally activate the skill. Because activation may lead to cloud processing and session creation, ambiguous triggers create privacy and consent risks disproportionate to the narrow advertised purpose.

Missing User Warnings

Medium

Confidence: 95%
Finding: The user-facing description emphasizes convenience but does not clearly warn that uploaded media, prompts, and session data are sent to a third-party cloud backend for processing. For a media-processing skill handling potentially sensitive audio/video, this is a significant transparency and privacy failure that can undermine informed consent.

VirusTotal

All 62 of 62 vendors reported this skill as clean.
