Video To Text Online Free

Security checks across malware telemetry and agentic risk

Overview

This is a cloud media-processing skill marketed as video transcription, but its instructions also enable broad remote video editing, rendering, and catch-all prompt forwarding.

Review before installing. Treat this as a broader cloud video editing and rendering connector, not just a local or narrow transcription helper. Use it only for media you are comfortable uploading to nemovideo.ai, and avoid sensitive recordings unless the provider’s privacy, retention, billing, and export behavior are acceptable.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (6)

Description-Behavior Mismatch

Severity: High
Confidence: 97%
Finding
The skill is presented as a simple video-to-text transcription tool, but the instructions reveal a much broader cloud video editing and rendering capability with session management, timeline state, uploads, and export flows. This mismatch is dangerous because it can cause users and host platforms to grant access under a low-risk mental model while the skill actually enables higher-risk remote media processing and output generation through a third-party service.

Context-Inappropriate Capability

Severity: High
Confidence: 96%
Finding
The skill exposes general-purpose media editing features such as overlays, audio tracks, draft timelines, SSE-driven edit commands, and MP4 export, which go far beyond the stated transcript-generation purpose. In a skill marketed for transcription, these hidden capabilities materially expand what the agent can do with uploaded user media and increase the chance of misuse, unauthorized transformations, or policy evasion.

Intent-Code Divergence

Severity: Medium
Confidence: 93%
Finding
The title, examples, and description frame the skill as producing text transcripts, but the operational guidance emphasizes generating and exporting 1080p MP4 outputs. This inconsistency is risky because it misleads users about the nature of processing and outputs, reducing informed consent and making it easier to hide broader remote media manipulation behind a benign transcription label.

Vague Triggers

Severity: Medium
Confidence: 88%
Finding
The trigger phrase "Or just tell me what you're thinking" is overly open-ended and can cause the skill to activate on unrelated user conversations. Overbroad activation is dangerous here because the skill can initiate third-party cloud setup, session creation, and media-processing flows without sufficiently specific user intent.

Vague Triggers

Severity: Medium
Confidence: 95%
Finding
The routing table sends "Everything else" to the SSE action, creating a catch-all path for arbitrary prompts. Because SSE appears to relay broad edit/generation instructions to a remote backend, this weak trigger boundary substantially increases the chance of unintended activation and misuse beyond the declared transcription purpose.
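
The difference between a catch-all route and a bounded one can be illustrated with a minimal sketch. The skill's actual routing table is not reproduced in this review, so the `upload` and `export` branches, the keyword list, and both function names below are hypothetical; only the "everything else goes to SSE" fall-through comes from the finding.

```python
from typing import Optional

# Hypothetical keyword list; not taken from the skill itself.
TRANSCRIPTION_KEYWORDS = ("transcribe", "transcript", "subtitles", "captions")


def route_catch_all(prompt: str) -> str:
    """Mimics the flagged pattern: any unmatched prompt falls through to SSE."""
    text = prompt.lower()
    if "upload" in text:  # hypothetical branch
        return "upload"
    if "export" in text:  # hypothetical branch
        return "export"
    return "sse"  # "Everything else" is relayed to the remote backend


def route_bounded(prompt: str) -> Optional[str]:
    """Relays a prompt only when it shows transcription intent."""
    text = prompt.lower()
    if any(keyword in text for keyword in TRANSCRIPTION_KEYWORDS):
        return "sse"
    return None  # no match: nothing is forwarded
```

With the catch-all version, an unrelated message such as "what's the weather today?" is still relayed; the bounded version returns `None` for it and forwards only prompts that mention transcription.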

Missing User Warnings

Medium
Confidence
98% confidence
Finding
The skill instructs the agent to upload files and user messages to a third-party cloud backend, but the user-facing description does not clearly warn that media, prompts, and session data leave the local environment. This is a privacy and consent issue, especially for potentially sensitive video content, because users may believe the tool is a simple local transcription helper rather than a remote processing pipeline.
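
One way a host or wrapper could close this gap is to require explicit confirmation before anything leaves the local machine. The sketch below is an assumption about how such a gate might look, not something the skill provides; `confirm_upload` and the `ask` callback are hypothetical names.

```python
from typing import Callable


def confirm_upload(filename: str, host: str, ask: Callable[[str], bool]) -> bool:
    """Shows a disclosure notice and returns True only if the user agrees.

    `ask` is a hypothetical callback that presents the notice to the user
    and returns their yes/no answer.
    """
    notice = (
        f"'{filename}', your prompts, and session data will be sent to "
        f"{host} for remote processing. Continue?"
    )
    return ask(notice)
```

A caller would run the upload step only when `confirm_upload(...)` returns `True`, for example with `ask` wired to an interactive yes/no prompt.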

VirusTotal

All 66 vendors reported this skill as clean (0 of 66 flagged it).
