Video To Text Converter
Pass
Audited by VirusTotal on May 4, 2026.
Overview
Type: OpenClaw Skill
Name: video-to-text-converter
Version: 1.0.0
The skill is a functional wrapper for a video transcription and editing service hosted at nemovideo.ai. It provides clear instructions for the AI agent to manage sessions, handle file uploads, and poll for transcription results via a documented API. There is no evidence of data exfiltration, malicious execution, or deceptive prompt injection; the automated token generation and session management are consistent with the stated purpose of providing a seamless 'Video to Text' utility.
Findings (0)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Private videos, audio, and related metadata may be uploaded to nemovideo.ai when the skill is used.
The skill is designed to send user-provided media to an external cloud backend for processing.
"Drop your video files in the chat... I'll handle the speech-to-text transcription on cloud GPUs"; "Upload: POST `/api/upload-video/nemo_agent/me/<sid>` — file: multipart `-F "files=@/path"`"
Use it only for media you are comfortable processing through that service, and check the provider's privacy and retention terms before uploading sensitive recordings.
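One way to act on this recommendation is to show the user exactly what would leave the machine before any upload happens. The sketch below is illustrative only: the upload path is the one quoted from SKILL.md, but `base_url`, `upload_url`, and `describe_upload` are hypothetical names, and the real API host is not stated in this review, so the caller must supply it.

```python
from pathlib import Path
from urllib.parse import quote

# Path quoted in the skill's SKILL.md; the host is NOT given in this review.
UPLOAD_PATH = "/api/upload-video/nemo_agent/me/{sid}"


def upload_url(base_url: str, sid: str) -> str:
    """Build the upload endpoint. base_url is an assumption supplied by the caller."""
    return base_url.rstrip("/") + UPLOAD_PATH.format(sid=quote(sid, safe=""))


def describe_upload(paths, base_url: str, sid: str) -> str:
    """Summarize the destination and the files that would be sent, without sending them."""
    lines = [f"Destination: {upload_url(base_url, sid)}"]
    for p in map(Path, paths):
        lines.append(f"  {p.name} ({p.stat().st_size} bytes)")
    return "\n".join(lines)
```

Printing the result of `describe_upload(...)` before the request is made gives the user a chance to notice a sensitive recording in the list.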
Anyone who obtains the token could use the associated Nemo session or credits until it expires or is rotated.
The skill relies on a bearer token for the Nemo backend. This is expected for the integration, and the artifact tells the agent not to display token values.
"Check if `NEMO_TOKEN` is set in the environment"; "The response `data.token` is your NEMO_TOKEN"; "Include `Authorization: Bearer <NEMO_TOKEN>`"
Keep NEMO_TOKEN out of chat messages and logs, and rotate or remove it if you believe it was exposed.
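A minimal sketch of this handling, assuming the token lives in the `NEMO_TOKEN` environment variable as the artifact describes. The helper names `auth_headers` and `redact` are hypothetical and not part of the skill; the header format follows the quoted `Authorization: Bearer <NEMO_TOKEN>` instruction.

```python
import os


def auth_headers(env=os.environ) -> dict:
    """Read NEMO_TOKEN from the environment and build the bearer header."""
    token = env.get("NEMO_TOKEN")
    if not token:
        raise RuntimeError("NEMO_TOKEN is not set")
    return {"Authorization": f"Bearer {token}"}


def redact(text: str, token: str, mask: str = "[REDACTED]") -> str:
    """Mask every occurrence of the token before text is logged or shown in chat."""
    return text.replace(token, mask) if token else text
```

Passing any string derived from request headers or responses through `redact` before logging keeps the raw value out of transcripts even when a debug message includes the full header.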
The agent may contact the external service and continue workflow API calls as part of normal use.
The skill allows automatic backend setup and follow-on API calls based on backend workflow messages. This is purpose-aligned, but it increases the importance of keeping actions scoped to the user's request.
"When a user first opens this skill, connect to the processing backend automatically"; "The backend responds as if there's a visual interface. Map its instructions to API calls"
Before uploading or exporting, confirm the requested operation and avoid giving broad instructions if you only want a transcript.
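The confirmation step could be sketched as a small gate that runs an operation only after an explicit yes. This is an assumption about how an integrator might wrap the calls, not something the skill provides; `confirm_then_run` and its parameters are hypothetical.

```python
def confirm_then_run(description: str, action, ask=input):
    """Run action() only if the user answers yes; return None when declined."""
    answer = ask(f"{description} Proceed? [y/N] ").strip().lower()
    if answer not in ("y", "yes"):
        return None
    return action()
```

The `ask` parameter defaults to `input` for interactive use and can be replaced by any callable that returns the user's answer, which also makes the gate easy to test.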
A user expecting only transcript extraction may not realize the workflow can involve broader cloud video editing/rendering behavior.
The artifact presents itself as a video-to-text converter but also documents MP4 rendering and broader video editing/generation actions.
"Get text transcripts ready to post" and "download 1080p MP4 when it's done"; "Everything else (generate, edit, add BGM…) | → §3.1 SSE"
Review the skill behavior before use and phrase requests narrowly if you only want transcription.
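Narrow scoping can also be enforced mechanically with an allowlist, so that a transcript-only request never triggers rendering or editing. The action names below are purely illustrative labels invented for this sketch; the review does not list the backend's actual action identifiers.

```python
# Hypothetical action labels for a transcript-only request.
TRANSCRIPT_ONLY = frozenset({"upload", "transcribe", "poll_status", "fetch_transcript"})


def filter_actions(requested, allowed=TRANSCRIPT_ONLY):
    """Split requested actions into (permitted, blocked), preserving order."""
    permitted = [a for a in requested if a in allowed]
    blocked = [a for a in requested if a not in allowed]
    return permitted, blocked
```

Anything that lands in the blocked list, such as an MP4 export or a background-music edit, can then be surfaced to the user instead of being executed silently.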
