Video To Text Converter
Pass
Audited by ClawScan on May 4, 2026.
Overview
This appears to be a disclosed cloud video transcription/rendering integration, but it uses a Nemo token and sends uploaded media to nemovideo.ai.
This skill is reasonable for cloud-based video transcription/rendering if you trust the Nemo backend. Do not upload confidential recordings unless you are comfortable with external processing, and keep the NEMO_TOKEN private.
Findings (4)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Private videos, audio, and related metadata may be uploaded to nemovideo.ai when the skill is used.
The skill is designed to send user-provided media to an external cloud backend for processing.
"Drop your video files in the chat... I'll handle the speech-to-text transcription on cloud GPUs"; "Upload: POST `/api/upload-video/nemo_agent/me/<sid>` — file: multipart `-F "files=@/path"`"
Use it only for media you are comfortable processing through that service, and check the provider's privacy and retention terms before uploading sensitive recordings.
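To make the data flow concrete, the sketch below shows which request an upload would produce, without sending anything. Only the path `/api/upload-video/nemo_agent/me/<sid>` and the `Authorization: Bearer` header come from the quoted artifact. The host in `BASE_URL`, the function name `describe_upload`, and the shape of the returned dictionary are assumptions made for illustration, since the report does not state the API host.

```python
import os
from urllib.parse import quote

# Hypothetical placeholder host: the report does not state the real API host.
BASE_URL = "https://api.example.invalid"


def describe_upload(sid: str, path: str, token: str) -> dict:
    """Return a dry-run description of the upload request. Nothing is sent."""
    # The path template is taken from the quoted artifact; the session id is URL-encoded.
    url = f"{BASE_URL}/api/upload-video/nemo_agent/me/{quote(sid, safe='')}"
    return {
        "method": "POST",
        "url": url,
        # The token value is never placed in the description.
        "headers": {"Authorization": "Bearer <redacted>"},
        # Only the file name is shown, not the full local path.
        "file": os.path.basename(path),
        "token_present": bool(token),
    }
```

Reviewing a description like this before the real call makes it easier to check which file would leave the machine and where it would go.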
Anyone who obtains the token could use the associated Nemo session or credits until the token expires or is rotated.
The skill relies on a bearer token for the Nemo backend. This is expected for the integration and the artifact tells the agent not to display token values.
"Check if `NEMO_TOKEN` is set in the environment"; "The response `data.token` is your NEMO_TOKEN"; "Include `Authorization: Bearer <NEMO_TOKEN>`"
Keep NEMO_TOKEN out of chat messages and logs, and rotate or remove it if you believe it was exposed.
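One way to keep the token out of logs is a logging filter that replaces the token value before a record is written. This is a minimal sketch that uses only Python's standard `logging` module. The class name `RedactTokenFilter` and the placeholder text are hypothetical and are not part of the skill.

```python
import logging


class RedactTokenFilter(logging.Filter):
    """Replace a known secret value with a placeholder in log messages."""

    def __init__(self, token: str):
        super().__init__()
        self._token = token

    def filter(self, record: logging.LogRecord) -> bool:
        if self._token:
            message = record.getMessage()  # message with its arguments merged in
            if self._token in message:
                record.msg = message.replace(self._token, "<NEMO_TOKEN redacted>")
                record.args = ()  # arguments are already merged into msg
        return True  # never drop the record, only rewrite it
```

A typical use would be to attach the filter to each log handler, for example `handler.addFilter(RedactTokenFilter(os.environ.get("NEMO_TOKEN", "")))`, so that records from every logger routed through that handler are covered.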
The agent may contact the external service and continue workflow API calls as part of normal use.
The skill allows automatic backend setup and follow-on API calls based on backend workflow messages. This is purpose-aligned, but it increases the importance of keeping actions scoped to the user's request.
"When a user first opens this skill, connect to the processing backend automatically"; "The backend responds as if there's a visual interface. Map its instructions to API calls"
Before uploading or exporting, confirm the requested operation and avoid giving broad instructions if you only want a transcript.
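Scoping can also be enforced mechanically, by checking each backend-suggested action against what the user actually asked for. The sketch below is an assumption-heavy illustration: the request names, the action names, and the function `is_permitted` are hypothetical and do not correspond to documented Nemo API operations.

```python
# Hypothetical mapping from what the user asked for to the actions that request covers.
ALLOWED_BY_REQUEST = {
    "transcribe": {"upload", "transcribe"},
    "render": {"upload", "transcribe", "render", "export"},
}


def is_permitted(user_request: str, action: str) -> bool:
    """Return True only if the action falls within the user's stated request."""
    return action in ALLOWED_BY_REQUEST.get(user_request, set())
```

With a check like this, a transcript-only request would allow `upload` and `transcribe` but refuse `render` or `export` until the user asks for them explicitly.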
A user expecting only transcript extraction may not realize the workflow can involve broader cloud video editing/rendering behavior.
The artifact presents itself as a video-to-text converter but also documents MP4 rendering and broader video editing/generation actions.
"Get text transcripts ready to post" and "download 1080p MP4 when it's done"; "Everything else (generate, edit, add BGM…) | → §3.1 SSE"
Review the skill behavior before use and phrase requests narrowly if you only want transcription.
