Video To Text Converter

Pass. Audited by ClawScan on May 4, 2026.

Overview

This appears to be a disclosed integration for cloud video transcription and rendering. It authenticates with a Nemo token and sends uploaded media to nemovideo.ai.

This skill is reasonable for cloud-based video transcription/rendering if you trust the Nemo backend. Do not upload confidential recordings unless you are comfortable with external processing, and keep the NEMO_TOKEN private.

Findings (4)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

Private videos, audio, and related metadata may be uploaded to nemovideo.ai when the skill is used.

Why it was flagged

The skill is designed to send user-provided media to an external cloud backend for processing.

Skill content
"Drop your video files in the chat... I'll handle the speech-to-text transcription on cloud GPUs"; "Upload: POST `/api/upload-video/nemo_agent/me/<sid>` — file: multipart `-F \"files=@/path\"`"
Recommendation

Use it only for media you are comfortable processing through that service, and check the provider's privacy and retention terms before uploading sensitive recordings.
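To show what leaves the machine, the following Python sketch builds a request shaped like the documented upload call but does not send it. The call is a POST to `/api/upload-video/nemo_agent/me/<sid>`, with the file in a multipart `files` field and a bearer token. The base URL, session id, and function name are hypothetical placeholders. The review names nemovideo.ai but not the API host, and the real skill may send additional fields.

```python
import os
import uuid
import urllib.request

# Hypothetical base URL: the review names nemovideo.ai but not the API host.
BASE_URL = "https://api.example.invalid"


def build_upload_request(sid: str, filename: str, data: bytes, token: str) -> urllib.request.Request:
    """Build (but do not send) a multipart POST resembling the documented
    upload call: /api/upload-video/nemo_agent/me/<sid> with a `files` field."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="files"; filename="{os.path.basename(filename)}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    # The complete file bytes become part of the request body.
    body = head + data + tail
    return urllib.request.Request(
        url=f"{BASE_URL}/api/upload-video/nemo_agent/me/{sid}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Inspecting `req.data` on the returned object shows that the whole file, not just extracted audio or text, is in the body. The documented curl form `-F "files=@/path"` likewise uploads the full file contents.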

What this means

Anyone who obtains the token could use the associated Nemo session or credits until it expires or is rotated.

Why it was flagged

The skill relies on a bearer token for the Nemo backend. This is expected for the integration and the artifact tells the agent not to display token values.

Skill content
"Check if `NEMO_TOKEN` is set in the environment"; "The response `data.token` is your NEMO_TOKEN"; "Include `Authorization: Bearer <NEMO_TOKEN>`"
Recommendation

Keep NEMO_TOKEN out of chat messages and logs, and rotate or remove it if you believe it was exposed.
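The following Python sketch illustrates this token hygiene. It reports whether `NEMO_TOKEN` is set without printing the value. It also attaches a logging filter that replaces the token with a placeholder before a log line is written. The helper and class names are hypothetical. A filter like this only covers messages routed through Python's `logging` module, not chat transcripts or shell history.

```python
import logging
import os


class RedactTokenFilter(logging.Filter):
    """Replace any occurrence of a secret in a log record's message."""

    def __init__(self, secret: str) -> None:
        super().__init__()
        self.secret = secret

    def filter(self, record: logging.LogRecord) -> bool:
        if self.secret:
            message = record.getMessage()  # message with %-args already applied
            if self.secret in message:
                record.msg = message.replace(self.secret, "[REDACTED]")
                record.args = None  # args are already merged into msg
        return True  # keep the record; only its text changes


def token_status() -> str:
    """Report whether NEMO_TOKEN is set without revealing its value."""
    return "NEMO_TOKEN is set" if os.environ.get("NEMO_TOKEN") else "NEMO_TOKEN is not set"
```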

What this means

The agent may contact the external service and make follow-on workflow API calls as part of normal use.

Why it was flagged

The skill allows automatic backend setup and follow-on API calls based on backend workflow messages. This is purpose-aligned, but it increases the importance of keeping actions scoped to the user's request.

Skill content
"When a user first opens this skill, connect to the processing backend automatically"; "The backend responds as if there's a visual interface. Map its instructions to API calls"
Recommendation

Before uploading or exporting, confirm the requested operation and avoid giving broad instructions if you only want a transcript.
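An agent runtime could apply this recommendation with a review step between the backend's workflow messages and the API calls it actually makes. The sketch below is hypothetical. The operation names, the `PlannedCall` type, and the allowlist are illustrative assumptions, not the skill's real workflow vocabulary. Out-of-scope steps are dropped, such as rendering when only a transcript was requested. Uploads and exports run only after explicit confirmation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical operation names; the skill's real workflow steps may differ.
TRANSCRIPT_ONLY = frozenset({"upload", "transcribe"})
NEEDS_CONFIRMATION = frozenset({"upload", "export"})


@dataclass
class PlannedCall:
    operation: str  # e.g. "upload", "transcribe", "render", "export"
    detail: str     # human-readable description shown to the user


def review_plan(
    plan: Iterable[PlannedCall],
    allowed: frozenset[str],
    confirm: Callable[[PlannedCall], bool],
) -> list[PlannedCall]:
    """Return only the calls that may proceed, in their original order."""
    approved = []
    for call in plan:
        if call.operation not in allowed:
            continue  # outside the scope the user asked for
        if call.operation in NEEDS_CONFIRMATION and not confirm(call):
            continue  # user declined this upload or export
        approved.append(call)
    return approved
```

Suppose a plan of upload, transcribe, render, and export is passed with `TRANSCRIPT_ONLY` and a confirm callback that always approves. Only the upload and transcribe steps survive.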

What this means

A user expecting only transcript extraction may not realize the workflow can involve broader cloud video editing/rendering behavior.

Why it was flagged

The artifact presents itself as a video-to-text converter but also documents MP4 rendering and broader video editing/generation actions.

Skill content
"Get text transcripts ready to post" and "download 1080p MP4 when it's done"; "Everything else (generate, edit, add BGM…) | → §3.1 SSE"
Recommendation

Review the skill behavior before use and phrase requests narrowly if you only want transcription.