Transcribe Video To Text

Review. Audited by ClawScan on May 10, 2026.

Overview

The skill is advertised as video-to-text transcription, but its instructions also enable broad cloud video editing, rendering, upload, and export workflows that users may not expect.

Review this skill before installing if you only want text transcription. It appears to be a broader NemoVideo cloud editing and rendering integration, so first confirm that you are comfortable uploading videos to nemovideo.ai, authenticating with a NEMO_TOKEN or an automatically acquired starter token, and letting the agent trigger edit or export workflows that may consume credits.

Findings (4)

This is an artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Concern

ASI09: Human-Agent Trust Exploitation

What this means

A user expecting only a text transcript could unknowingly enter a broader video rendering or editing workflow.

Why it was flagged

The artifact presents the skill as text transcription but also tells the agent to deliver rendered MP4 video output, creating a material mismatch with what users would expect.

Skill content

"transcribe the spoken dialogue into a text document" ... "you've got a MP4 file ready to download. The whole thing runs at 1080p by default."

Recommendation

Align the description and instructions with the actual behavior, and clearly disclose when the skill will render, edit, export, or produce video rather than text.

Concern

ASI02: Tool Misuse and Exploitation

What this means

The agent could consume credits, modify session state, upload media, or trigger exports in ways that are not clearly limited to transcription.

Why it was flagged

The instructions give the agent broad authority to run backend video editing/rendering workflows and translate backend UI-like messages into API actions, beyond the narrow transcription purpose.

Skill content

"Everything else (generate, edit, add BGM…) | → §3.1 SSE" and "Backend says ... 'click [button]' ... Execute via API"

Recommendation

Limit the skill to transcription-specific actions, and require explicit user confirmation before uploads, edits, renders, exports, or credit-consuming operations.
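A minimal sketch of the recommended guard, in Python, assuming the agent routes every backend call through a single check before executing it. The action names (STATUS, TRANSCRIBE, UPLOAD, EDIT, RENDER, EXPORT), the `authorize` helper, and the assumption that transcription itself may consume credits are all hypothetical; none of them come from the skill or the NemoVideo API.

```python
from enum import Enum
from typing import Callable


class Action(Enum):
    # Hypothetical action categories for illustration only;
    # they are not taken from the NemoVideo API.
    STATUS = "check job status"
    TRANSCRIBE = "transcribe audio"
    UPLOAD = "upload a video"
    EDIT = "edit a video"
    RENDER = "render a video"
    EXPORT = "export a video"


# Actions that fall within a transcription-only scope.
IN_SCOPE = {Action.STATUS, Action.TRANSCRIBE, Action.UPLOAD}

# In-scope actions that send media off the machine or (by assumption)
# may consume credits, so the user must approve each one explicitly.
NEEDS_CONFIRMATION = {Action.TRANSCRIBE, Action.UPLOAD}


def authorize(action: Action, confirm: Callable[[str], bool]) -> bool:
    """Return True only if the action may proceed."""
    if action not in IN_SCOPE:
        # Edits, renders, and exports are refused outright;
        # the user is never even prompted.
        return False
    if action in NEEDS_CONFIRMATION:
        return bool(confirm(f"The skill wants to {action.value}. Proceed? [y/N] "))
    # Read-only actions such as status checks need no prompt.
    return True
```

In an interactive session, `confirm` could be `lambda prompt: input(prompt).strip().lower() == "y"`. Refusing out-of-scope actions before prompting keeps the user from being asked to approve a render or export they never requested.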

Note

ASI03: Identity and Privilege Abuse

What this means

The agent can authenticate to NemoVideo using the configured token or an automatically acquired starter token.

Why it was flagged

The skill requires a service credential for NemoVideo API access; this is expected for a cloud integration but gives the agent delegated authority for that service.

Skill content

Required env vars: NEMO_TOKEN ... Primary credential: NEMO_TOKEN

Recommendation

Use a token intended only for this service, monitor credit usage, and revoke or rotate the token if you no longer use the skill.

Note

ASI07: Insecure Inter-Agent Communication

What this means

Uploaded videos and spoken content leave the local environment for cloud processing.

Why it was flagged

The skill sends user media and session requests to an external cloud provider, which is purpose-aligned but sensitive because videos may contain private audio or images.

Skill content

"All calls go to `https://mega-api-prod.nemovideo.ai`" and "Upload — `POST /api/upload-video/nemo_agent/me/<sid>`"

Recommendation

Upload only files you are comfortable sending to NemoVideo, and review the provider’s privacy and retention terms before using sensitive media.