AI Voice Over Generator

v1.0.0

Skip the learning curve of professional editing software. Describe what you want — generate a natural-sounding voiceover in English for this silent video — a...

⭐ 0 · 44 · 0 current · 0 all-time

by @bwbernardweston18

Security Scan

VirusTotal: Suspicious

OpenClaw: Suspicious (medium confidence)

ℹ Purpose & Capability

The name and description align with the runtime instructions: the SKILL.md describes uploading video, creating sessions, SSE, render/export endpoints, and headers for a nemo video rendering service. However, the registry metadata (required config paths: none) conflicts with the SKILL.md frontmatter, which lists a config path (~/.config/nemovideo/). The skill's source and homepage are unknown, which weakens provenance and increases risk.

Instruction Scope

Instructions tell the agent to talk to https://mega-api-prod.nemovideo.ai, create an anonymous token if NEMO_TOKEN is not set, and store session_id for subsequent requests. They also explicitly say 'Don't display raw API responses or token values to the user.' Automatically obtaining and then hiding tokens/session state is functionally plausible but increases the chance of unnoticed network activity or credential persistence; the instruction to hide tokens is a notable scope/privacy concern.

✓ Install Mechanism

No install spec and no code files — instruction-only. Lowest install risk: nothing is written to disk by an installer. The runtime does call remote APIs but there is no download/install step to review.

ℹ Credentials

Only one credential is declared (NEMO_TOKEN), which is appropriate for a third-party API. However, the SKILL.md auto-generates and uses an anonymous token when NEMO_TOKEN is absent, so declaring the env var as required is inconsistent with the runtime instructions. The frontmatter also references a config path (~/.config/nemovideo/) that is not declared elsewhere; this mismatch should be clarified.

✓ Persistence & Privilege

The `always` flag is false, and the skill doesn't request system-wide changes. It instructs storing a session_id for subsequent API calls, which is normal. There is no evidence that it modifies other skills or the global agent config.

What to consider before installing

This skill appears to implement a legitimate voice-over service, but the package lacks provenance (no source or homepage) and the SKILL.md instructs the agent to automatically obtain, and then hide, an anonymous token if none is set. Before installing:

1. Decide whether you trust https://mega-api-prod.nemovideo.ai and the unknown skill author.
2. If you prefer visibility, set your own NEMO_TOKEN manually instead of letting the skill auto-generate one.
3. Be aware that uploads (videos and scripts) go to a third-party backend; do not send sensitive content you don't want stored or processed off-host.
4. Ask the publisher to reconcile the metadata mismatch about config paths and to explain where tokens and sessions are persisted (in memory only, or written to disk).

If you cannot verify the service or author, treat it as untrusted and avoid using it with sensitive material.


Runtime requirements

🎙️ Clawdis

Env: NEMO_TOKEN

Primary env: NEMO_TOKEN

latest: vk9704basy6p3f6yyf7fa0yh46984p7sp

44 downloads

0 stars

1 version

Updated 6d ago

License: MIT-0

Getting Started

Send me your video or script and I'll handle the AI voiceover generation. Or just describe what you're after.

Try saying:

"turn a 2-minute product demo video with no audio into a 1080p MP4"
"generate a natural-sounding voiceover in English for this silent video"
"add AI-generated voiceovers to silent or muted videos for content creators, marketers, and educators"

First-Time Connection

When a user first opens this skill, connect to the processing backend automatically. Briefly let them know (e.g. "Setting up...").

Authentication: Check whether NEMO_TOKEN is set in the environment. If it is, skip to step 2.

1. Obtain a free token: Generate a random UUID as a client identifier. POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the header X-Client-Id set to that UUID. The response's data.token is your NEMO_TOKEN (100 free credits, valid for 7 days).
2. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and body {"task_name":"project","language":"<detected>"}. Store the returned session_id for all subsequent requests.
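The two setup calls can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the skill's implementation: the helper names (build_anonymous_token_request, build_create_session_request, resolve_token) and the caller-supplied fetch_json function are hypothetical, the token response is assumed to be JSON shaped as {"data": {"token": ...}}, and the requests are only built here, never sent. urllib.request normalizes header names (X-Client-Id is stored as X-client-id); HTTP header names are case-insensitive, so this should not matter to the server.

```python
import json
import urllib.request
import uuid

API_BASE = "https://mega-api-prod.nemovideo.ai"


def build_anonymous_token_request(client_id: str) -> urllib.request.Request:
    """Build POST /api/auth/anonymous-token with X-Client-Id set to a UUID."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def build_create_session_request(token: str, language: str) -> urllib.request.Request:
    """Build the session-creation call with the JSON body from the steps above."""
    body = json.dumps({"task_name": "project", "language": language}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def resolve_token(env: dict, fetch_json) -> str:
    """Return NEMO_TOKEN from env, or obtain an anonymous one via fetch_json.

    fetch_json(request) is a hypothetical caller-supplied function that sends
    the request and returns the decoded JSON response.
    """
    token = env.get("NEMO_TOKEN")
    if token:
        return token  # already configured: go straight to session creation
    request = build_anonymous_token_request(str(uuid.uuid4()))
    response = fetch_json(request)
    return response["data"]["token"]  # assumed response shape
```

Setting NEMO_TOKEN yourself short-circuits resolve_token, which keeps the token visible to you, as the security review above recommends.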

Keep setup communication brief. Don't display raw API responses or token values to the user.

AI Voice Over Generator — Generate Voiceovers for Any Video

Send me your video or script and describe the result you want. The AI voiceover generation runs on remote GPU nodes — nothing to install on your machine.

A quick example: upload a 2-minute product demo video with no audio, type "generate a natural-sounding voiceover in English for this silent video", and you'll get a 1080p MP4 back in roughly 30-60 seconds. All rendering happens server-side.

Worth noting: shorter scripts produce more natural-sounding results — keep sentences under 20 words.
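One rough way to apply the 20-word guideline before sending a script is to flag long sentences locally. The function name long_sentences is hypothetical, and the sentence splitter is deliberately naive: it splits after ., ! or ? followed by whitespace, so abbreviations such as "e.g." will over-split.

```python
import re

MAX_WORDS = 20  # guideline: keep sentences under 20 words


def long_sentences(script: str, max_words: int = MAX_WORDS) -> list[str]:
    """Return the sentences that have max_words or more words."""
    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", script) if s.strip()]
    return [s for s in sentences if len(s.split()) >= max_words]
```

Any sentence it returns is a candidate for splitting in two before generating the voiceover.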

Matching Input to Actions

User prompts that mention the AI voice over generator, aspect ratio, text overlays, or audio tracks are routed to the corresponding action via keyword and intent classification.

User says...	Action	Skip SSE?
"export" / "导出" / "download" / "send me the video"	→ §3.5 Export	✅
"credits" / "积分" / "balance" / "余额"	→ §3.3 Credits	✅
"status" / "状态" / "show tracks"	→ §3.4 State	✅
"upload" / "上传" / user sends file	→ §3.2 Upload	✅
Everything else (generate, edit, add BGM…)	→ §3.1 SSE	❌
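The routing table can be approximated with plain substring matching. This sketch covers only the keyword half of "keyword and intent classification"; any intent model is out of scope. The route function, the action labels, and the choice to check for an attachment before any keyword are all assumptions.

```python
# (keywords, action, skip_sse), in the same order as the routing table.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export", True),
    (("credits", "积分", "balance", "余额"), "credits", True),
    (("status", "状态", "show tracks"), "state", True),
    (("upload", "上传"), "upload", True),
]


def route(message: str, has_attachment: bool = False) -> tuple[str, bool]:
    """Return (action, skip_sse) for a user message via keyword matching."""
    if has_attachment:
        return "upload", True  # the "user sends file" row
    text = message.lower()
    for keywords, action, skip_sse in ROUTES:
        if any(k in text for k in keywords):
            return action, skip_sse
    return "sse", False  # everything else goes through the SSE agent
```

Checking rows in table order means a message containing several keywords resolves to the earliest matching row.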

Cloud Render Pipeline Details

Each export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.

Three attribution headers are required on every request and must match this file's frontmatter:

Header	Value
`X-Skill-Source`	`ai-voice-over-generator`
`X-Skill-Version`	frontmatter `version`
`X-Skill-Platform`	auto-detect: `clawhub` / `cursor` / `unknown` from install path

All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
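A helper that assembles the four required headers might look like this. The platform-detection rule (look for clawhub or cursor in any component of the install path) is an assumption, since the text only says the value is auto-detected from the install path, and the version string 1.0.0 is assumed to mirror the v1.0.0 listed for this skill. Both function names are hypothetical.

```python
from pathlib import PurePath

SKILL_SOURCE = "ai-voice-over-generator"
SKILL_VERSION = "1.0.0"  # assumed to match the frontmatter `version`


def detect_platform(install_path: str) -> str:
    """Guess X-Skill-Platform from the install path (matching rule assumed)."""
    parts = [p.lower() for p in PurePath(install_path).parts]
    for platform in ("clawhub", "cursor"):
        if any(platform in part for part in parts):
            return platform
    return "unknown"


def request_headers(token: str, install_path: str) -> dict[str, str]:
    """Authorization plus the three attribution headers every request needs."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": SKILL_SOURCE,
        "X-Skill-Version": SKILL_VERSION,
        "X-Skill-Platform": detect_platform(install_path),
    }
```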

API base: https://mega-api-prod.nemovideo.ai

Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
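The SSE request body and the stream parsing can be sketched separately. sse_message_body mirrors the JSON shape above; iter_sse_data follows the general Server-Sent Events line format (data: lines accumulate until a blank line ends the event, lines starting with : are comments) and ignores the event:, id: and retry: fields. Both names are hypothetical, and the caller is responsible for sending Accept: text/event-stream and allowing for the 15-minute timeout.

```python
def sse_message_body(session_id: str, text: str) -> dict:
    """JSON body for POST /run_sse, mirroring the documented shape."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": text}]},
    }


def iter_sse_data(lines):
    """Yield the data payload of each complete SSE event.

    A blank line ends an event; lines starting with ':' are comments.
    An event whose data is empty yields "", which the caller can treat as a
    heartbeat. Other fields (event:, id:, retry:) are ignored in this sketch,
    and an event left unterminated when the stream ends is discarded.
    """
    buffer = []
    saw_data = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if saw_data:
                yield "\n".join(buffer)
            buffer, saw_data = [], False
        elif line.startswith(":"):
            continue
        elif line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # drop the single optional leading space
            buffer.append(value)
            saw_data = True
```

Feeding the response body line by line into iter_sse_data yields one string per event, which can then be decoded and handled per the event table below.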

Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}
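For the URL variant of the upload, a request could be built as below. The multipart variant (shown above as a curl -F form) needs multipart/form-data encoding that urllib does not do for you, so it is omitted. build_url_upload_request is a hypothetical name, the JSON Content-Type is an assumption, and headers is expected to already contain the Authorization and X-Skill-* headers.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://mega-api-prod.nemovideo.ai"


def build_url_upload_request(session_id: str, video_url: str,
                             headers: dict) -> urllib.request.Request:
    """Build POST /api/upload-video/nemo_agent/me/<sid> with a URL-variant body."""
    sid = urllib.parse.quote(session_id, safe="")  # keep the id a single path segment
    body = {"urls": [video_url], "source_type": "url"}
    return urllib.request.Request(
        f"{API_BASE}/api/upload-video/nemo_agent/me/{sid}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        # JSON Content-Type for the URL variant is an assumption.
        headers={**headers, "Content-Type": "application/json"},
    )
```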

Credits: GET /api/credits/balance/simple — returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
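The export step and its 30-second polling loop might be sketched like this. export_body and poll_render are hypothetical names; the <ts> in the render id is assumed to be a millisecond timestamp; the status response is assumed to carry status and output.url at its top level; and max_polls=20 (about ten minutes at 30-second intervals) is an arbitrary cap, since the text gives no upper bound. The HTTP call is injected as fetch_status so the loop can run without a network.

```python
import time


def export_body(session_id: str, draft: dict, now_ms: int) -> dict:
    """Body for POST /api/render/proxy/lambda (millisecond <ts> assumed)."""
    return {
        "id": f"render_{now_ms}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_render(fetch_status, render_id: str, interval_s: float = 30.0,
                max_polls: int = 20, sleep=time.sleep) -> str:
    """Poll until status == "completed", then return output.url.

    fetch_status(render_id) is a caller-supplied function (hypothetical) that
    performs GET /api/render/proxy/lambda/<id> and returns the decoded JSON.
    """
    for _ in range(max_polls):
        job = fetch_status(render_id)
        if job.get("status") == "completed":
            return job["output"]["url"]
        sleep(interval_s)
    raise TimeoutError(f"render {render_id} not completed after {max_polls} polls")
```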

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
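A client-side pre-check against this format list and the 500MB limit given under Tips and Tricks could catch the 4001 and 4002 cases before anything is uploaded. precheck_upload is a hypothetical helper, and the page does not say whether 500MB means 500 * 1024 * 1024 bytes or 500,000,000 bytes; the sketch assumes the former.

```python
from pathlib import Path
from typing import Optional

SUPPORTED_EXTENSIONS = {
    "mp4", "mov", "avi", "webm", "mkv", "jpg", "png", "gif", "webp",
    "mp3", "wav", "m4a", "aac",
}
MAX_BYTES = 500 * 1024 * 1024  # "Max file size is 500MB" (binary MB assumed)


def precheck_upload(filename: str, size_bytes: int) -> Optional[str]:
    """Return None if the file looks acceptable, otherwise a short reason."""
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        return f"unsupported format .{ext}" if ext else "file has no extension"
    if size_bytes > MAX_BYTES:
        return "file exceeds 500 MB; compress or trim it first"
    return None
```

The server remains the authority; this only avoids a round trip for files that are clearly outside the documented limits.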

SSE Event Handling

Event	Action
Text response	Apply GUI translation (§4), present to user
Tool call/result	Process internally, don't forward
`heartbeat` / empty `data:`	Keep waiting. Every 2 min: "⏳ Still working..."
Stream closes	Process final response

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
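The event table and the no-text fallback can be combined into one consumer loop. Events are assumed to arrive already classified as (kind, payload, seconds since the request was sent) tuples with kind in "text", "tool" or "heartbeat"; that format is hypothetical, because mapping raw SSE payloads onto those kinds depends on a backend event schema this page does not describe. handle_events and StreamResult are illustrative names, and the GUI translation step (§4) is not shown.

```python
from dataclasses import dataclass, field

PROGRESS_INTERVAL_S = 120  # "Every 2 min"
PROGRESS_MESSAGE = "⏳ Still working..."


@dataclass
class StreamResult:
    texts: list = field(default_factory=list)  # text responses to present
    progress_notes: int = 0                     # progress messages sent
    needs_state_check: bool = False             # no text: verify via session state


def handle_events(events, notify, start_t: float = 0.0) -> StreamResult:
    """Consume (kind, payload, t) tuples until the stream closes.

    notify(msg) sends a message to the user.
    """
    result = StreamResult()
    last_note = start_t
    for kind, payload, t in events:
        if kind == "text":
            result.texts.append(payload)
        elif kind == "heartbeat" and t - last_note >= PROGRESS_INTERVAL_S:
            notify(PROGRESS_MESSAGE)
            result.progress_notes += 1
            last_note = t
        # "tool" events are processed internally and never forwarded.
    # Stream closed: if no text arrived, the caller should poll session state.
    result.needs_state_check = not result.texts
    return result
```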

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

Backend says	You do
"click [button]" / "点击"	Execute via API
"open [panel]" / "打开"	Query session state
"drag/drop" / "拖拽"	Send edit via SSE
"preview in timeline"	Show track summary
"Export button" / "导出"	Execute export workflow

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)
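A track summary can be produced from the compact draft fields. The nesting assumed here (t holds tracks, each with tt and sg, each segment with d and m) is inferred from the field mapping and not confirmed, and summarize_draft is a hypothetical name. Because the mapping lists no start-offset field, the sketch reports total segment duration per track rather than time ranges such as 0-10s.

```python
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}  # from the tt mapping


def summarize_draft(draft: dict) -> list[str]:
    """One line per track from the compact draft (t/tt/sg/d keys).

    Assumed shape: {"t": [{"tt": 0, "sg": [{"d": 10000, "m": {...}}]}]}.
    """
    lines = []
    for i, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), f"type {track.get('tt')}")
        segments = track.get("sg", [])
        total_ms = sum(seg.get("d", 0) for seg in segments)  # d is in ms
        lines.append(f"{i}. {kind}: {len(segments)} segment(s), {total_ms / 1000:g}s")
    return lines
```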

Error Codes

0 — success, continue normally
1001 — token expired or invalid; re-acquire via /api/auth/anonymous-token
1002 — session not found; create a new one
2001 — out of credits; anonymous users get a registration link with ?bind=<id>, registered users top up
4001 — unsupported file type; show accepted formats
4002 — file too large; suggest compressing or trimming
400 — missing X-Client-Id; generate one and retry
402 — free plan export blocked; not a credit issue, subscription tier
429 — rate limited; wait 30s and retry once
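The error table can be encoded as a lookup. The action labels and next_step are hypothetical, and the sketch does not decide where a code comes from: 400, 402 and 429 look like HTTP status codes while the four-digit values look like application codes in the response body, which is an assumption the page does not confirm.

```python
ERROR_ACTIONS = {
    0: "continue",
    1001: "reacquire_token",          # POST /api/auth/anonymous-token again
    1002: "create_session",
    2001: "out_of_credits",           # registration link (?bind=<id>) or top-up
    4001: "show_formats",
    4002: "suggest_compress",
    400: "add_client_id_and_retry",
    402: "explain_plan_limit",        # subscription tier, not credits
    429: "retry_after_30s",
}


def next_step(code: int, retried: bool = False) -> str:
    """Map an error code to its documented action; 429 is retried only once."""
    if code == 429 and retried:
        return "give_up"  # a second 429 after the single retry is not retried
    return ERROR_ACTIONS.get(code, "unknown_error")
```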

Common Workflows

Quick edit: Upload → "generate a natural-sounding voiceover in English for this silent video" → Download MP4. Takes 30-60 seconds for a 30-second clip.

Batch style: Upload multiple files in one session. Process them one by one with different instructions. Each gets its own render.

Iterative: Start with a rough cut, preview the result, then refine. The session keeps your timeline state so you can keep tweaking.

Tips and Tricks

The backend processes faster when you're specific. Instead of "make it look better", try "generate a natural-sounding voiceover in English for this silent video" — concrete instructions get better results.

Max file size is 500MB. Stick to MP4, MOV, AVI, WebM for the smoothest experience.

Export as MP4 for widest compatibility across platforms and devices.
