Ai Video Generator From Music

v1.0.0

Turn audio files into music-synced videos with this skill. Works with MP3, WAV, AAC, and FLAC files up to 200MB. Musicians and content creators use it for ge...

by @dsewell-583h0

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for dsewell-583h0/ai-video-generator-from-music.

Install the skill "Ai Video Generator From Music" (dsewell-583h0/ai-video-generator-from-music) from ClawHub.
Skill page: https://clawhub.ai/dsewell-583h0/ai-video-generator-from-music
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required env vars: NEMO_TOKEN
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install ai-video-generator-from-music

ClawHub CLI

npx clawhub@latest install ai-video-generator-from-music

Security Scan

VirusTotal: Pending

OpenClaw: Benign (medium confidence)

✓ Purpose & Capability

Name/description match the actions described in SKILL.md: creating music-synced videos on a cloud backend. Requesting a single service token (NEMO_TOKEN) is appropriate for an API-backed renderer. The skill's documented API endpoints and upload/render flows align with the declared purpose.

ℹ Instruction Scope

Instructions direct the agent to check the environment for NEMO_TOKEN, optionally request an anonymous token from https://mega-api-prod.nemovideo.ai, create sessions, upload user audio, and poll SSE/render endpoints — all expected for a cloud render workflow. Two points to note: (1) examples reference local file paths for multipart uploads (expected when user supplies files, but the agent must only access user-provided files), and (2) the skill instructs detection of an install path to set an attribution header (X-Skill-Platform) which implies reading or inferring agent install paths — this is not strictly required for rendering and leaks some environment identification information.

✓ Install Mechanism

Instruction-only skill (no install spec, no code files). No packages or remote downloads are performed by the skill itself, which reduces supply-chain risk.

ℹ Credentials

Only one credential is declared and used: NEMO_TOKEN (primary). This is proportionate to a hosted rendering service. The SKILL.md also contains frontmatter metadata that references a config path (~/.config/nemovideo/) and uses an install-path-based header; the registry metadata showed no required config paths — this mismatch should be clarified (does the skill attempt to read that config path or not?).

✓ Persistence & Privilege

always is false and the skill is user-invocable. It does not request persistent system-wide privileges, nor does it modify other skills' configurations. The skill will perform network calls to the remote API for each operation, which is normal for an API-backed renderer.

Assessment

This skill appears to do what it says: it uploads your audio to a cloud service (mega-api-prod.nemovideo.ai) and returns rendered videos. Before installing or using it, consider:

1) Privacy: your audio and any metadata are sent to the provider; confirm their data retention and sharing policies.
2) Token handling: NEMO_TOKEN grants the skill access to the service; if the skill generates an anonymous token, that token may be stored or used for 7 days; ask whether tokens are persisted and where.
3) Local file access: the skill will upload user-supplied files; ensure the agent only sends files you explicitly provide and does not read arbitrary local files.
4) The skill references an agent install path and a config directory (~/.config/nemovideo/) in its frontmatter; ask the author whether the skill will read those paths (the registry shows no required config path).
5) Verify the domain (nemovideo.ai) is the intended service and that you trust it to process your media.

If you need higher assurance, request the publisher/source URL, a privacy/data-retention statement, and confirmation about whether any tokens or uploads are stored long-term.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

🎵 Clawdis

Env: NEMO_TOKEN

Primary env: NEMO_TOKEN

latest vk97bbn030xq8k371vevd3ywmex85kw7w

46 downloads

0 stars

1 version

Updated 2d ago

v1.0.0

MIT-0

Getting Started

Share your audio files and I'll get started on AI video creation. Or just tell me what you're thinking.

Try saying:

"generate my audio files"
"export 1080p MP4"
"generate a music video with visuals"

Quick Start Setup

This skill connects to a cloud processing backend. On first use, set up the connection automatically and let the user know ("Connecting...").

Token check: Look for NEMO_TOKEN in the environment. If found, skip to session creation. Otherwise:

1. Generate a UUID as the client identifier.
2. POST https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the X-Client-Id header set to that UUID.
3. Extract data.token from the response. This is your NEMO_TOKEN (100 free credits, 7-day expiry).
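
The token check above can be sketched roughly as follows. This is a minimal illustration, not the skill's own implementation: the helper names (`build_anonymous_token_request`, `extract_token`, `resolve_token`) and the injectable `send` callable are hypothetical, and it assumes the endpoint accepts a POST with no body and answers with JSON carrying the token at `data.token`.

```python
import json
import os
import uuid
import urllib.request

API_BASE = "https://mega-api-prod.nemovideo.ai"


def build_anonymous_token_request(client_id: str) -> urllib.request.Request:
    """Build (but do not send) the anonymous-token request."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def extract_token(response_body: str) -> str:
    """Pull data.token out of the JSON response body (assumed shape)."""
    return json.loads(response_body)["data"]["token"]


def resolve_token(env=os.environ, send=None) -> str:
    """Return NEMO_TOKEN from the environment, else request an anonymous one.

    `send` is a hypothetical hook: a callable that takes a Request and
    returns the response body as text. It defaults to a real HTTP call.
    """
    token = env.get("NEMO_TOKEN")
    if token:
        return token
    if send is None:
        def send(req):
            with urllib.request.urlopen(req, timeout=30) as resp:
                return resp.read().decode("utf-8")
    client_id = str(uuid.uuid4())
    return extract_token(send(build_anonymous_token_request(client_id)))
```

Passing `send` explicitly keeps the network call swappable, which makes the flow easy to exercise without touching the real service.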

Session: POST https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Keep the returned session_id for all operations.
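
As a rough sketch of the session step, the request could be assembled like this. The function names are hypothetical, the attribution headers described under the render pipeline are left out for brevity, and the location of `session_id` in the response is an assumption (the sketch looks under `data` first, then at the top level).

```python
import json
import urllib.request

API_BASE = "https://mega-api-prod.nemovideo.ai"


def build_session_request(token: str, task_name: str = "project") -> urllib.request.Request:
    """Build (but do not send) the create-session request."""
    body = json.dumps({"task_name": task_name}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def extract_session_id(payload: dict) -> str:
    """Return session_id from a parsed response.

    Assumption: the field sits either under "data" or at the top level;
    the source does not document the exact envelope.
    """
    inner = payload.get("data")
    source = inner if isinstance(inner, dict) and "session_id" in inner else payload
    return source["session_id"]
```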

Let the user know with a brief "Ready!" when setup is complete. Don't expose tokens or raw API output.

AI Video Generator from Music — Turn Music Into Synced Videos

Send me your audio files and describe the result you want. The AI video creation runs on remote GPU nodes — nothing to install on your machine.

A quick example: upload a 3-minute MP3 song track, type "generate a music video with visuals that match the beat and mood of this song", and you'll get a 1080p MP4 back in roughly 1-2 minutes. All rendering happens server-side.

Worth noting: shorter tracks under 2 minutes process faster and give tighter visual sync.

Matching Input to Actions

Prompts that mention music video generation, aspect ratio, text overlays, or audio tracks are routed to the matching action by keyword and intent classification:

User says...	Action	Skip SSE?
"export" / "导出" / "download" / "send me the video"	→ §3.5 Export	✅
"credits" / "积分" / "balance" / "余额"	→ §3.3 Credits	✅
"status" / "状态" / "show tracks"	→ §3.4 State	✅
"upload" / "上传" / user sends file	→ §3.2 Upload	✅
Everything else (generate, edit, add BGM…)	→ §3.1 SSE	❌
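
The routing table can be approximated with plain keyword matching. This is a simplified sketch: the `route` function and its action labels are hypothetical, and it uses substring checks only, not the intent classification the skill mentions.

```python
from typing import Tuple

# Keyword groups copied from the table; first match wins.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message: str, has_file: bool = False) -> Tuple[str, bool]:
    """Return (action, skip_sse) for a user message."""
    if has_file:
        # A file sent by the user always goes to the upload action.
        return ("upload", True)
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return (action, True)
    # Everything else (generate, edit, add BGM...) goes through SSE.
    return ("sse", False)
```

For example, `route("export 1080p MP4")` selects the export action and skips SSE, while `route("generate a music video with visuals")` falls through to the SSE path.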

Cloud Render Pipeline Details

Each export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.

Headers are derived from this file's YAML frontmatter. X-Skill-Source is ai-video-generator-from-music, X-Skill-Version comes from the version field, and X-Skill-Platform is detected from the install path (~/.clawhub/ = clawhub, ~/.cursor/skills/ = cursor, otherwise unknown).

Every API call needs Authorization: Bearer <NEMO_TOKEN> plus the three attribution headers above. If any header is missing, exports return 402.
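
A minimal sketch of how the four required headers might be assembled. The function names are hypothetical, and the path matching is an assumption: the sketch compares string prefixes against a caller-supplied home directory.

```python
def detect_platform(install_path: str, home: str) -> str:
    """Map the skill's install path to an X-Skill-Platform value."""
    normalized = install_path.replace("\\", "/")
    home = home.replace("\\", "/").rstrip("/")
    if normalized.startswith(f"{home}/.clawhub/"):
        return "clawhub"
    if normalized.startswith(f"{home}/.cursor/skills/"):
        return "cursor"
    return "unknown"


def build_headers(token: str, version: str, platform: str) -> dict:
    """Return the auth header plus the three attribution headers."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "ai-video-generator-from-music",
        "X-Skill-Version": version,
        "X-Skill-Platform": platform,
    }
```

Usage might look like `build_headers(token, "1.0.0", detect_platform(skill_path, home_dir))`, where `skill_path` and `home_dir` are placeholders supplied by the caller.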

API base: https://mega-api-prod.nemovideo.ai

Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
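
The message body for /run_sse can be built as below. This is a sketch under the field names given above; `build_run_sse_body` and `sse_headers` are hypothetical helper names, and the attribution headers are omitted for brevity.

```python
import json

SSE_TIMEOUT_SECONDS = 15 * 60  # documented maximum for one stream


def build_run_sse_body(session_id: str, message: str) -> bytes:
    """Serialize the /run_sse request body as UTF-8 JSON."""
    payload = {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": message}]},
    }
    # ensure_ascii=False keeps non-ASCII prompts readable on the wire.
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def sse_headers(token: str) -> dict:
    """Headers specific to the streaming call."""
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "text/event-stream",
        "Content-Type": "application/json",
    }
```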

Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}

Credits: GET /api/credits/balance/simple — returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
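
The export-and-poll loop could be sketched as follows. Several details are assumptions: `<ts>` is taken to be a millisecond Unix timestamp, `status` is read from the top level of the poll response, `fetch_status` is a hypothetical callable that performs the GET for a render id, and `max_wait` is an illustrative cap that the source does not specify.

```python
import time
from typing import Callable, Optional


def build_export_body(session_id: str, draft: dict, now: Optional[float] = None) -> dict:
    """Build the export request body; the id format is an assumption."""
    ts = int((time.time() if now is None else now) * 1000)
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_export(
    render_id: str,
    fetch_status: Callable[[str], dict],
    interval: float = 30.0,
    max_wait: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until status is 'completed', then return output.url."""
    waited = 0.0
    while True:
        job = fetch_status(render_id)
        if job.get("status") == "completed":
            return job["output"]["url"]
        if waited >= max_wait:
            raise TimeoutError(f"render {render_id} not completed after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Injecting `sleep` and `fetch_status` keeps the loop testable without waiting 30 seconds per iteration.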

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

Reading the SSE Stream

Text events go straight to the user (after GUI translation). Tool calls stay internal. Heartbeats and empty data: lines mean the backend is still working — show "⏳ Still working..." every 2 minutes.

About 30% of edit operations close the stream without any text. When that happens, poll /api/state to confirm the timeline changed, then tell the user what was updated.
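
A rough sketch of the stream handling described above. The helper names are hypothetical, and the schema of each event payload is not documented here, so the sketch only separates heartbeats from non-empty `data:` payloads and leaves telling text from tool calls to the caller.

```python
from typing import Iterable, List, Tuple


def read_sse(lines: Iterable[str]) -> Tuple[List[str], int]:
    """Split raw SSE lines into (data payloads, heartbeat count)."""
    payloads: List[str] = []
    heartbeats = 0
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            # SSE comment line, commonly used as a keep-alive.
            heartbeats += 1
        elif line.startswith("data:"):
            value = line[len("data:"):].strip()
            if value:
                payloads.append(value)
            else:
                # Empty data: line means the backend is still working.
                heartbeats += 1
        # Blank lines only delimit events; other fields are ignored here.
    return payloads, heartbeats


def needs_state_poll(text_messages: List[str]) -> bool:
    """True when the stream closed without any user-facing text."""
    return not text_messages
```

When `needs_state_poll` returns true, the next step is the state query, followed by a summary of what changed.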

Translating GUI Instructions

The backend responds as if there's a visual interface. Map its instructions to API calls:

"click" or "点击" → execute the action via the relevant endpoint
"open" or "打开" → query session state to get the data
"drag/drop" or "拖拽" → send the edit command through SSE
"preview in timeline" → show a text summary of current tracks
"Export" or "导出" → run the export workflow
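
The mapping above could be expressed as a small lookup. This is an illustrative sketch only: the action labels and the first-match-wins ordering are assumptions, since the source does not say how overlapping phrases (for example "click Export") should be resolved.

```python
from typing import Optional

# Checked in order; the longest phrase goes first so it is not shadowed.
GUI_ACTIONS = [
    (("preview in timeline",), "summarize_tracks"),
    (("click", "点击"), "call_endpoint"),
    (("open", "打开"), "query_state"),
    (("drag", "drop", "拖拽"), "send_sse_edit"),
    (("export", "导出"), "run_export"),
]


def translate_gui_instruction(text: str) -> Optional[str]:
    """Return a hypothetical action label, or None if nothing matches."""
    lowered = text.lower()
    for phrases, action in GUI_ACTIONS:
        if any(phrase in lowered for phrase in phrases):
            return action
    return None
```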

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

Example timeline summary:

Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)
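
To show how the abbreviated draft fields might be turned into a text summary, here is a minimal sketch. The nesting is an assumption: it treats `t` as a list of tracks, each with a `tt` type code and an `sg` list of segments carrying `d` in milliseconds. The `summarize_draft` name is hypothetical.

```python
TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}


def summarize_draft(draft: dict) -> str:
    """Render a short text summary of the tracks in a draft."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        label = TRACK_TYPES.get(track.get("tt"), "Unknown")
        # Sum segment durations (ms) and report seconds.
        total_ms = sum(segment.get("d", 0) for segment in track.get("sg", []))
        lines.append(f"{index}. {label}: {total_ms / 1000:g}s")
    return "\n".join(lines)
```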

Error Handling

Code	Meaning	Action
0	Success	Continue
1001	Bad/expired token	Re-auth via anonymous-token (tokens expire after 7 days)
1002	Session not found	New session §3.0
2001	No credits	Anonymous: show registration URL with `?bind=<id>` (get `<id>` from create-session or state response when needed). Registered: "Top up credits in your account"
4001	Unsupported file	Show supported formats
4002	File too large	Suggest compress/trim
400	Missing X-Client-Id	Generate Client-Id and retry (see §1)
402	Free plan export blocked	Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export."
429	Rate limit (1 token/client/7 days)	Retry in 30s once
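
The error table lends itself to a simple dispatch map. In this sketch the action labels and the `report_unknown_error` fallback are hypothetical names chosen for illustration; only the codes and their meanings come from the table.

```python
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauthenticate",            # bad or expired token
    1002: "create_session",            # session not found
    2001: "show_credit_options",       # no credits
    4001: "show_supported_formats",    # unsupported file
    4002: "suggest_compress_or_trim",  # file too large
    400: "generate_client_id_and_retry",
    402: "prompt_register_or_upgrade", # plan tier issue, not credits
    429: "retry_once_after_30s",
}


def action_for(code: int) -> str:
    """Look up the follow-up action for a response code."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")


def is_retryable(code: int) -> bool:
    """Codes whose table entry calls for another attempt."""
    return code in (1001, 1002, 400, 429)
```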

Tips and Tricks

The backend processes faster when you're specific. Instead of "make it look better", try "generate a music video with visuals that match the beat and mood of this song" — concrete instructions get better results.

Max file size is 200MB. Stick to MP3, WAV, AAC, FLAC for the smoothest experience.

Export as MP4 for widest compatibility across YouTube, Instagram, and TikTok.

Common Workflows

Quick edit: Upload → "generate a music video with visuals that match the beat and mood of this song" → Download MP4. Takes 1-2 minutes for a 30-second clip.

Batch style: Upload multiple files in one session. Process them one by one with different instructions. Each gets its own render.

Iterative: Start with a rough cut, preview the result, then refine. The session keeps your timeline state so you can keep tweaking.
