Install
openclaw skills install @pruna-ai/music-video
Use when the user wants a music video, lyric video, sung promo, or original song paired with performance and B-roll clips.
End-to-end music video production:
- REPLICATE_API_TOKEN)
- parse_lyric_cuts.py → one clip per lyric line / section
- align_lyric_cuts.py → word-level start_sec / end_sec on the rendered song

Staged generation: staged-generation-gate.md. Approve lyrics and stills before paid video jobs.
| Resource | Path |
|---|---|
| Lyrics, cuts, align pipeline | lyrics-and-cuts.md |
| Runner | run_from_plan.py · --phase song\|align\|stills\|video\|assemble |
| Plan template | music-video-plan.template.json |
| Feedback | requesting-generation-feedback |
| QA | music-video-quality-checklist.md |
| Beat | Human singer / rapper | Mascot or stylized host |
|---|---|---|
| Performance (lip sync to song) | p-video-avatar — image + audio slice from master song. Not voice_script. | p-video — image + audio slice (Pruna music-to-video). p-video-avatar humanizes non-human stills — avoid on mascots. |
| B-roll | p-video — still + audio slice (or duration on instrumentals) | Same |
Set in the plan: cast.host_type (human | mascot) and optional cast.performance_model override. The runner (run_from_plan.py) picks the model from beat_type + host_type.
Reference shipped video: output/verticals/music-video/purple-pruna-rap/ — mascot battle rap, p-video performance + B-roll, audio-conditioned slices → purple_pruna_rap.mp4.
Human rapper pattern: cast.host_type: human → performance sections use p-video-avatar + song slice; B-roll stays p-video.
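The selection rule in the table above can be sketched as a small function. This is an illustration only, not the actual logic inside run_from_plan.py: the function name `pick_model` and the literal beat-type strings `"performance"` and `"broll"` are assumptions made for the example.

```python
from typing import Optional


def pick_model(beat_type: str, host_type: str,
               performance_model: Optional[str] = None) -> str:
    """Return the video model for one cut (hypothetical helper)."""
    if beat_type == "broll":
        # B-roll uses p-video for both human and mascot hosts.
        return "p-video"
    if beat_type != "performance":
        raise ValueError(f"unknown beat_type: {beat_type!r}")
    if performance_model is not None:
        # Explicit cast.performance_model override wins on performance beats.
        return performance_model
    if host_type == "human":
        return "p-video-avatar"
    if host_type == "mascot":
        # p-video-avatar humanizes non-human stills, so mascots use p-video.
        return "p-video"
    raise ValueError(f"unknown host_type: {host_type!r}")
```

Raising on unknown values, rather than falling back to a default model, keeps a typo in the plan from silently launching a paid job on the wrong model.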
| Topic | Questions |
|---|---|
| Genre / mood | Indie pop, R&B, electronic, acoustic ballad? Energy arc? |
| Vocal | Gender, timbre, tempo (BPM), key instruments — becomes music.prompt |
| Story | What should the video show during verse vs chorus vs instrumental? |
| Cast | One singer throughout or stylistic recasts on B-roll only? If same singer, confirm before stills — see Character continuity below. |
| Continuity | Same face/wardrobe baseline across performance cuts, or deliberate variety (location changes OK; identity drift is not)? |
| Format | 16:9 / 9:16, 720p / 1080p |
| Length | Short hook (~60s) or full song (~3 min)? Fewer cuts = lower cost |
| Cut density | Line-per-cut (pop) or cut_granularity: section (one clip per verse — rap battles)? |
| Beat mix | Performance-heavy vs B-roll-heavy? Default: alternate on verses, performance on chorus |
Do not call Music 2.5 or Pruna video until lyrics are approved.
Character continuity
Ask whether performance beats should read as one singer or whether recasts are deliberate. Default assumption when the user names a single artist: same person on every performance cut.
| Intent | Stills | Video | Anti-pattern |
|---|---|---|---|
| Same singer throughout | One approved hero via p-image (locked project_seed) → every performance still via p-image-edit off that URL — change only angle, setting, expression, wardrobe delta | Pass seed: project_seed on all p-video-avatar jobs; reuse cast_descriptor in edit prompts | Fresh unrelated p-image text prompt per line — faces drift |
| Same singer, new locations | Hero + edits per beat — vary setting_tag, camera_tag, lighting_tag; keep identity anchors (age, hair, face, baseline outfit) in the character sheet | Same seed lock; distinct video_prompt per cut | Grey-wall repeat or identical framing on consecutive performance lines |
| Deliberate recasts | Only on broll beats, labeled guest rows, or when the user explicitly asks — never silent identity swaps on back-to-back performance lines | N/A for lip-sync rows | Random new face mid-chorus without user approval |
| Mascot / stylized host | One approved mascot still → p-image-edit for pose/setting | p-video scene anchor triple: image + optional last_frame_image + song audio slice | p-video-avatar on non-human stills |
Record in the plan: project_seed, cast / character_sheet, approved hero_still URL, and continuity: same_singer | recasts_ok. Full cast-ledger patterns: multi-scene-avatar-video Character sheet and Source portrait / hero.
| Phase | Models | Cost | Gate |
|---|---|---|---|
| 0 — Lyrics | none | free | User approves lyric sheet + section tags |
| A — Song | music-2.5 | medium | User approves MP3 |
| B — Cut structure | local scripts | free | Cut list matches lyric lines |
| B2 — Cut timings | whisperx | low | Review cut_manifest.json alignment stats |
| C — Stills | p-image / p-image-edit | low | Per staged-generation-gate.md |
| D — Clips | p-video-avatar, p-video | high | After still approval (--approve-stills) |
| E — Assembly | ffmpeg | free | After clip approval (--approve-clips) |
The runner defaults to --phase song. Phased flow:
python3 catalog/workflows/verticals/music-video/scripts/run_from_plan.py --plan PLAN --out-dir OUT --phase song
python3 ... --approve-song --phase align
python3 ... --phase stills
python3 ... --approve-stills --phase video
python3 ... --approve-clips --phase assemble
Index: workflow-feedback-gates.md
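The gate order implied by the phased commands can be sketched as a lookup from phase to the approval flag it expects. This is a hedged illustration; `REQUIRED_APPROVAL` and `missing_approval` are hypothetical names, and the real runner may enforce its gates differently.

```python
from typing import Iterable, Optional

# Approval flag each phase expects, read off the phased commands above.
# Phases not listed (song, stills) are assumed to need no approval flag.
REQUIRED_APPROVAL = {
    "align": "--approve-song",
    "video": "--approve-stills",
    "assemble": "--approve-clips",
}


def missing_approval(phase: str, flags: Iterable[str]) -> Optional[str]:
    """Return the approval flag still missing for `phase`, or None if clear."""
    required = REQUIRED_APPROVAL.get(phase)
    if required is None or required in set(flags):
        return None
    return required
```

For example, `missing_approval("video", [])` reports that `--approve-stills` has not been passed, which is the point at which paid video jobs should stay blocked.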
Lyrics + music.prompt → song → align → stills → video clips → assemble → music_video.mp4
Full lyric format, cut rules, align commands, and cut-manifest fields: lyrics-and-cuts.md.
p-image / p-image-edit)
One approved still per segment.
When continuity is intended (default for one singer):
- p-image + locked project_seed.
- hero_still in the plan.
- p-image-edit from hero_still: "Using attached reference as identity; change only: [angle], [setting], [expression]."

Performance still rules (hero and edits):
- setting_tag per chorus pass (loft, rooftop, neon corridor) without reinventing the face

B-roll stills: environment, hands, product, abstract motion plate for I2V. No identity requirement unless the B-roll shows the singer.
Run music-video-quality-checklist.md before Phase D.
Human host (cast.host_type: human): p-video-avatar + input.audio — true talking-head lip sync.
Mascot / stylized host (cast.host_type: mascot): p-video + input.image + input.audio — matches Pruna's music-video guide. p-video-avatar humanizes non-human stills into generic avatars; avoid it for knitted mascots, fox presenters, etc.
Override with cast.performance_model: p-video-avatar | p-video when needed.
python3 catalog/workflows/verticals/music-video/scripts/run_from_plan.py \
--plan output/my-mv/music_video_plan.json \
--out-dir output/my-mv \
--phase video --only 01_2 01_3
The runner calls slice_audio.py with start_sec / end_sec from the cut manifest (identical to alignment.audio_slice_*).
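One way to cut a slice from the master song with start_sec / end_sec is an ffmpeg call like the one built below. This is a sketch under assumptions: it is not the contents of slice_audio.py, and the function name `slice_command` and the file names in the usage note are hypothetical.

```python
from typing import List


def slice_command(song_path: str, start_sec: float, end_sec: float,
                  out_path: str) -> List[str]:
    """Build an ffmpeg argv that extracts [start_sec, end_sec) from the song."""
    if start_sec < 0 or end_sec <= start_sec:
        raise ValueError("need 0 <= start_sec < end_sec")
    duration = end_sec - start_sec
    return [
        "ffmpeg", "-y",               # overwrite the output if it exists
        "-ss", f"{start_sec:.3f}",    # seek to the start of the line/section
        "-i", song_path,
        "-t", f"{duration:.3f}",      # keep only the slice length
        out_path,
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)`, for instance `slice_command("song.mp3", 12.5, 15.0, "01_2.wav")`. Using `-t` with an explicit duration avoids the ambiguity of `-to` when `-ss` is given before the input.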
| Field | Guidance |
|---|---|
| image | Approved performance still |
| audio | Sliced line/section from master song; omit duration |
| save_audio | true: embed vocal in clip (required for audio-led cuts) |
| video_prompt | Unique motion per cut: push-in, arc, handheld sway |
| resolution | Match plan (default 720p; use 1080p when user asks for final delivery) |
| seed | Lock for same singer across performance clips |
p-video)
Prefer audio-conditioned mode (upload the same slice; motion follows its length):
{
"prompt": "Slow dolly through neon city street at dusk, rain reflections, cinematic",
"image": "https://api.pruna.ai/v1/files/STILL_ID",
"audio": "https://api.pruna.ai/v1/files/SLICE_ID",
"resolution": "720p",
"fps": 24,
"save_audio": true
}
Omit duration when audio is set. Runner: run_from_plan.py uses p_video_payload.py.
For [Inst] / [Solo] with no vocals, use duration from cut map instead of audio.
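The audio-versus-duration rule can be sketched as a payload builder. Field names follow the JSON example above; the function name `build_p_video_payload` is hypothetical and this is not the code in p_video_payload.py.

```python
from typing import Any, Dict, Optional


def build_p_video_payload(prompt: str, image: str,
                          audio: Optional[str] = None,
                          duration: Optional[float] = None,
                          resolution: str = "720p",
                          fps: int = 24) -> Dict[str, Any]:
    """Build a p-video input: audio-conditioned if audio is given, else timed."""
    payload: Dict[str, Any] = {
        "prompt": prompt,
        "image": image,
        "resolution": resolution,
        "fps": fps,
    }
    if audio is not None:
        # Audio-conditioned: clip length follows the slice, so omit duration.
        payload["audio"] = audio
        payload["save_audio"] = True
    elif duration is not None:
        # Instrumental / solo cut with no vocal slice: use the cut-map duration.
        payload["duration"] = duration
    else:
        raise ValueError("provide either an audio slice or a duration")
    return payload
```

If both `audio` and `duration` are passed, the sketch keeps the audio and drops the duration, matching the rule that duration is omitted whenever audio is set.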
Parallelize independent clips after confirmation — parallel-execution.md.
Name clips to match cut ids (e.g. 01_2.mp4) or set "clip" on each cut in the manifest.
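Before assembling, it can help to confirm that every cut resolves to a clip file. The sketch below assumes each manifest cut is a dict with an `"id"` key and an optional `"clip"` key; the `"id"` key name and both helper names are assumptions, not the documented manifest schema.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Union


def clip_path_for(cut: Dict[str, str], clips_dir: Union[str, Path]) -> Path:
    """Resolve a cut to its clip file: explicit "clip" wins, else <id>.mp4."""
    name = cut.get("clip") or f"{cut['id']}.mp4"
    return Path(clips_dir) / name


def missing_clips(cuts: Iterable[Dict[str, str]],
                  clips_dir: Union[str, Path]) -> List[str]:
    """Return the ids of cuts whose clip file does not exist on disk."""
    return [cut["id"] for cut in cuts
            if not clip_path_for(cut, clips_dir).is_file()]
```

An empty list from `missing_clips` means every cut has a matching file in the clips directory.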
python3 catalog/workflows/verticals/music-video/scripts/assemble_music_video.py \
--plan output/my-mv/music_video_plan.json \
--cuts output/my-mv/cut_manifest.json \
--clips-dir output/my-mv/clips \
--song output/my-mv/song.mp3 \
--out-dir output/my-mv
Output: music_video.mp4 — video track from trimmed clips, full song on audio.
| Layer | Guidance |
|---|---|
| Color | Match music.prompt palette — warm ballad → golden hour; electronic → split gel neon |
| Identity | When continuity: same_singer, performance cuts should match hero face/outfit baseline — location and camera may change |
| Rhythm | Alternate performance and B-roll on verses; hold singer through chorus hooks |
| Camera | No duplicate video_prompt on back-to-back cuts |
| Instrumental breaks | Go cinematic — wide landscapes, abstract motion, detail macros |
| Variety | visual-variety-bible.md — distinct world per B-roll insert |
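The camera rule in the table (no duplicate video_prompt on back-to-back cuts) is easy to check mechanically. A minimal sketch, assuming cuts are dicts with `"id"` and `"video_prompt"` keys in playback order; the helper name and the `"id"` key are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def duplicate_prompt_pairs(cuts: Sequence[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return (previous_id, current_id) for consecutive cuts sharing a prompt."""
    pairs = []
    for prev, cur in zip(cuts, cuts[1:]):
        # Compare case-insensitively and ignore surrounding whitespace.
        same = (prev["video_prompt"].strip().lower()
                == cur["video_prompt"].strip().lower())
        if same:
            pairs.append((prev["id"], cur["id"]))
    return pairs
```

Only adjacent cuts are compared, so reusing a prompt later in the song is allowed as long as another motion sits in between.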
Copy templates/music-video-plan.template.json or see examples.md.
export REPLICATE_API_TOKEN=r8_... # music-2.5 + whisperx
export PRUNA_API_KEY=... # p-image, p-video-avatar, p-video
Requires ffmpeg and ffprobe.
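A preflight check for these requirements might look like the following sketch. The function name `preflight` is hypothetical; it only tests that the two environment variables are non-empty and that the two binaries are on PATH, and does not validate the credentials themselves.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def preflight(environ: Optional[Mapping[str, str]] = None,
              which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    environ = os.environ if environ is None else environ
    problems = []
    for var in ("REPLICATE_API_TOKEN", "PRUNA_API_KEY"):
        if not environ.get(var):
            problems.append(f"missing environment variable {var}")
    for tool in ("ffmpeg", "ffprobe"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Calling `preflight()` with no arguments checks the current process environment; the `environ` and `which` parameters exist so the check can be exercised without touching the real system.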
- parse_lyric_cuts.py timings without --phase align: lip sync will drift, especially on rap
- voice_script on performance beats when the real song slice should drive lip sync
- p-image identity pull per performance line when the user wanted one singer
- hero_still + edit chain: biggest cause of face drift across a music video
- alignment.failed rows when Music 2.5 paraphrased the lyrics

| Resource | Path |
|---|---|
| Lyrics + cuts + align (steps 1–3) | lyrics-and-cuts.md |
| Feedback discipline | requesting-generation-feedback |
| Music 2.5 tool | music-2.5 |
| WhisperX STT | whisperx |
| Avatar API | p-video-avatar |
| Cinematic API | p-video |
| Scenario hub | pruna-generative-pipeline recipe O |
| QA | music-video-quality-checklist.md |