Install: `openclaw skills install @pruna-ai/narrated-multi-scene`

Use when the user needs a multi-scene narrated film, episodic B-roll story, chaptered promo, or several linked video beats with voiceover and no in-scene character dialogue.

p-video only: each scene is one p-video job (same model, separate predictions). Assembly happens outside Pruna (ffmpeg or your editor). This workflow does not use p-video-avatar.
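Because assembly happens outside Pruna, here is a minimal sketch of one ffmpeg route: write a concat-demuxer list in scene order and build the matching command. The function name and the `assembled.mp4` default are illustrative assumptions. `-c copy` only works when every clip shares codec, resolution and fps, which should hold if all scenes use the same p-video settings.

```python
from pathlib import Path


def write_concat_list(clips: list[str], list_path: str,
                      output: str = "assembled.mp4") -> list[str]:
    """Write an ffmpeg concat-demuxer list (scene order 1..N) and return the ffmpeg argv.

    The command is only built here, not executed.
    """
    lines = []
    for clip in clips:
        # Concat demuxer syntax is: file '<path>'; a literal ' is written as '\''
        escaped = clip.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    # -safe 0 allows arbitrary paths; -c copy skips re-encoding (clips must match).
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_path,
            "-c", "copy", output]
```

Run the returned list with `subprocess.run(cmd, check=True)`; switch to re-encoding if any clip differs in format.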
See p-video (first/last frame chaining), scene-anchor-triple.md, scene-anchor-pair.md (visual-only alternative), audio-post-production.md, and pruna-api.md.
For visual transitions without narration, use visual-transition-reel instead.
For educational explainers (history, science, nature, how-it-works) with a narrator plus in-story character dialogue, use interactive-explainer instead of this narrator-only scene table.
Staged generation: staged-generation-gate.md · workflow-feedback-gates.md
| Phase | What to show | Proceed when |
|---|---|---|
| 0: Plan | Scene table, narration lines, style_bible | User approves the plan |
| A: Stills | Hero + start/end stills per scene | User approves the stills |
| A2: TTS | audio/narration_*.mp3 per scene (listen to each) | Lines sound right |
| B: Video | p-video clips with embedded VO | User approves the clips |
| D: Bed | Optional Stable Audio bed under the concatenated cut | User accepts the mix |
Run each phase manually or with phased curl calls; there is no bundled runner. Never batch p-video jobs before the stills and TTS have been reviewed.
Do not start scene 1 until the whole scene plan exists in writing (manifest or table):
| Topic | Questions |
|---|---|
| Story | Order of scenes (1…N)? What changes between scenes (location, time, emotion)? |
| Per scene i | Primary prompt? First frame (image), last frame (last_frame_image), narration (audio URL)? resolution / fps / draft? |
| Continuity | Per scene: chain_from_previous only when motion continues (same moment/location). Otherwise composed OPENING still + hard cut. End stills via p-image-edit; extract last frame when chaining. |
| Audio | Scene anchor triple (preferred): TTS → upload → p-video with image + last_frame_image + audio (omit duration; save_audio: true). Each scene line ≤ ~19s — P-API caps audio-led clips at 20s. Optional Stable Audio bed in post only. |
| Visual style | Locked style_bible? One specific subject/location per still (painterly illustration, period film still, etc.)? Avoid unrelated branding (e.g. science-show nebula) unless the brief asks for it. |
| Global | Default aspect_ratio for text-only scenes? Global seed policy? |
| Runtime | Target total duration after assembly? |
| Assembly | Concat order; narration mux; bed mix volume (~0.08–0.15 under VO)? |
Ask follow-ups until every scene row has enough to build input without guessing.
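One way to make "enough to build input without guessing" checkable is to hold the plan as typed rows and list the empty fields per scene. This sketch assumes a manifest with the field names used elsewhere on this page (edit_prompt, last_frame_edit_prompt, chain_from_previous); `SceneRow` and `plan_gaps` are hypothetical. A chained scene does not need edit_prompt, because its start frame is extracted from the previous clip.

```python
from dataclasses import dataclass

REQUIRED = ("prompt", "edit_prompt", "last_frame_edit_prompt", "narration")


@dataclass
class SceneRow:
    index: int                        # scene number, 1..N
    prompt: str = ""                  # p-video motion prompt
    edit_prompt: str = ""             # start still (composed scenes only)
    last_frame_edit_prompt: str = ""  # end still
    narration: str = ""               # narration line sent to TTS
    chain_from_previous: bool = False


def plan_gaps(rows: list[SceneRow]) -> dict[int, list[str]]:
    """Map scene number -> fields that still need a follow-up question."""
    gaps: dict[int, list[str]] = {}
    for row in rows:
        needed = [n for n in REQUIRED
                  if not (n == "edit_prompt" and row.chain_from_previous)]
        empty = [n for n in needed if not getattr(row, n).strip()]
        if row.index == 1 and row.chain_from_previous:
            empty.append("chain_from_previous (scene 1 has no previous clip)")
        if empty:
            gaps[row.index] = empty
    return gaps
```

An empty result means every row can be turned into p-image-edit, TTS and p-video inputs.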
| # | Prompt | First frame (image) | Last frame (last_frame_image) | Narration (audio) | Mode |
|---|---|---|---|---|---|
| 1 | motion prompt | start still | end still → scene 2 | TTS line → upload | triple |
| 2 | motion prompt | = scene 1 end | end still → scene 3 | TTS line → upload | triple |
Mode: T2V · I2V · I2V+last · triple (image + last_frame_image + audio — omit duration)
1. Hero still: p-image or upload.
2. p-image-edit per scene: start still (edit_prompt) from the hero; end still (last_frame_edit_prompt) from the start still. These can run in parallel once the hero exists.
3. Set chain_from_previous: true only when scene i continues directly from scene i−1. For new beats, use a composed start still and a hard cut.
4. Gemini TTS per scene, then upload each file to /v1/files.
Duration gate (required): after TTS, run ffprobe on each MP3. If any scene exceeds ~19s, fix it before running p-video: the output is truncated at the 20s API maximum even when input.audio is set. In runners, use validate_narration_duration.
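A minimal stand-in for this gate, assuming ffprobe is on PATH. It is not the real validate_narration_duration (its signature is not documented here); `probe_duration`, `too_long` and the 19.0 s constant are illustrative. The ffprobe flags print only the container duration in seconds.

```python
import subprocess

MAX_NARRATION_S = 19.0  # headroom under the 20 s cap on audio-led p-video clips


def probe_duration(path: str) -> float:
    """Return an audio file's duration in seconds via ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return float(out.strip())


def too_long(durations: dict[str, float],
             limit: float = MAX_NARRATION_S) -> dict[str, float]:
    """Files whose narration exceeds `limit`; these must be fixed before p-video."""
    return {name: d for name, d in durations.items() if d > limit}
```

Typical use: build `{p: probe_duration(p) for p in sorted(glob.glob("audio/narration_*.mp3"))}` and stop if `too_long(...)` is non-empty.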
If a line is too long, apply one of these remedies or combine them:
| Remedy | When | Action |
|---|---|---|
| Shorten copy | One beat has too many facts | Cut clauses; keep dates/names; target ≤ ~45 words (~17–18s) per scene |
| Faster pace | Line is right length but slow delivery | Tighten Gemini style_prompt (e.g. ~2.3 words/sec, brisk, no filler); regenerate TTS only |
| Split scene | Two story beats in one row | Add scene row + edit_prompt / last_frame_edit_prompt / scene_lines entry; one narration file per row |
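Before spending TTS calls, a rough word-count check can flag lines likely to run long. This is a heuristic sketch: it assumes a constant speaking rate (2.3 words/sec, the pace mentioned above), and `estimate_seconds` and `over_budget` are hypothetical names. ffprobe on the real MP3 remains the actual gate.

```python
def estimate_seconds(line: str, words_per_sec: float = 2.3) -> float:
    """Rough spoken length of a narration line at a constant pace."""
    return len(line.split()) / words_per_sec


def over_budget(lines: dict[int, str], limit_s: float = 19.0,
                words_per_sec: float = 2.3) -> list[int]:
    """Scene numbers whose estimated narration length exceeds limit_s."""
    return [i for i, text in sorted(lines.items())
            if estimate_seconds(text, words_per_sec) > limit_s]
```

At 2.3 words/sec, 45 words comes to about 19.6 s, so at that pace a 45-word line is already at the edge; treat ~45 words as a ceiling rather than a target.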
Scene anchor triple: one p-video job per row.

```json
{
  "prompt": "...",
  "image": "START_URL",
  "last_frame_image": "END_URL",
  "audio": "NARRATION_URL",
  "resolution": "720p",
  "fps": 24,
  "save_audio": true
}
```
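The same input object can be built per scene in code so that duration never slips in and no URL is left empty. This sketch covers only the `input` shown above; the request envelope and endpoint are described in pruna-api.md and are not reproduced here. `triple_input` is a hypothetical helper.

```python
def triple_input(prompt: str, start_url: str, end_url: str, audio_url: str,
                 resolution: str = "720p", fps: int = 24) -> dict:
    """Build one scene-anchor-triple p-video input; `duration` is deliberately absent."""
    for name, url in (("image", start_url), ("last_frame_image", end_url),
                      ("audio", audio_url)):
        if not url:
            raise ValueError(f"{name} URL is required for a triple render")
    return {
        "prompt": prompt,
        "image": start_url,
        "last_frame_image": end_url,
        "audio": audio_url,        # uploaded narration; drives clip length
        "resolution": resolution,
        "fps": fps,
        "save_audio": True,        # keep the VO embedded in the clip
    }
```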
Omit duration. Always include the uploaded audio in input; silent, duration-only renders truncate the narration when the clips are concatenated. Poll every get_url until its prediction is done, and retry only the scenes that failed.
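The polling step can be sketched independently of the HTTP client by injecting a `fetch_status` callable that fetches a get_url and returns its status string. The terminal values "succeeded" and "failed" are assumptions; check pruna-api.md for the real ones. `poll_all` is a hypothetical helper.

```python
import time
from typing import Callable

# Assumed terminal statuses; confirm against the actual API response.
DONE, FAILED = "succeeded", "failed"


def poll_all(get_urls: dict[int, str], fetch_status: Callable[[str], str],
             interval_s: float = 5.0,
             timeout_s: float = 900.0) -> tuple[list[int], list[int]]:
    """Poll each scene's get_url until it finishes; return (done, failed) scene numbers."""
    pending = dict(get_urls)
    done: list[int] = []
    failed: list[int] = []
    deadline = time.monotonic() + timeout_s
    while pending and time.monotonic() < deadline:
        for scene, url in list(pending.items()):
            status = fetch_status(url)
            if status == DONE:
                done.append(scene)
                del pending[scene]
            elif status == FAILED:
                failed.append(scene)
                del pending[scene]
        if pending:
            time.sleep(interval_s)
    # Scenes still running at the deadline are reported as failed so they get retried.
    failed.extend(pending.keys())
    return sorted(done), sorted(failed)
```

Only the scene numbers in the second list need new predictions; finished clips are left alone.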
If a clip is rejected, adjust its prompt, stills, or narration and re-run that scene only.
(… launch_background_music.py).

Scene table + all six URLs per scene (start, end, audio in/out) + prediction ids.
```
Scene 1: composed start,     last=play_end,   audio=vo_1  chain→2
Scene 2: extract(clip_1),    last=loss_end,   audio=vo_2  hard cut→3
Scene 3: composed start,     last=search_end, audio=vo_3  chain→4
Scene 4: extract(clip_3),    last=tree_end,   audio=vo_4  chain→5
Scene 5: extract(clip_4),    last=reunion,    audio=vo_5
```
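The chain plan above can be derived mechanically from one flag per scene saying whether it continues into the next. This sketch reproduces the example's start sources and builds a common ffmpeg idiom for grabbing a clip's last frame (`-sseof` seeks from the end, `-update 1` keeps overwriting a single image, so the final decoded frame is what remains). `start_sources` and `last_frame_cmd` are hypothetical helpers, and the 0.5 s offset is an arbitrary choice.

```python
def start_sources(chain_next: list[bool]) -> list[str]:
    """Where each scene's start frame (`image`) comes from, for scenes 1..N.

    chain_next[k] is True when scene k+1 continues directly into scene k+2.
    """
    sources = []
    for k in range(len(chain_next)):
        if k > 0 and chain_next[k - 1]:
            sources.append(f"extract(clip_{k})")  # last frame of previous clip
        else:
            sources.append("composed")            # new beat: OPENING still, hard cut
    return sources


def last_frame_cmd(clip: str, out_image: str) -> list[str]:
    """ffmpeg argv that writes the final frame of `clip` to `out_image`."""
    return ["ffmpeg", "-y", "-sseof", "-0.5", "-i", clip,
            "-update", "1", out_image]
```

With the five-scene example, `start_sources([True, False, True, True, False])` yields composed, extract(clip_1), composed, extract(clip_3), extract(clip_4); upload the extracted frame and use its URL as that scene's `image`.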
See scene-anchor-triple.md for when to chain vs hard cut, and OPEN/MID/CLOSE prompt structure.