Install
openclaw skills install @pruna-ai/p-video
Use when the user wants to generate video from text or stills, animate between start and end frames, create B-roll or cinematic clips, or build narrated scene chains.
Premium video from text, optional first-frame / last-frame images, or optional audio. The same model is also on Replicate; see First / last frame chaining for multi-scene handoffs.
Full P-API parameters: p-video model docs.
Shared HTTP patterns: pruna-api.md (upload, poll, download)
See Example: async text-to-video below. Poll and download: pruna-api.md.
curl -X POST "https://api.pruna.ai/v1/files" \
-H "apikey: ${PRUNA_API_KEY}" \
-F "content=@/path/to/first-frame.png"
Pass the returned urls.get value as input.image (first frame) and/or input.last_frame_image (last frame).
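For scripted runs, the upload step above can be done from Python with only the standard library. This is a minimal sketch, not an official client: `build_multipart`, `get_url`, and `upload` are hypothetical helper names, and the only API facts it relies on are the ones in the curl example (the /v1/files endpoint, the apikey header, the content form field, and urls.get in the response).

```python
import json
import mimetypes
import os
import urllib.request
import uuid
from pathlib import Path

FILES_URL = "https://api.pruna.ai/v1/files"  # same endpoint as the curl example


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode a single file as multipart/form-data, like `curl -F field=@file`."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def get_url(upload_response: dict) -> str:
    """Return urls.get, the value to pass as input.image or input.last_frame_image."""
    return upload_response["urls"]["get"]


def upload(path: str) -> str:
    """POST a local file to /v1/files and return its urls.get (needs PRUNA_API_KEY)."""
    p = Path(path)
    body, content_type = build_multipart("content", p.name, p.read_bytes())
    req = urllib.request.Request(
        FILES_URL,
        data=body,
        method="POST",
        headers={"apikey": os.environ["PRUNA_API_KEY"], "Content-Type": content_type},
    )
    with urllib.request.urlopen(req) as resp:
        return get_url(json.loads(resp.read()))
```

Usage: `first_frame = upload("/path/to/first-frame.png")`, then place `first_frame` in input.image.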
Confirm duration (unless audio-driven), resolution, fps, draft, seed, and prompt with the user, or run intake from single-scene-ai-video, scene-transition-video, or multi-scene-ai-video. For narration or music, see audio-post-production.md: Gemini TTS, Stable Audio beds, or uploaded audio for audio-conditioned mode.
Validate renders with p-video-quality-checklist.md.
| Field | Role |
|---|---|
| prompt | Text prompt (string) describing the scene and motion |
| image | First frame: image-to-video anchor; when set, aspect_ratio is ignored |
| last_frame_image | Last frame: optional end-state still the clip should move toward |
| audio | Audio conditioning; duration follows the audio (capped at 20 s on P-API); formats flac, mp3, wav |
| duration | 1–20 seconds on P-API (ignored if audio is set). With audio, clip length = min(audio length, 20 s), so keep TTS ≤ ~19 s per scene |
| resolution | 720p (default) or 1080p |
| fps | 24 (default) or 48 |
| aspect_ratio | Used only when no image is set: 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, 1:1 |
| draft | true = ~4× faster/cheaper preview (see pricing); false = final render |
| save_audio | Keep model-generated dialogue/SFX on the output (native audio) |
| seed | Fixed seed for reproducibility |
| prompt_upsampling | Automatic prompt enhancement (on by default on Replicate) |
| disable_safety_filter | Turns off the safety filter; set according to client policy |
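The table's cross-field rules (duration only without audio, aspect_ratio only without image, fixed resolution and fps choices) can be checked before a request is sent. The sketch below is a hypothetical `build_input` helper written for illustration; it is not the skill's p_video_payload.py, and requiring an explicit duration for clips without audio is an assumption made here rather than a documented API rule.

```python
ASPECT_RATIOS = {"16:9", "9:16", "4:3", "3:4", "3:2", "2:3", "1:1"}
MAX_SECONDS = 20  # P-API duration ceiling from the table


def build_input(prompt, *, image=None, last_frame_image=None, audio=None,
                duration=None, resolution="720p", fps=24, aspect_ratio=None,
                draft=False, seed=None, save_audio=None):
    """Return a {"input": {...}} body for a p-video prediction (illustrative only)."""
    if not prompt:
        raise ValueError("prompt is required")
    if resolution not in ("720p", "1080p"):
        raise ValueError("resolution must be 720p or 1080p")
    if fps not in (24, 48):
        raise ValueError("fps must be 24 or 48")
    inp = {"prompt": prompt, "resolution": resolution, "fps": fps}
    if audio:
        inp["audio"] = audio  # audio sets clip length, so any duration is dropped
    elif duration is None or not 1 <= duration <= MAX_SECONDS:
        raise ValueError("duration must be 1-20 s when no audio is set")
    else:
        inp["duration"] = duration
    if image:
        inp["image"] = image  # aspect_ratio would be ignored, so it is not sent
    elif aspect_ratio is not None:
        if aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect_ratio {aspect_ratio}")
        inp["aspect_ratio"] = aspect_ratio
    if last_frame_image:
        inp["last_frame_image"] = last_frame_image
    if draft:
        inp["draft"] = True
    if seed is not None:
        inp["seed"] = seed
    if save_audio is not None:
        inp["save_audio"] = save_audio
    return {"input": inp}
```

For example, `build_input("Slow dolly in, rain on city street at night, cinematic", duration=5, aspect_ratio="16:9")` yields the fields of the text-to-video request plus an explicit fps of 24, the documented default.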
Use image + last_frame_image to steer motion between two known compositions — the model interpolates from start plate to end plate. This is the primary tool for smoother multi-scene stories when hard cuts between unrelated T2V clips feel disjointed.
1. Start still (hero or user upload) → input.image
2. p-image-edit an end still (same subject, new pose or background beat) → input.last_frame_image
3. p-video with a motion-only prompt describing camera and action between the two plates

curl -X POST 'https://api.pruna.ai/v1/predictions' \
-H 'Content-Type: application/json' \
-H "apikey: ${PRUNA_API_KEY}" \
-H 'Model: p-video' \
-d '{
"input": {
"prompt": "Dog turns head toward flying toy, grass sways, gentle push-in, warm afternoon light",
"image": "https://api.pruna.ai/v1/files/START_ID",
"last_frame_image": "https://api.pruna.ai/v1/files/END_ID",
"duration": 5,
"resolution": "720p",
"fps": 24
}
}'
| Strategy | How | Parallel? |
|---|---|---|
| Planned anchor chain (recommended) | p-image-edit produces start_i and end_i per scene. Scene i uses image=start_i, last_frame_image=end_i. Scene i+1 uses image=end_i (same URL). | Stills: parallel. Video: phased if each scene's end_i must be approved before the next scene's render — or parallel once all stills exist. |
| Extract last frame | After scene i renders, ffmpeg-grab last frame → upload → image for scene i+1 | Sequential — scene i+1 cannot start until scene i finishes |
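For the Extract last frame strategy, one common ffmpeg idiom seeks to just before the end of the clip and keeps overwriting a single image, so the file left on disk is the final decoded frame. This is a sketch assuming ffmpeg is on PATH; `last_frame_cmd` and `extract_last_frame` are hypothetical names, and the PNG still has to be uploaded to /v1/files before it can serve as the next scene's image.

```python
import subprocess


def last_frame_cmd(video: str, out_png: str, tail_seconds: float = 1.0) -> list[str]:
    """Build an ffmpeg command that writes the last frame of `video` to `out_png`."""
    return [
        "ffmpeg", "-y",
        "-sseof", f"-{tail_seconds}",  # start decoding tail_seconds before end of file
        "-i", video,
        "-update", "1",  # image2 muxer: overwrite out_png with every frame, keeping the last
        out_png,
    ]


def extract_last_frame(video: str, out_png: str) -> str:
    """Run the command and return the path of the written still."""
    subprocess.run(last_frame_cmd(video, out_png), check=True, capture_output=True)
    return out_png
```

Because each call needs the previous render on disk, this path stays sequential, as the table notes.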
Continuity tips
- Keep resolution, fps, and lighting language consistent across chained scenes.
- Use draft: true on the full chain for cheap motion approval, then rerun finals with locked seeds and prompts.
- See scene-transition-video for multi-scene visual montages, multi-scene-ai-video for scene-table columns, and parallel-execution.md for phased vs parallel batches.
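The draft-then-final tip amounts to rendering each scene twice with identical inputs except draft, with the seed fixed up front so the final rerun reuses the approved draft's seed. A minimal sketch; `draft_final_pair` and its default base seed are illustrative assumptions, not part of the API.

```python
import copy


def draft_final_pair(chain, base_seed=1234):
    """Return (drafts, finals) that differ only in `draft`, with one locked seed per scene."""
    drafts, finals = [], []
    for i, payload in enumerate(chain):
        inp = copy.deepcopy(payload["input"])  # never mutate the caller's payloads
        inp.setdefault("seed", base_seed + i)  # keep an existing seed, else lock a new one
        drafts.append({"input": {**inp, "draft": True}})
        finals.append({"input": {**inp, "draft": False}})
    return drafts, finals
```

Submit the drafts first; once the motion is approved, submit the finals unchanged.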
Canonical visual transition spec: scene-anchor-pair.md.
Canonical narrated spec: scene-anchor-triple.md.
Use this when you have two photos and want p-video to interpolate motion between them, with no narration required. The stills can come from a p-image hero plus p-image-edit, or from user uploads.
| Field | Required | Role |
|---|---|---|
image | yes | Start plate |
last_frame_image | yes | End plate |
prompt | yes | OPEN → MID → CLOSE motion between plates (not still descriptions) |
duration | yes | Prefer 4–5s; omit when audio is set |
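Because the pair's prompt describes motion rather than the stills, it can help to assemble it from its three beats. The hypothetical `pair_input` helper below does that and produces the same body as the curl example that follows; the 5 s default simply mirrors the preferred 4 to 5 s range.

```python
def motion_prompt(open_: str, mid: str, close: str) -> str:
    """Join the three motion beats as OPEN / MID / CLOSE sentences."""
    beats = (("OPEN", open_), ("MID", mid), ("CLOSE", close))
    return " ".join(f"{label}: {text.strip().rstrip('.')}." for label, text in beats)


def pair_input(start_url: str, end_url: str, open_: str, mid: str, close: str,
               duration: int = 5, resolution: str = "720p", fps: int = 24) -> dict:
    """Build a scene-anchor-pair prediction body (no audio, so duration is set)."""
    if not start_url or not end_url:
        raise ValueError("a pair needs both a start plate and an end plate")
    return {"input": {
        "prompt": motion_prompt(open_, mid, close),
        "image": start_url,
        "last_frame_image": end_url,
        "duration": duration,
        "resolution": resolution,
        "fps": fps,
    }}
```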
curl -X POST 'https://api.pruna.ai/v1/predictions' \
-H 'Content-Type: application/json' \
-H "apikey: ${PRUNA_API_KEY}" \
-H 'Model: p-video' \
-d '{
"input": {
"prompt": "OPEN: hold wide. MID: slow dolly in, neon flickers. CLOSE: settle on end pose.",
"image": "https://api.pruna.ai/v1/files/START_ID",
"last_frame_image": "https://api.pruna.ai/v1/files/END_ID",
"duration": 5,
"resolution": "720p",
"fps": 24
}
}'
Stills pipeline: p-image (hero) → p-image-edit (edit_prompt → start) → p-image-edit (last_frame_edit_prompt → end).
Multi-scene: scene-transition-video — selective chain_from_previous, extract_last_frame, concat crossfades.
Upgrade to narrated: add TTS → upload → audio; omit duration → scene-anchor-triple.md.
Scene anchor triple (image + last_frame_image + audio)
For narrated multi-scene films, treat each scene as three uploaded anchors. Just as first and last frames bracket visual motion, audio brackets timing and performance:
| Anchor | Field | Role |
|---|---|---|
| First frame | image | Opening composition (hero or p-image-edit start still) |
| Last frame | last_frame_image | Closing composition (end still; becomes next scene's image when chaining) |
| Narration / VO | audio | Uploaded TTS or music — sets clip duration (up to 20s P-API max); model syncs motion to speech |
All three are Pruna file URLs from POST /v1/files. Pass them together in one p-video prediction and omit duration when audio is set. Probe each TTS file with ffprobe before rendering: lines longer than ~19 s risk being cut at the 20 s API ceiling even when audio is passed (see validate_narration_duration in p_video_payload.py).
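A pre-render check along those lines might look like this. It is a sketch, not the skill's validate_narration_duration: it assumes ffprobe is on PATH, and the 19 s limit is the safety margin suggested above rather than an API value.

```python
import subprocess

API_CEILING_S = 20.0      # P-API maximum clip length
NARRATION_LIMIT_S = 19.0  # safety margin below the ceiling


def probe_seconds(path: str) -> float:
    """Return the container duration of an audio file, as reported by ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return float(out.strip())


def expected_clip_seconds(audio_seconds: float) -> float:
    """With audio set, clip length = min(audio length, 20 s)."""
    return min(audio_seconds, API_CEILING_S)


def check_narration(audio_seconds: float, limit: float = NARRATION_LIMIT_S) -> float:
    """Raise if a narration line is too long to survive the API ceiling intact."""
    if audio_seconds > limit:
        raise ValueError(
            f"narration is {audio_seconds:.1f} s; split or shorten it to at most {limit:.0f} s")
    return audio_seconds
```

Typical use: `check_narration(probe_seconds("vo_1.wav"))` for every scene before any p-video request is sent.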
curl -X POST 'https://api.pruna.ai/v1/predictions' \
-H 'Content-Type: application/json' \
-H "apikey: ${PRUNA_API_KEY}" \
-H 'Model: p-video' \
-d '{
"input": {
"prompt": "Dog tosses plush upward, tail wagging, motion matches narrator, warm light",
"image": "https://api.pruna.ai/v1/files/SCENE_START",
"last_frame_image": "https://api.pruna.ai/v1/files/SCENE_END",
"audio": "https://api.pruna.ai/v1/files/SCENE_NARRATION",
"resolution": "720p",
"fps": 24,
"save_audio": true
}
}'
Scene 1: image=start_1, last_frame_image=end_1, audio=vo_1
Scene 2: image=end_1, last_frame_image=end_2, audio=vo_2 ← same URL as scene 1 end
Scene 3: image=end_2, last_frame_image=end_3, audio=vo_3
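The chaining rule above (each scene opens on the previous scene's end still, reusing the exact URL) is easy to generate rather than hand-edit. A minimal sketch with hypothetical names; the 720p, 24 fps, and save_audio values just mirror the example request and are not requirements.

```python
def chain_scenes(start_1, end_urls, vo_urls, prompts):
    """Build one triple payload per scene; scene i+1 opens on scene i's end still."""
    if not (len(end_urls) == len(vo_urls) == len(prompts)):
        raise ValueError("need one end still, narration file, and prompt per scene")
    scenes = []
    image = start_1
    for end, vo, prompt in zip(end_urls, vo_urls, prompts):
        scenes.append({"input": {
            "prompt": prompt,
            "image": image,
            "last_frame_image": end,
            "audio": vo,  # audio sets the length, so no duration key
            "resolution": "720p",
            "fps": 24,
            "save_audio": True,
        }})
        image = end  # same URL, so the handoff frame matches exactly
    return scenes
```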
Stills phase: p-image-edit start still from hero; end still from start still (parallel per scene after starts exist).
Audio phase: Gemini TTS per scene (parallel) → upload each.
Video phase: parallel p-video when all six URLs per scene row are ready.
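The three phases can be sketched with a thread pool: work inside a phase runs in parallel, and each phase waits for the one before it. `make_start`, `make_end`, `make_vo`, and `render` are hypothetical callables standing in for the p-image-edit, Gemini TTS plus upload, and p-video calls; choosing each scene's image (start_1 or the previous end) is left to `render`.

```python
from concurrent.futures import ThreadPoolExecutor


def run_phases(scenes, make_start, make_end, make_vo, render, workers=4):
    """Run stills, audio, and video phases; results keep the order of `scenes`."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Stills phase: start stills first, then each end still from its own start.
        starts = list(pool.map(make_start, scenes))
        ends = list(pool.map(make_end, scenes, starts))
        # Audio phase: one narration file per scene.
        vos = list(pool.map(make_vo, scenes))
        # Video phase: begins only once every URL above exists.
        return list(pool.map(render, scenes, starts, ends, vos))
```

Narration does not depend on the stills, so it could also be submitted alongside the stills phase; the sketch keeps the phases in the order listed above.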
Same triple pattern on p-video-avatar: portrait image + optional last_frame_image + uploaded audio for lip-sync narration.
| Mode | Input | Duration | When |
|---|---|---|---|
| Silent / native SFX | prompt only (optional save_audio) | duration | Ambient clips, model-generated sound |
| Uploaded audio (preferred) | audio URL + prompt (+ optional image, last_frame_image) | Follows audio — omit duration | VO, TTS, song slices — upload to /v1/files first; set save_audio: true for narrated concat |
| External narration | gemini-3.1-flash-tts → upload → audio | Follows audio | Documentary narrator — same as uploaded audio; never post-mux over silent clips |
| Post mux (fallback only) | Silent p-video renders | duration | Only when re-render is impossible — truncates long TTS |
Shared helper: p_video_payload.py — enforces omitting duration when audio is set.
Full layering guide: audio-post-production.md.
Example: async text-to-video
curl -X POST 'https://api.pruna.ai/v1/predictions' \
-H 'Content-Type: application/json' \
-H "apikey: ${PRUNA_API_KEY}" \
-H 'Model: p-video' \
-d '{
"input": {
"prompt": "Slow dolly in, rain on city street at night, cinematic",
"duration": 5,
"resolution": "720p",
"aspect_ratio": "16:9"
}
}'
Poll and download: pruna-api.md.
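The exact status endpoint and response fields live in pruna-api.md and are not repeated here, so the loop below takes a caller-supplied `get_status` function. The status values it checks (succeeded, failed, canceled) are assumptions borrowed from common prediction APIs; adjust them to what pruna-api.md documents.

```python
import time


def poll(get_status, *, interval=2.0, max_interval=15.0, timeout=600.0,
         sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until the prediction finishes, backing off between calls."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        state = status.get("status")  # assumed field name; see pruna-api.md
        if state == "succeeded":
            return status
        if state in ("failed", "canceled"):
            raise RuntimeError(f"prediction ended with status {state!r}: {status}")
        if clock() >= deadline:
            raise TimeoutError(f"prediction still {state!r} after {timeout:.0f} s")
        sleep(interval)
        interval = min(interval * 1.5, max_interval)  # gentle exponential backoff
```

`sleep` and `clock` are injectable so the loop can be exercised without real waiting.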
Multi-scene: after shared uploads, fire predictions in parallel when scenes do not share frame dependencies; use phased batches when last_frame_image of scene N is the image of scene N+1 and stills are not pre-planned. See parallel-execution.md.
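Deciding between parallel and phased batches is a dependency-layering problem: a scene can run once every scene it takes a frame from has finished. A sketch of that grouping follows; the scene IDs and the `deps` mapping are illustrative, and this is one simple way to batch, not necessarily what parallel-execution.md prescribes.

```python
def batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group scene IDs into phases; every scene's dependencies sit in earlier phases."""
    remaining = {scene: set(needs) for scene, needs in deps.items()}
    done: set[str] = set()
    phases: list[list[str]] = []
    while remaining:
        ready = sorted(s for s, needs in remaining.items() if needs <= done)
        if not ready:  # a cycle, or a dependency on an unknown scene
            raise ValueError(f"unresolvable dependencies: {sorted(remaining)}")
        phases.append(ready)
        done.update(ready)
        for s in ready:
            del remaining[s]
    return phases
```

With pre-planned stills every scene has an empty dependency set, so all scenes land in one parallel batch; an extract-last-frame chain yields one scene per batch.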
Upload the image to /v1/files and pass its urls.get as input.image.
curl -X POST 'https://api.pruna.ai/v1/predictions' \
-H 'Content-Type: application/json' \
-H "apikey: ${PRUNA_API_KEY}" \
-H 'Model: p-video' \
-d '{
"input": {
"prompt": "Dog searches through tall grass, cinematic, motion matches narrator mood",
"image": "https://api.pruna.ai/v1/files/START_ID",
"last_frame_image": "https://api.pruna.ai/v1/files/END_ID",
"audio": "https://api.pruna.ai/v1/files/NARRATION_ID",
"resolution": "720p",
"fps": 24,
"save_audio": true
}
}'
Omit duration when audio is set. See Scene anchor triple above.
curl -X POST 'https://api.pruna.ai/v1/predictions' \
-H 'Content-Type: application/json' \
-H "apikey: ${PRUNA_API_KEY}" \
-H 'Model: p-video' \
-d '{
"input": {
"prompt": "Dog searches through tall grass, cinematic, matches narrator mood",
"image": "https://api.pruna.ai/v1/files/START_ID",
"audio": "https://api.pruna.ai/v1/files/NARRATION_ID",
"resolution": "720p",
"fps": 24
}
}'
Multi-scene AI video: multi-scene-ai-video — scene table, frame columns, audio phases.