Install
openclaw skills install @pruna-ai/avatar-single-scene
Use when the user needs one talking-head clip, a single host line with lip sync, one spokesperson beat, or staged approval before generating avatar video.
One approved portrait → one p-video-avatar job. Stills and QA reuse the same patterns as multi-scene-avatar-video; use generation-quality-checklists.md and that folder’s prompt-templates.md.
Speak to the requester in plain language: explain what they will hear (full voice_script) and see (still + motion) before anything hits the API.
Atomic APIs: p-video-avatar, p-image, p-image-edit, pruna-api.md.
Photoreal dynamic personas: realistic-persona-showcase.md
Staged generation: staged-generation-gate.md · random-seed-ritual.md · workflow-feedback-gates.md
| Phase | What to show | Proceed when |
|---|---|---|
| 0 — Plan | Full voice_script, voice, still + motion plan | approve plan |
| A — Still | Hero / portrait plate | approve still |
| B — Avatar | Single p-video-avatar clip | User accepts |
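The staged gate in the table above can be sketched as a small approval check. This is a minimal illustration, not part of the skill: the phase keys (`plan`, `still`, `avatar`), the approval dictionary, and both function names are hypothetical.

```python
from typing import Optional

# Phase order follows the table above; the short keys are illustrative names.
PHASES = ["plan", "still", "avatar"]


def next_phase(approvals: dict[str, bool]) -> Optional[str]:
    """Return the first phase the user has not approved yet, or None if all are approved."""
    for phase in PHASES:
        if not approvals.get(phase, False):
            return phase
    return None


def may_generate_avatar(approvals: dict[str, bool]) -> bool:
    """The avatar job may run only once both the plan and the still are approved."""
    return all(approvals.get(phase, False) for phase in ("plan", "still"))
```

For example, `next_phase({"plan": True})` returns `"still"`, so the still is the next thing to show the user, and `may_generate_avatar` stays false until that approval is recorded.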
Write voice_script as real dialogue: contractions, natural rhythm, short sentences—how a person talks on camera, not a press release. See multi-scene-avatar-video/prompt-templates.md for good/bad examples.
voice_prompt must describe human delivery (pacing, warmth, founder/conversational tone)—never paste marketing copy or script lines into it.
voice / voice_language: Pick one preset pair for this clip’s speaker. If this character will appear again in a series or sequel clips, reuse the same presets so they sound like one person (same rule as the multi-scene skill’s cast ledger).
p-image-edit from that same URL plus deltas; do not reinvent the face with an unrelated p-image unless the user agrees to a new identity.
Do not call POST /v1/predictions until the user (or product owner) has answered these. Record the answers in the manifest:
| Topic | Questions |
|---|---|
| Goal | What must this one clip communicate (single CTA, greeting, demo line)? |
| Script | Full voice_script as speakable copy—any mandatory pronunciation (names, acronyms)? |
| Voice | Which Pruna voice and voice_language? Keep voice_prompt short (performance vibe only). |
| Look | 9:16 / 16:9 still? Avatar resolution 720p or 1080p? |
| Image source | Upload-only reference, or generate/refine with p-image / p-image-edit first? |
| Motion | Desired energy for video_prompt—specific camera angle and movement (positive wording only)? |
| Character | Age, look, realism level (photoreal vs stylized)—see character sheet in multi-scene-avatar-video |
| Seed | Run the random seed ritual at the hero still to set project_seed (or take a user-supplied seed), then pass the same value to p-video-avatar. |
| Audio (optional) | Upload Gemini TTS for lip-sync via input.audio (preferred over post-mux) — see scene-anchor-triple.md avatar variant. Or use native voice_script. |
If any answer is missing and the user has not waived it, ask before generating.
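One way to enforce this rule is to check the manifest for unanswered topics before any request is built. The sketch below is an assumption about how such a manifest might look: the topic keys are lower-cased versions of the table's Topic column, Audio is left out because the table marks it optional, and `missing_intake_answers` is a hypothetical helper.

```python
# Required topics from the intake table; "audio" is omitted because it is optional.
REQUIRED_TOPICS = [
    "goal", "script", "voice", "look",
    "image_source", "motion", "character", "seed",
]


def missing_intake_answers(manifest: dict, waived: frozenset = frozenset()) -> list[str]:
    """List required topics that are neither answered in the manifest nor waived."""
    missing = []
    for topic in REQUIRED_TOPICS:
        if topic in waived:
            continue  # the user explicitly waived this question
        answer = manifest.get(topic)
        # Treat None and blank strings as unanswered; a seed of 0 still counts as an answer.
        if answer is None or (isinstance(answer, str) and not answer.strip()):
            missing.append(topic)
    return missing
```

If the returned list is non-empty, the questions for those topics go back to the user before any generation call is made.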
After intake:
voice_script, chosen voice / voice_language, resolution, and a short description of the still + video_prompt plan.
voice_script and confirm again when changes are material.
When the user confirms:
curl calls or a small script (shell/Python) that uploads if needed, builds the still, runs p-video-avatar async, polls, and downloads generation_url, matching the approved script exactly. For multi-step prep (edit + avatar), use async and parallel phases per parallel-execution.md.
PRUNA_API_KEY, network). Otherwise deliver the same artifact so the user can execute locally.
POST /v1/files; collect Pruna file URLs.
p-image (photoreal prompt + locked seed) and/or p-image-edit from a locked source. Run the slop gate before avatar.
p-video-avatar with snake_case input (image, optional last_frame_image, voice_script or uploaded audio, voice, voice_language, voice_prompt, video_prompt, resolution, seed). Prefer uploaded audio from Gemini TTS when external narration quality matters. Async only (omit Try-Sync); poll to succeeded; download generation_url.
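The input fields and the poll-to-succeeded step described above could be assembled along these lines. This is only a sketch under stated assumptions: the field names and the `succeeded` status come from the text, but `build_avatar_input` and `poll_until_succeeded` are hypothetical helpers, the rule that at least one of `voice_script` or `audio` must be present is an assumption, and the HTTP request itself (endpoint envelope, headers, status lookup) is deliberately left to the caller because pruna-api.md is the authority on it.

```python
import time
from typing import Callable, Optional

# The intake table offers these two avatar resolutions.
ALLOWED_RESOLUTIONS = {"720p", "1080p"}


def build_avatar_input(*, image: str, voice: str, voice_language: str,
                       voice_prompt: str, video_prompt: str, resolution: str,
                       seed: int, voice_script: Optional[str] = None,
                       audio: Optional[str] = None,
                       last_frame_image: Optional[str] = None) -> dict:
    """Build the snake_case input dict for one p-video-avatar job."""
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    if voice_script is None and audio is None:
        # Assumption: the job needs a script or an uploaded audio file to lip-sync to.
        raise ValueError("provide voice_script or audio")
    payload = {
        "image": image, "voice": voice, "voice_language": voice_language,
        "voice_prompt": voice_prompt, "video_prompt": video_prompt,
        "resolution": resolution, "seed": seed,
    }
    # Optional fields are included only when the caller supplied them.
    optional = {"voice_script": voice_script, "audio": audio,
                "last_frame_image": last_frame_image}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


def poll_until_succeeded(get_status: Callable[[], str], *, max_attempts: int = 60,
                         interval_s: float = 2.0,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call get_status until it returns "succeeded"; give up after max_attempts."""
    for attempt in range(max_attempts):
        if get_status() == "succeeded":
            return True
        if attempt < max_attempts - 1:
            sleep(interval_s)
    return False
```

Here `get_status` is whatever callable the script uses to read the job's status from the API. Injecting it, along with `sleep`, keeps the loop testable without network access, and a `False` return signals a timeout instead of silently downloading a missing generation_url.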