p-video-avatar

Use when the user wants a talking-head video, lip-synced host or spokesperson clip, on-camera performance from a portrait plus script, or narrated avatar footage.

Pruna AI@pruna-ai

Install

openclaw skills install @pruna-ai/p-video-avatar

p-video-avatar (Pruna)

Talking-head video from one image plus either voice_script or audio (if both, audio wins). Full parameters: P-Video-Avatar (Pruna docs).

Dynamic personas & scenarios: realistic-persona-showcase.md · examples: example-prompt.md

Shared HTTP patterns: pruna-api.md (upload, poll, download)

HTTP (curl)

Upload portrait

bash

curl -X POST "https://api.pruna.ai/v1/files" \
  -H "apikey: ${PRUNA_API_KEY}" \
  -F "content=@/path/to/portrait.png"

Use the urls.get value from the upload response as input.image.
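A minimal sketch of pulling that URL out of the upload response, assuming the response body is JSON with a nested urls.get field as the line above implies. The function name and the sample response are hypothetical; only the urls.get key comes from this document.

```python
def image_url_from_upload(upload_response: dict) -> str:
    """Return the urls.get value from a /v1/files upload response."""
    urls = upload_response.get("urls") or {}
    get_url = urls.get("get")
    if not get_url:
        raise ValueError("upload response has no urls.get; cannot set input.image")
    return get_url


# Hypothetical response body; the real response may carry more fields.
sample = {"urls": {"get": "https://api.pruna.ai/v1/files/FILE_ID"}}
payload_input = {"image": image_url_from_upload(sample)}
```

Failing early on a missing URL keeps a bad upload from surfacing later as a confusing prediction error.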

Create (async — recommended)

See Example: async below. Poll and download: pruna-api.md.

Before generating

Follow single-scene-avatar-video or multi-scene-avatar-video. Both run in this order: generation diversity first, then a natural human voice_script, a realistic conversational voice_prompt, a per-scene dynamic video_prompt, a locked project_seed, and one fixed voice per recurring character. Get explicit user confirmation before any POST /v1/predictions, then emit and run the agreed generation steps.

When calling the model directly for a small experiment, run the seed ritual first, then confirm the following with the user: image URL (an approved still from /v1/files), exact voice_script, voice / voice_language, voice_prompt (human delivery, not script text), video_prompt (camera/motion), resolution, and seed. Run p-video-avatar-quality-checklist.md on stills and outputs.

Dynamic realistic personas (production)

A believable avatar needs three layers — not a static face on the default motion prompt:

Slop-gated still — from p-image / p-image-edit / optional p-image-try-on; any medium (photoreal, cel anime, clay, CG 3D) with mouth visible; diverse cast, angle, and setting per realistic-persona-showcase.md
Human voice — natural voice_script + short realistic voice_prompt (never brochure copy in either field)
Unique motion per clip — distinct video_prompt per scene (angle, gesture, glance, handheld vs dolly). Do not ship multi-scene reels where every row uses "medium close-up, gentle dolly push-in".
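The third layer can be checked mechanically before anything is submitted. The sketch below is one possible pre-flight check, assuming each scene is a dict with a video_prompt key; the function name and the scene shape are hypothetical, not part of the Pruna API.

```python
from collections import defaultdict


def duplicate_video_prompts(scenes: list[dict]) -> dict[str, list[int]]:
    """Map each repeated video_prompt to the scene indexes that share it."""
    seen = defaultdict(list)
    for index, scene in enumerate(scenes):
        # Normalize case and whitespace so trivial edits do not hide a copy.
        # Scenes without a video_prompt all collapse to "" (the default prompt).
        key = " ".join(scene.get("video_prompt", "").lower().split())
        seen[key].append(index)
    return {prompt: rows for prompt, rows in seen.items() if len(rows) > 1}


scenes = [
    {"video_prompt": "Medium close-up, gentle dolly push-in"},
    {"video_prompt": "Over-the-shoulder angle, handheld, glance to window"},
    {"video_prompt": "medium close-up,  gentle dolly push-in"},
]
repeats = duplicate_video_prompts(scenes)
```

Here repeats flags scenes 0 and 2 as sharing one prompt; an empty result means every clip has its own motion line.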

Stylized hosts (anime, clay, 3D): same mouth-visibility gate; match voice_prompt and video_prompt energy to the style (anime: slightly more expressive motion; documentary: restrained). Cross-style reels need separate hero stills per visual_style_tag — do not edit photoreal into anime from one anchor.

Upstream plate quality caps avatar quality. Regenerate mushy or synthetic stills before avatar. For fashion UGC: photoreal p-image → p-image-try-on → slop gate → avatar with same seed.

Multi-scene: pair each clip’s video_prompt with a matching p-image-edit still (background/angle delta only). See avatar-multi-scene/prompt-templates.md scene table.

Multi-scene: after confirmation, create all avatar jobs in parallel (async, no Try-Sync); batch-poll. Prefer one subagent per clip — see parallel-execution.md.
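As a rough illustration of "create in parallel, then batch-poll", the sketch below keeps the HTTP details out of the picture. Both create_job (payload in, get_url out) and poll_once (get_url in, result or None out) are hypothetical callables you would implement from pruna-api.md; nothing here reflects the actual response format.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def create_all(payloads: list[dict], create_job: Callable[[dict], str]) -> list[str]:
    """Create one job per payload concurrently; return get_urls in input order."""
    if not payloads:
        return []
    with ThreadPoolExecutor(max_workers=len(payloads)) as pool:
        return list(pool.map(create_job, payloads))


def batch_poll(get_urls, poll_once, max_rounds=60, wait=lambda: None):
    """Poll every pending job each round until all finish or rounds run out."""
    results = {}
    pending = list(get_urls)
    for _ in range(max_rounds):
        still_pending = []
        for url in pending:
            outcome = poll_once(url)  # None means "not finished yet"
            if outcome is None:
                still_pending.append(url)
            else:
                results[url] = outcome
        pending = still_pending
        if not pending:
            break
        wait()  # pass e.g. lambda: time.sleep(5) in real use
    return results, pending
```

Returning the leftover pending list lets the caller decide whether to keep waiting or report a timeout per clip.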

Realistic human voice (defaults for social / founder content)

| Field | Guidance |
| --- | --- |
| `voice_script` | Speakable copy: contractions, short sentences, light fillers ("Hey —", "right?"). Avoid brochure language. |
| `voice_prompt` | How they sound: "Natural conversational tone like a founder on LinkedIn, relaxed pacing, real pauses, honest not salesy." Never paste product names or script lines here. |
| `video_prompt` | Unique per clip: angle, push-in, gesture, setting motion, glance beats. Never copy one string across a multi-scene reel. Default `The person is talking.` is quick-test only. |
| `seed` | Generation diversity at project start → `project_seed`; reuse on every clip for that character unless A/B testing motion. |
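The seed row can be sketched as "draw once, stamp everywhere". The upper bound below is an assumption, since this document does not state the valid seed range, and both function names are hypothetical.

```python
import random
from typing import Optional

MAX_SEED = 2**31 - 1  # assumed upper bound; check the model docs for the real range


def new_project_seed(rng: Optional[random.Random] = None) -> int:
    """Draw one random integer to use as project_seed for a character."""
    rng = rng or random.SystemRandom()
    return rng.randint(0, MAX_SEED)


def lock_seed(clips: list[dict], project_seed: int) -> list[dict]:
    """Return copies of the clip inputs that all carry the same seed."""
    return [{**clip, "seed": project_seed} for clip in clips]


project_seed = new_project_seed()
clips = lock_seed(
    [{"video_prompt": "Handheld, glance off-camera"}, {"video_prompt": "Slow dolly, open-palm gesture"}],
    project_seed,
)
```

For an A/B test of motion, you would skip lock_seed for the variant clip and set its seed by hand.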

Motion-template use case (for p-video-animate beats): when this model generates a source motion video, the prompt must explicitly request speaking, for example "clear lip movement", "explain gestures", "speaks directly to camera". Motion-source stills need the mouth clearly visible and ready to speak. See animate-beats.md.

Templates and good/bad pairs: multi-scene-avatar-video/prompt-templates.md.

Field names (JSON)

Pruna P-API uses snake_case in input: voice_script, video_prompt, voice_prompt, voice_language. Some other products use camelCase; map accordingly.
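If your inputs arrive in camelCase from another product, a small key mapper is usually enough. This is a generic sketch, not a Pruna utility; it assumes simple camelCase keys without acronym runs.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a camelCase key such as voiceScript to voice_script."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def to_pruna_input(camel_input: dict) -> dict:
    """Rename every top-level key to snake_case, leaving values untouched."""
    return {camel_to_snake(key): value for key, value in camel_input.items()}


converted = to_pruna_input(
    {"image": "https://api.pruna.ai/v1/files/FILE_ID", "voiceScript": "Hey, quick update.", "videoPrompt": "Handheld, slight nod"}
)
```

Keys that are already snake_case pass through unchanged, so the mapper is safe to run on mixed input.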

Required input

image (string URL to jpg/jpeg/png/webp)

Plus one of:

voice_script + optional voice, voice_prompt, voice_language, video_prompt, resolution, or
audio (URL to flac/mp3/wav)
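The required-input rule can be expressed as a small check to run before posting. The function name is hypothetical; the rules (image required, one speech source required, audio taking precedence) are the ones stated in this document.

```python
def check_input(model_input: dict) -> str:
    """Return which speech source will be used: "audio" or "voice_script"."""
    if not model_input.get("image"):
        raise ValueError("image is required")
    if model_input.get("audio"):
        return "audio"  # audio wins even when voice_script is also present
    if model_input.get("voice_script"):
        return "voice_script"
    raise ValueError("provide voice_script or audio")


source = check_input(
    {
        "image": "https://api.pruna.ai/v1/files/FILE_ID",
        "voice_script": "Hey, quick update.",
        "audio": "https://api.pruna.ai/v1/files/NARRATION_ID",
    }
)
```

Reporting the winning source makes it obvious when a leftover audio URL is silently overriding a freshly edited script.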

Common optional fields

voice (default Zephyr (Female)); see model doc for full voice list
resolution: 720p (default) or 1080p
video_prompt (default The person is talking.)
voice_prompt (style / tone; keep short—can leak into performance if too verbose)
seed, disable_safety_filter, disable_prompt_upsampling
negative_prompt + negative_prompt_strength — experimental text/overlay suppression (see below)

Negative prompt (suppress on-screen text)

Pruna exposes experimental negative prompting on p-video-avatar to reduce burned-in subtitles, captions, and other text artifacts — especially when the start frame came from a still that tempted the model toward labels or signage.

| Field | Default | Rule |
| --- | --- | --- |
| `negative_prompt` | `""` | Comma-separated elements to suppress, not things you want in frame |
| `negative_prompt_strength` | `0` | Both must be set: non-empty prompt and strength > 0, or the API ignores them |
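Because the two fields only take effect together, it can help to emit them as a pair or not at all. A minimal sketch with a hypothetical helper name:

```python
def negative_prompt_fields(prompt: str, strength: float) -> dict:
    """Return the two fields only when both are active, else an empty dict."""
    if prompt.strip() and strength > 0:
        return {"negative_prompt": prompt, "negative_prompt_strength": strength}
    return {}


model_input = {
    "image": "https://api.pruna.ai/v1/files/FILE_ID",
    "voice_script": "Hey, quick update.",
    **negative_prompt_fields("subtitles, captions, on-screen text", 0.35),
}
```

Dropping an inactive pair keeps the payload honest about what will be applied.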

Starter negative_prompt (text triggers):

text

subtitles, captions, on-screen text, burned-in text, watermark, logo, typography, letters, words, readable signage, UI overlay, lower third, chyron, title card, price tag, packaging label, menu text

Start negative_prompt_strength around 0.3–0.4 and tune per asset. Higher values can drift identity, motion, or background — increase gradually.

Still-side prevention (primary): write positive-only still lines (plain unmarked walls, unprinted props). Never write "no text" or "avoid signage" in creative prompts. The API negative_prompt is a suppression token list (nouns), not creative wording. Use it as a safety net, not a substitute for clean stills.

Workflow plans: interactive-explainer runner applies defaults from plan.defaults.avatar_negative_prompt / avatar_negative_prompt_strength, with optional per-scene overrides (negative_prompt, negative_prompt_strength). Helper: p_video_avatar_payload.py.

Disable for a scene: set "negative_prompt_strength": 0 on that scene row.

json

"defaults": {
  "avatar_negative_prompt": "subtitles, captions, on-screen text, watermark, logo, typography, letters, words",
  "avatar_negative_prompt_strength": 0.35
}
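The default-plus-override behavior described above might look like the sketch below. This is not the contents of p_video_avatar_payload.py, which are not shown here; the function name and the scene dict shape are assumptions for illustration.

```python
def resolve_negative_prompt(plan_defaults: dict, scene: dict) -> dict:
    """Merge plan-level avatar defaults with optional per-scene overrides."""
    prompt = scene.get(
        "negative_prompt", plan_defaults.get("avatar_negative_prompt", "")
    )
    strength = scene.get(
        "negative_prompt_strength",
        plan_defaults.get("avatar_negative_prompt_strength", 0),
    )
    return {"negative_prompt": prompt, "negative_prompt_strength": strength}


defaults = {
    "avatar_negative_prompt": "subtitles, captions, on-screen text, watermark, logo, typography, letters, words",
    "avatar_negative_prompt_strength": 0.35,
}
inherits = resolve_negative_prompt(defaults, {"video_prompt": "Handheld, glance to window"})
disabled = resolve_negative_prompt(defaults, {"negative_prompt_strength": 0})
```

The scene row with strength 0 keeps the default token list but turns suppression off for that clip only.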

Example: async (recommended — use for all production)

Omit Try-Sync. For multiple clips, create all jobs in parallel, then batch-poll every get_url. See parallel-execution.md.

The "seed": 518263 value in the examples is illustrative; use the integer from your own random seed ritual.

bash

curl -X POST 'https://api.pruna.ai/v1/predictions' \
  -H 'Content-Type: application/json' \
  -H "apikey: ${PRUNA_API_KEY}" \
  -H 'Model: p-video-avatar' \
  -d '{
    "input": {
      "image": "https://api.pruna.ai/v1/files/FILE_ID",
      "voice_script": "Hey — so we shipped something I'\''ve wanted for a while.",
      "voice": "Puck (Male)",
      "voice_language": "English (US)",
      "voice_prompt": "Natural conversational tone — relaxed pacing, real pauses.",
      "resolution": "720p",
      "seed": 518263,
      "video_prompt": "Medium close-up speaking directly to lens, subtle push-in",
      "negative_prompt": "subtitles, captions, on-screen text, watermark, logo, typography, letters, words",
      "negative_prompt_strength": 0.35
    }
  }'
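For callers scripting this from Python, the same async request can be assembled with the standard library. The sketch mirrors the URL, headers, and fields of the curl example; the function name is hypothetical, and sending the request and reading the response are left out because the response format is documented in pruna-api.md rather than here.

```python
import json
import os
import urllib.request


def build_prediction_request(model_input: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an async POST /v1/predictions request."""
    body = json.dumps({"input": model_input}).encode("utf-8")
    return urllib.request.Request(
        "https://api.pruna.ai/v1/predictions",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "apikey": api_key,
            "Model": "p-video-avatar",
            # No Try-Sync header: this is the async form.
        },
    )


request = build_prediction_request(
    {
        "image": "https://api.pruna.ai/v1/files/FILE_ID",
        "voice_script": "Hey, so we shipped something I've wanted for a while.",
        "voice": "Puck (Male)",
        "voice_language": "English (US)",
        "resolution": "720p",
        "seed": 518263,
        "video_prompt": "Medium close-up speaking directly to lens, subtle push-in",
    },
    os.environ.get("PRUNA_API_KEY", ""),
)
# To send it: urllib.request.urlopen(request)
```

Letting json.dumps encode the body avoids shell quoting concerns when a voice_script contains apostrophes or quotes.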

Example: sync (single quick test only)

bash

curl -X POST 'https://api.pruna.ai/v1/predictions' \
  -H 'Content-Type: application/json' \
  -H "apikey: ${PRUNA_API_KEY}" \
  -H 'Model: p-video-avatar' \
  -H 'Try-Sync: true' \
  -d '{
    "input": {
      "image": "https://api.pruna.ai/v1/files/FILE_ID",
      "voice_script": "Hey — so we shipped something I'\''ve wanted for a while. Sub-second images, video in seconds, and it actually feels usable in a real workflow.",
      "voice": "Puck (Male)",
      "voice_language": "English (US)",
      "voice_prompt": "Natural conversational tone — like a founder on LinkedIn, relaxed pacing, real pauses, honest not salesy.",
      "resolution": "720p",
      "seed": 518263,
      "video_prompt": "Medium close-up speaking directly to lens, subtle push-in, natural head motion, warm confident energy"
    }
  }'

Example: uploaded narration (scene anchor triple — avatar variant)

Generate the narration with Gemini TTS and upload it to /v1/files. Pass the uploaded file URL as input.audio together with the portrait image (and an optional last_frame_image when the beat has a known end pose). The clip duration follows the audio. See scene-anchor-triple.md.

bash

curl -X POST 'https://api.pruna.ai/v1/predictions' \
  -H 'Content-Type: application/json' \
  -H "apikey: ${PRUNA_API_KEY}" \
  -H 'Model: p-video-avatar' \
  -d '{
    "input": {
      "image": "https://api.pruna.ai/v1/files/PORTRAIT_START",
      "last_frame_image": "https://api.pruna.ai/v1/files/PORTRAIT_END",
      "audio": "https://api.pruna.ai/v1/files/NARRATION_ID",
      "resolution": "720p",
      "video_prompt": "Medium close-up, natural head motion matching narration"
    }
  }'

If both audio and voice_script are set, audio wins.

Typical next steps

One-scene avatar workflow: single-scene-avatar-video
Multi-scene avatar workflow: multi-scene-avatar-video
Pipeline: pruna-generative-pipeline

Related workflow

Avatar + animate reels: multi-scene-avatar-video — slider script: generate_video_comparison.py.