{"skill":{"slug":"local-video-ad-pipeline-v05-public","displayName":"Local Video Ad Pipeline v0.5","summary":"Local Video Ad Pipeline v0.5 is a public OpenClaw skill for producing short commercial videos and YouTube Shorts with local AI models. It uses a local LLM as...","description":"---\nname: local-video-ad-pipeline-v05-public\ndescription: \"Local Video Ad Pipeline v0.5 is a public OpenClaw skill for producing short commercial videos and YouTube Shorts with local AI models. It uses a local LLM as the full film director: concept bible, story intent, visual arc, beats, shot direction, shotlist, image prompts, Korean subtitle script, and subtitle-based timing. Keyframes are generated with Qwen-Image or SDXL through ComfyUI, shots are animated sequentially with Wan2.2, optional BGM can be created with ACE-Step, and final MP4 assembly is handled with ffmpeg. Designed for local GPU workflows where models must be loaded one at a time. Includes character consistency rules, prompt-level identity locking, direct Qwen-Image keyframe generation, Korean subtitle wrapping, no-slow native-speed assembly, contact-sheet QA, and practical GPU coexistence guidance.\"\n---\n\n# Local Video Ad Pipeline\n\nEnd-to-end recipe for a short multi-shot video produced with local models. The workflow is intentionally sequential because Qwen-Image, Wan2.2, and ACE-Step often compete for the same GPU memory pool.\n\nVersion: public v0.5. This version has no separate film-director helper reference. The local LLM performs the complete directing process from the user's brief: story intent, visual arc, shot rhythm, camera direction, actor expression, shotlist, image prompts, and subtitle timing.\n\n## When to deviate\n\nIf the user wants a single still image, use a manga/manhwa/qwen-image skill instead. If they want long-form video over one minute, dialogue, or lipsync, this skill is the wrong tool; recommend a hosted video service or a narrower local prototype. 
If they only want music, use `scripts/fire_bgm.py`.\n\n## Project layout\n\n```\n<project>/\n  meta/\n  bible.json\n  beats.json\n  shotlist/shotlist.json\n  prompts/prompts.json\n  character/anchor.png\n  character/character_bible.json\n  durations.json\n  keyframes/<sid>.png\n  videos/<sid>.mp4\n  audio/bgm.wav\n  subs.json\n  final/final.mp4\n```\n\n## Before running\n\nRun the read-only environment check. For silent/no-BGM videos, pass `--silent` so ACE-Step is treated as intentionally skipped.\n\n```bash\npython scripts/doctor.py --project <project> --comfy http://127.0.0.1:8192\npython scripts/doctor.py --project <project> --comfy http://127.0.0.1:8192 --silent\n```\n\nIt checks Python, ffmpeg, WSL input/output folders, the ComfyUI server, and required ComfyUI nodes.\n\n## Pipeline\n\n### 1. Pre-production scripting\n\nGenerate `bible.json`, `director_notes.json`, `beats.json`, `shotlist/shotlist.json`, and `prompts/prompts.json` by chatting with a local llama.cpp server. See `references/preproduction.md` for prompt templates and JSON schemas.\n\nDefault female protagonist casting, unless the user specifies a different look: make an adult Korean woman in her early 20s with celebrity/Instagram-model-level beauty, bright clear eyes, pure elegant aura, polished but realistic skin, and a glamorous adult model figure. For photoreal local Qwen-Image work, separate the image grammar: backgrounds, lighting, camera, hands, props, and environment should be ultra-realistic and candid; the protagonist should be cast as an obviously beautiful high-end model rather than an ordinary realistic person. YouTube Shorts are attention-driven, so adult glamour, sensual styling, fitted silhouettes, fashion/swimwear/lingerie/gravity-of-attraction aesthetics, and body-forward composition are allowed when they fit the brief. 
Use a fitted fashion silhouette when the scene allows it: open cardigan or jacket, fitted V-neck knit/top/blouse, balanced hourglass proportions, and clearly defined G-cup bust silhouette through clothing. Keep the character clearly adult. If the user asks for minors, school-age characters, school uniforms, childlike characters, or age-ambiguous characters, override this default and keep the portrayal conservative and age-safe.\n\nDo not put broad sexuality, clothing, body, or exposure suppressors such as `sexualized`, `revealing clothing`, `cleavage`, `large breasts`, `lingerie`, `swimwear`, `transparent blouse`, or `nudity` into the global default negative prompt. Those suppressors block normal adult fashion, glamour, swimsuit, underwear, and Shorts-style attention hooks. Use them only when the user explicitly requests a conservative/no-exposure project or when the protagonist is a minor, school-age, childlike, or age-ambiguous character. The default negative should focus on quality failures: duplicate people, collage/split screen, bad hands, text, watermark, wrong age, ordinary/plain face when a model protagonist is requested, and plastic AI skin.\n\nFor stronger film direction, start with `director_notes.json` before writing beats. The local LLM director pass owns the story arc, rhythm, opening image, turning point, final image, shot-size progression, camera direction, actor expression, and continuity rules. Later passes derive beats, shotlist, and prompts from that LLM-generated director pass.\n\nThe shotlist schema is a hard requirement. `fire_videos.py` reads `shot_id`, `action`, `mood`, `lighting`, `camera_motion`, and `shot_type` from each entry. The LLM pre-production pass must also fill `director_intent`, `actor_direction`, `emotional_expression`, `composition`, and `continuity`; these fields guide keyframe prompts and help prevent disconnected pretty shots.\n\nFor any recurring protagonist, `bible.json` must include `characters[].lock_tokens`. 
Every prompt for a shot with `needs_character: true` must begin with that exact string, unchanged.\n\nFor facial performance, use `references/expression_language.md`. Do not rely on `mood` alone; add visible expression cues such as heavy eyelids, lifted brows, pressed lips, faint smile, direct eye contact, or shoulders opening.\n\nFor detail shots, use `identity_framing` so the face lock does not overpower the intended crop. Use `feet_only`, `hands_only`, `body_detail`, or `back_view` for shoes, hands, backpack straps, lower legs, and walking-away shots.\n\n### 2. Keyframes\n\nRender one PNG per shot to `<project>/keyframes/<sid>.png` using direct Qwen-Image BF16 ComfyUI workflow JSON. This is the default path for cinematic keyframes because it preserves the director prompt and avoids fixed GUI styling.\n\n```bash\npython scripts/generate_keyframes_direct.py --project <project> \\\n  --comfy http://127.0.0.1:8194 --shots S01 S02 S03 S04 S05 S06\n\npython scripts/image_contact_sheet.py --project <project> --glob \"keyframes/*.png\" \\\n  --out <project>/final/keyframe_contact_sheet.jpg\n```\n\nThe direct script inserts `characters[].lock_tokens` from `bible.json` at the start of every `needs_character: true` shot prompt. This prompt-level character lock is the preferred local identity method for Qwen-Image BF16.\n\nUse the older GUI-builder T2I helper only as a fallback when you intentionally want the local GUI's fixed settings:\n\n```bash\npython scripts/generate_keyframes_t2i.py --project <project> \\\n  --comfy http://127.0.0.1:8194 --shots S01 S02\n```\n\nAvoid the GUI-builder path for free cinematic angle work. Local GUI presets can make shots look overly standardized or can fight the shot prompt.\n\nQwen T2I can interpret film/storyboard language as a multi-panel page. The direct script forces one single full-frame vertical photograph and rejects collage, storyboard, split screen, multiple panels, triptych, film strip, and contact sheet. 
If a generated keyframe still contains multiple panels, reject it and regenerate only that shot.\n\nIf prompt-level locking is not enough for one or two shots, use the character anchor workflow as a selective repair path:\n\n```bash\n# 1. Start the Qwen Image Edit ComfyUI stack, usually :8189.\n\n# 2. Create or choose a stable protagonist anchor.\npython scripts/generate_character_anchor.py --project <project> \\\n  --prompt \"photorealistic Korean high school girl, modest school uniform, natural face, shoulder-length black hair, wholesome study film protagonist\" \\\n  --comfy http://127.0.0.1:8189\n\n# 3. Repair only rejected shots by editing from the same anchor.\npython scripts/generate_keyframes_from_anchor.py --project <project> \\\n  --comfy http://127.0.0.1:8189 --shots S03 S05\n\n# 4. Inspect identity consistency before Wan2.2.\npython scripts/image_contact_sheet.py --project <project> --glob \"keyframes/*.png\" \\\n  --out <project>/final/keyframe_contact_sheet.jpg\n```\n\nFor stronger control, add `character_identity` to `prompts/prompts.json`:\n\n```json\n{\n  \"global_style\": \"cinematic realistic Korean study film\",\n  \"character_identity\": \"one beautiful adult Korean woman in her early 20s, celebrity-level Instagram model face, clear bright dark-brown eyes, long silky natural black hair, fresh Korean daily makeup, polished realistic skin texture, glamorous adult model figure, fitted fashion silhouette, clearly defined G-cup bust silhouette through clothing\"\n}\n```\n\nBefore continuing, make a contact sheet or inspect representative frames. Verify:\n\n- The protagonist matches the requested age, gender, clothing, and tone.\n- The shot is conservative and age-safe when minors, school-age characters, school uniforms, childlike characters, or age-ambiguous characters are requested. 
For clearly adult protagonists, do not reject glamour, sensual styling, fitted clothing, cleavage, swimwear, lingerie, or body-forward composition merely because it is attractive.\n- Exactly one protagonist appears when the request asks for one person; no duplicate, twin, friend, or accidental group shot.\n- The same character identity is plausible across shots.\n- Text on screens or notebooks is not relied on for the message; final subtitles carry the message.\n\nIf one or two shots drift, regenerate only those shot IDs with `generate_keyframes_direct.py --shots S03 S05` first. Use Image Edit only after direct retries fail. Do not proceed to Wan2.2 until the keyframe sheet is acceptable.\n\n### 3. Wan2.2 video\n\nRead `references/wan22_server.md` first. It has the lifecycle rules.\n\n```bash\n# Start the WSL Wan2.2 ComfyUI server, or confirm :8192 is already up.\n\npython scripts/fire_warmup.py --comfy http://127.0.0.1:8192\n# Wait for fire_warmup_out_00001.mp4 in:\n# \\\\wsl.localhost\\Ubuntu\\home\\choi_g16\\comfy\\ComfyUI\\output\\\n\npython scripts/fire_videos.py --project <project> --comfy http://127.0.0.1:8192\n# Do not poll ComfyUI while rendering.\n\npython scripts/fire_videos.py --project <project> --collect\n# Copies finished film_video_Sxx*.mp4 files into <project>/videos/Sxx.mp4.\n```\n\nFor VRAM-safe unattended work, prefer sequential rendering:\n\n```bash\npython scripts/render_sequential.py --project <project> --shots S01 S02 S03 S04 S05 S06 \\\n  --comfy http://127.0.0.1:8192 --frames 80 --width 832 --height 480\n```\n\nFor commercial/YouTube Shorts rhythm, prefer subtitle-based variable timing. 
Read `references/subtitle_timing.md`.\n\n```bash\npython scripts/plan_subtitle_durations.py --project <project> \\\n  --subs <project>/subs.json --update-shotlist\n\npython scripts/render_sequential.py --project <project> --shots S01 S02 S03 S04 S05 S06 \\\n  --comfy http://127.0.0.1:8192 \\\n  --durations <project>/durations.json \\\n  --duration-fps 16 --frame-pad 8 \\\n  --width 480 --height 832\n```\n\nRe-shoot specific shots without restarting WSL:\n\n```bash\npython scripts/fire_videos.py --project <project> --shots S02 S05\npython scripts/fire_videos.py --project <project> --shots S02 S05 --collect\n```\n\n### 4. BGM\n\nSkip this stage when the user requests silence or no background music. If BGM is requested, read `references/ace_bgm.md` first. It has the GPU coexistence rule.\n\n```bash\nwsl --shutdown\n# Start F:\\AI\\ACE-Step\\start_gradio_ui_rocm.bat with CHECK_UPDATE=false.\n\npython scripts/fire_bgm.py --out <project>/audio/bgm.wav --duration 30 \\\n  --prompt \"dark cinematic trap, hard 808 bass, instrumental, no vocals\"\n```\n\n`fire_bgm.py` now auto-discovers the ACE-Step generation endpoint. Pass `--fn-index` only if discovery fails after an ACE-Step UI update.\n\n### 5. 
Compose\n\nWrite `<project>/subs.json`:\n\n```json\n{\n  \"S01\": \"지금 시작하자.\",\n  \"S02\": \"실패도 연습이 된다.\",\n  \"S03\": \"다시 시도하면 된다.\"\n}\n```\n\nStandard compose, where short generated clips may be stretched to a target per-shot duration:\n\n```bash\npython scripts/compose.py --project <project> \\\n  --subs <project>/subs.json \\\n  --bgm <project>/audio/bgm.wav \\\n  --out <project>/final/final.mp4 \\\n  --per-clip 5 --hero S03 S06\n```\n\nPreferred Shorts compose, where each shot duration follows subtitle reading time:\n\n```bash\npython scripts/plan_subtitle_durations.py --project <project> \\\n  --subs <project>/subs.json --cps 5.5 --min 1.6 --max 5.0 --update-shotlist\n\npython scripts/compose.py --project <project> \\\n  --subs <project>/subs.json \\\n  --durations <project>/durations.json \\\n  --out <project>/final/final.mp4 \\\n  --no-slow --max-subtitle-chars 34\n```\n\nSilent/no-slow compose, where clip speed is not stretched. Use this when the user says not to slow, interpolate, or extend frames to fit time. If the result is shorter than the target, render more native frames or add another short native clip.\n\n```bash\npython scripts/compose.py --project <project> \\\n  --subs <project>/subs.json \\\n  --out <project>/final/final.mp4 \\\n  --no-slow --target-duration 45 --max-subtitle-chars 34\n```\n\nCreate a visual QA sheet:\n\n```bash\npython scripts/contact_sheet.py <project>/final/final.mp4 --out <project>/final/contact_sheet.jpg\npython scripts/validate_video.py <project>/final/final.mp4 --orientation portrait\n```\n\nThe compose script burns subtitles from UTF-8 text files. It sanitizes direct newline characters because `drawtext` can render them as missing glyph boxes on this Windows ffmpeg path. 
Keep Korean subtitles short and use `--max-subtitle-chars` for overflow control.\n\n## Duration Rules\n\nDefault commercial/Shorts timing is subtitle-based, not equal-grid timing:\n\n```\nshot_duration_s = visible_subtitle_chars / cps + lead_breath + tail_breath\ntotal_duration_s = sum(shot_duration_s)\n```\n\nIf a user asks for exact duration and no slow-motion, calculate native frames first:\n\n```\nnative_seconds ~= frames / source_fps\ntarget_seconds = sum(native clip durations)\n```\n\nFor Wan2.2 at 16 fps, 80 frames is nominally 5.0 seconds (80 / 16) but often lands around 4.8 seconds after encode. Verify with `ffprobe` and add native frames or a short final native clip if needed. Do not use `setpts` stretching in no-slow mode.\n\n## Hard Rules\n\n- Never restart the WSL ComfyUI server between shots unless it is actually wedged. First load is slow; warm runs are much faster.\n- Do not poll `/queue` or `/history` while Wan2.2 is rendering. Use the filesystem and `--collect` after enough time has passed.\n- Qwen-Image, Wan2.2, and ACE-Step should not run simultaneously on this 96 GB UMA host. Use one server at a time.\n- For one recurring protagonist, use prompt-level `lock_tokens` plus direct Qwen T2I as the default. Use Qwen Image Edit from an anchor only for same-scene variations or selective face/outfit correction.\n- Do not use the Qwen GUI builder as the primary keyframe path for cinematic shorts; it can impose fixed style and weaken free shot design.\n- For YouTube Shorts, keep the same portrait aspect through every stage: keyframes, Wan render, compose, and QA. 
Use `480x832` for the local Wan2.2 production path unless deliberately making landscape.\n- Use `wsl --shutdown` between major server classes when VRAM behavior looks sticky.\n- Keep Wan2.2 production settings around 832x480 landscape or 480x832 Shorts portrait / 33+ frames / 4 Lightning steps unless intentionally stress testing.\n- For no-slow videos, render native frames from subtitle duration via `render_sequential.py --durations`; use fixed 80-frame clips only for rough prototypes.\n- Use `CHECK_UPDATE=false` for the ACE-Step launcher in unattended mode.\n- After switching image/video/BGM servers, clean up stale launch windows and ports:\n\n```powershell\npowershell -ExecutionPolicy Bypass -File scripts/cleanup_local_video_servers.ps1\n```\n\n## Delivery\n\nOpenClaw's `MEDIA:` directive can fail for larger Telegram videos. If that happens, send via the Telegram Bot API directly using the bot token already configured in OpenClaw. Never paste the token into chat logs.\n\n## Time budget\n\nVerified target for six 5-second shots:\n\n| Stage | Time |\n| --- | --- |\n| Pre-production | ~5 min |\n| Keyframes | ~10 min |\n| Wan warmup | ~10 min |\n| Wan render | ~5-20 min per shot, depending on frames |\n| BGM | ~1 min |\n| Compose | ~1 min |\n\nAdd time for human review and re-shoots.\n","topics":["Pipeline","Prompt"],"tags":{"latest":"0.5.1","local-ai":"0.5.1","openclaw":"0.5.1","qwen-image":"0.5.1","shorts":"0.5.1","video":"0.5.1","wan2.2":"0.5.1"},"stats":{"comments":0,"downloads":347,"installsAllTime":13,"installsCurrent":0,"stars":0,"versions":2},"createdAt":1778324383022,"updatedAt":1778492885325},"latestVersion":{"version":"0.5.1","createdAt":1778324639902,"changelog":"Improve public English title and 
description.","license":"MIT-0"},"metadata":null,"owner":{"handle":"k0103292xxxx","userId":"s174fftjfp5pwnjrg8tefj4hr183hm4g","displayName":"k0103292XXXX","image":"https://avatars.githubusercontent.com/u/268130256?v=4"},"moderation":{"isSuspicious":false,"isMalwareBlocked":false,"verdict":"clean","reasonCodes":["review.llm_review"],"summary":"Review: review.llm_review","engineVersion":"v2.4.24","updatedAt":1780090771770}}