Pixcli Skill

Creative toolkit for AI agents — generate images, videos, voiceover, music, and sound effects, then assemble polished output via Remotion. Uses the pixcli CLI (published npm package "pixcli") for all generation. Remotion handles video assembly with 6 bundled templates.

Install

openclaw skills install pixcli

pixcli

Philosophy: The CLI handles complexity (task classification, prompt enrichment, model selection). You just describe what you want.

Requirements

| Requirement | Value | Notes |
| --- | --- | --- |
| Primary credential | PIXCLI_API_KEY | Required. Covers all capabilities (image, video, voice, music, SFX). Obtain at https://pixcli.shellbot.sh |
| Runtime | Node.js ≥ 18 | node and npx must be on PATH |
| CLI package | pixcli (npm) | Installed at runtime via npx --yes pixcli. Published package: npmjs.com/package/pixcli. Source: github.com/shellbot-ai/pixcli |
| Remotion (optional) | remotion (npm) | Only needed for video assembly from bundled templates. Installed via npm install inside template dirs — the templates' package.json declares all deps (remotion, react, react-dom, @remotion/*). No arbitrary package installs. |

What runs at runtime and why

  • npx --yes pixcli <command>: Downloads + caches the pixcli CLI from npm on first invocation, then runs it. All subsequent calls use the cached binary. The --yes flag is required in agent contexts to avoid interactive prompts. pixcli is an HTTP client — it sends prompts to the pixcli API (https://pixcli.shellbot.sh/api/v1/*), polls for completion, and downloads the resulting files. It does not execute arbitrary code.
  • npx --yes remotion <command>: Same pattern for the Remotion video renderer. Only used when assembling final videos from generated assets using the bundled templates.
  • npm install (no arguments): Runs inside a copied template directory to install the dependencies declared in that template's package.json. The agent never passes package names to npm install — only hydrates declared deps.
  • ffmpeg / ffprobe: Local-only media operations (trim, merge, scale, get info). No network access.

What does NOT run

  • No bare npx <arbitrary-package> — only npx pixcli and npx remotion
  • No npm install <package-name> — only bare npm install
  • No node <script> — the agent never executes arbitrary JavaScript
  • No npm publish, npm config, or any npm command beyond install and run <script>
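
The two lists above amount to a small command allowlist. As a rough sketch only (is_allowed_command is a hypothetical helper, not part of pixcli or of any real enforcement layer), the rule could be expressed as a prefix check:

```shell
# Hypothetical helper: returns 0 when a command line matches the allowlist
# described above, 1 otherwise. Prefix matching is an illustrative shortcut.
is_allowed_command() {
  case "$1" in
    "npx --yes pixcli "*|"npx pixcli "*)     return 0 ;;  # pixcli via npx only
    "npx --yes remotion "*|"npx remotion "*) return 0 ;;  # Remotion via npx only
    "npm install"|"npm run "*)               return 0 ;;  # bare install, declared scripts
    "ffmpeg "*|"ffprobe "*)                  return 0 ;;  # local media tools
    *)                                       return 1 ;;  # everything else is refused
  esac
}
```

A call such as is_allowed_command "npm install left-pad" returns non-zero, while is_allowed_command "npm install" succeeds. A prefix match does not guard against shell metacharacters inside the string, so a real gate would need to tokenise the command first.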

Setup

1. Use the CLI

AI agents should always run pixcli via npx --yes pixcli — it's in the scoped allowlist and requires no global install:

npx --yes pixcli image "a red fox in a forest"

Humans who prefer a global install for interactive terminals can optionally run npm install -g pixcli once outside the agent — the agent doesn't need (or have permission for) that command.

Important for AI agents: npx prompts for confirmation before installing packages. The --yes flag auto-accepts. Without it, the command will hang waiting for input. Always use npx --yes pixcli in non-interactive contexts.

Always use --json: All commands support --json which suppresses spinners and human-readable output, returning only structured JSON to stdout. This minimizes token consumption and gives you machine-parseable results. Alternatively, set PIXCLI_JSON=1 once to enable JSON mode for all commands without passing the flag each time.

2. Authenticate

export PIXCLI_API_KEY="px_live_..."

Get your API key at https://pixcli.shellbot.sh. The key covers all capabilities: images, video, voice, music, and sound effects.
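
Before the first generation call it can help to fail fast when the key is missing. The sketch below assumes a hypothetical helper name (require_pixcli_key); the px_live_ prefix is inferred from the example above rather than from any documented key format, so a mismatch only produces a warning.

```shell
# Hypothetical pre-flight check for the PIXCLI_API_KEY environment variable.
require_pixcli_key() {
  if [ -z "${PIXCLI_API_KEY:-}" ]; then
    echo "PIXCLI_API_KEY is not set" >&2
    return 1
  fi
  case "$PIXCLI_API_KEY" in
    px_live_*) ;;  # matches the example prefix (assumed, not a documented rule)
    *) echo "warning: key does not start with px_live_" >&2 ;;
  esac
  return 0
}
```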

3. Verify

npx --yes pixcli --version
npx --yes pixcli image "test: a simple blue circle on white background" -o test.png

Agent execution: long-running jobs

Video generation can take 1–10+ minutes (Seedance, Kling, Veo). This matters for agents because:

  • The CLI blocks synchronously while polling for completion
  • Agent tool-call timeouts (typically 2–5 minutes) can kill the process before the video is ready
  • Wasted budget: without --json, spinner updates every 2 seconds add tokens; --json suppresses them, but the blocking wait still burns wall-clock time

The recommended pattern: submit → check (non-blocking)

All generation commands support --no-wait which returns immediately after submission with the job_id. The agent can then check status as often as needed with the non-blocking pixcli job command.

# 1. ALWAYS set these at the start of your session
export PIXCLI_JSON=1       # suppress spinners, return only JSON
export PIXCLI_API_KEY="px_live_..."

# 2. Submit a video job (returns in ~3-10s instead of 5-10 min)
npx --yes pixcli video "A cinematic product orbit, soft lighting" \
  --from product.png --no-wait
# Output: {"job_id":"abc123", "status":"submitted", "check_command":"pixcli job abc123 --json", ...}

# 3. Do other work, then check status (instant, non-blocking)
npx --yes pixcli job abc123
# Output: {"status":"processing", "current_step":1, "total_steps":2}

# 4. When ready, wait + download
npx --yes pixcli job abc123 --wait -o output.mp4
# Output: {"status":"completed", "files":[...], "cost":150000}
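
The check step can be wrapped in a bounded loop that gives up well before a typical tool-call timeout, so the agent regains control and can come back later. This is a minimal sketch: wait_for_job and the PIXCLI_CMD override are hypothetical names, the "failed" status value is an assumption (only submitted, processing, completed and timeout appear in this document), and the sed expression is a stand-in for a proper JSON parser.

```shell
# Poll a job a limited number of times. Returns 0 when completed,
# 1 when failed (assumed status name), 2 when still running.
wait_for_job() {
  job_id=$1
  max_checks=${2:-6}   # 6 checks x 15 s stays under a 2-minute tool timeout
  delay=${3:-15}       # seconds between checks
  checks=0
  while [ "$checks" -lt "$max_checks" ]; do
    # PIXCLI_CMD is a hypothetical override so the loop can be dry-run with a stub
    status=$(${PIXCLI_CMD:-npx --yes pixcli} job "$job_id" --json |
      sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1)
    case "$status" in
      completed) echo "completed"; return 0 ;;
      failed)    echo "failed";    return 1 ;;
    esac
    checks=$((checks + 1))
    sleep "$delay"
  done
  echo "still running"
  return 2
}
```

When the function prints "still running", the job is untouched on the server; the agent can do other work and call it again, or download with pixcli job <id> --wait once the status is completed.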

When to use --no-wait vs default (blocking)

| Scenario | Use | Why |
| --- | --- | --- |
| Image generation (~10-30s) | Default (no --no-wait) | Fast enough that blocking is fine |
| Video generation (1-10min) | --no-wait + poll later | Avoid tool-call timeout, do parallel work |
| Music/voice/sfx (~10-60s) | Default usually fine | Short. Use --no-wait if batching many |
| Parallel pipeline (image → video → extend) | --no-wait for each video step | Submit all, poll all, download all |
| Quick iteration/draft | Default with -q draft | Draft quality is 2-5x faster |

Token consumption

| Mode | Tokens returned | Blocking time |
| --- | --- | --- |
| No --json | 500-1000+ (spinner updates, human text) | Full wait |
| --json (blocking) | 50-100 (clean JSON only) | Full wait |
| --json --no-wait | 50-80 (submit response only) | 3-10s (submission only) |
| pixcli job <id> --json | 30-60 (status check) | Instant |

Always set PIXCLI_JSON=1 at the start of your agent session. This single environment variable suppresses spinners, human-readable text, and progress updates for ALL pixcli commands — reducing token cost by ~90%.
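
The ~90% figure can be checked against the table by taking the midpoint of each range. The midpoints are rough assumptions for illustration, not measurements:

```shell
# Midpoints of the ranges in the token table (assumed, not measured)
plain_tokens=750   # midpoint of 500-1000 without --json
json_tokens=75     # midpoint of 50-100 with --json
saving=$(( (plain_tokens - json_tokens) * 100 / plain_tokens ))
echo "approximate saving: ${saving}%"
```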

Timeout recovery

If a blocking call times out (either the agent's tool timeout or the CLI's internal 10-minute limit), the job is still running on the server. The JSON error output includes recovery commands:

{
  "job_id": "abc123",
  "status": "timeout",
  "error": "CLI poll timeout — job still running on server",
  "check_command": "pixcli job abc123 --json",
  "wait_command": "pixcli job abc123 --wait --json"
}

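To recover, take job_id from this output and re-run the status check. The check_command field shows the bare command; agents should run it with the npx --yes prefix. The server never loses a job: it runs to completion regardless of whether the CLI is connected.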
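
One way to handle the timeout payload is to pull out job_id and rebuild the command with the agent-safe prefix rather than executing a string taken from the output. The helper names below are hypothetical, and the sed pattern is only a minimal stand-in for a JSON parser such as jq:

```shell
# Hypothetical helper: read JSON on stdin, print the first job_id found.
job_id_from_json() {
  sed -n 's/.*"job_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: read the timeout JSON on stdin and print a recovery
# command that uses the npx --yes prefix. Fails when no job_id is present.
recovery_command() {
  job_id=$(job_id_from_json)
  [ -n "$job_id" ] || return 1
  echo "npx --yes pixcli job $job_id --wait --json"
}
```

Piping the timeout JSON shown above into recovery_command prints npx --yes pixcli job abc123 --wait --json.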

Parallel video pipeline example

Generate 3 video clips simultaneously without blocking between them:

# Submit all three (each returns in ~5s) and capture each job_id with jq
JOB_1=$(npx --yes pixcli video "Product hero orbit" --from hero.png --no-wait -o hero.mp4 --json | jq -r '.job_id')
JOB_2=$(npx --yes pixcli video "Lifestyle scene, natural light" --from lifestyle.png --no-wait -o lifestyle.mp4 --json | jq -r '.job_id')
JOB_3=$(npx --yes pixcli video "App demo, smooth scroll" --from demo.png --no-wait -o demo.mp4 --json | jq -r '.job_id')

# Poll all three (non-blocking status checks)
npx --yes pixcli job "$JOB_1" --json
npx --yes pixcli job "$JOB_2" --json
npx --yes pixcli job "$JOB_3" --json

# When all are "completed", download:
npx --yes pixcli job "$JOB_1" --wait -o hero.mp4
npx --yes pixcli job "$JOB_2" --wait -o lifestyle.mp4
npx --yes pixcli job "$JOB_3" --wait -o demo.mp4

This produces 3 videos in the time it takes to render 1 — ~5-8 minutes total instead of ~15-24 minutes sequential.

Commands

pixcli image <prompt> — Generate images

pixcli image "Studio product shot of wireless earbuds, soft lighting, white background" --json
| Option | Default | Description |
| --- | --- | --- |
| -r, --ratio <ratio> | 1:1 | Aspect ratio: 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3 |
| -q, --quality <level> | standard | Quality: draft, standard, high |
| -t, --transparent | false | Transparent background (PNG) |
| -n, --count <number> | 1 | Number of images (1-4) |
| --from <path-or-url> | | Source image for image-to-image or reference generation (repeatable, up to 5: --from a.png --from b.png) |
| --search | false | Enable Google Search grounding for real-world accuracy (logos, brands, current events). Only with Nano Banana models |
| -m, --model <model> | auto | Specific model ID |
| -o, --output <path> | auto | Output file or directory |
| --json | false | Machine-readable JSON output |
| --no-wait | false | Submit and return immediately (use pixcli job <id> to check later) |
| --no-enrich | | Skip prompt enrichment |

Models: nano-banana-2 (default — Google Gemini 3 image, descriptive-prose prompting, web-search grounding, thinking), nano-banana-pro (heavier reasoning), gpt-image-2 (OpenAI direct — best in-image text rendering, typography, infographics, diagrams; auto-picked for text-heavy prompts), flux-pro, flux-dev, seedream-v5, imagen-4, imagen-4-fast, flux-vto (virtual try-on — see pixcli tryon), flux-fill (inpaint/reference), kontext (identity-preserving edits). Variants: nano-banana-pro-fal / nano-banana-2-fal (same models via fal), nano-banana-pro-or / nano-banana-2-or (via OpenRouter), gpt-image-2-fal (fal fallback). Use pixcli models --type image for the live list.

Prompting is automatic and model-specific: the API classifies the task, analyzes any attached images (medium, subject, identity, garment, palette), and rewrites your prompt for the chosen model — descriptive prose + semantic negatives for Nano Banana, structured instruction-following with verbatim quoted text for GPT Image 2. Just describe what you want.

pixcli edit <prompt> — Edit images

pixcli edit "Remove the background" -i product.jpg -o product-nobg.png --json
| Option | Default | Description |
| --- | --- | --- |
| -i, --image <path-or-url> | required | Source image (repeatable: -i a.png -i b.png) |
| -q, --quality <level> | standard | Quality: draft, standard, high |
| -m, --model <model> | auto | Specific model ID |
| -o, --output <path> | auto | Output file or directory |
| --json | false | Machine-readable JSON output |
| --no-wait | false | Submit and return immediately |
| --no-enrich | | Skip prompt enrichment |

Models: nano-banana-2-edit-fal (conversational edits), gpt-image-2-edit (precise instruction edits with verbatim text — uses an explicit "preserve" list to prevent drift), kontext (identity-preserving edits), flux-fill (inpaint/fill), seedream-v5-edit, phota-enhance, rembg (bg-removal), recraft-upscale, aura-sr (upscale)

pixcli video <prompt> — Generate video

# Image-to-video (recommended: generate still first, then animate)
pixcli video "Slow camera orbit around the product" --from product.png -o reveal.mp4 --json

# Text-to-video (single-step T2V by default; add --storyboard to generate a still first, then animate it)
pixcli video "A cat walking through a garden at sunset" -o cat.mp4 --json

# Extend an existing video
pixcli video "The cat jumps over a fence" --from cat.mp4 --extend -o cat-extended.mp4 --json
| Option | Default | Description |
| --- | --- | --- |
| --from <path-or-url> | | Source image (I2V) or video (extend). Repeatable for multi-reference models (Seedance reference / Omni): --from a.png --from b.png. Single-reference models receive the first one and ignore the rest. |
| --to <path-or-url> | | End image — video transitions from --from to --to (Kling/PixVerse transition models) |
| --negative <prompt> | | Negative prompt describing what to avoid |
| --audio | false | Enable native audio generation (BGM, SFX, dialogue) on supported models |
| -d, --duration <seconds> | 5 | Duration: 1-15 seconds (Veo: 4/6/8) |
| --resolution <res> | 720p | Output resolution: 480p, 720p, 1080p, 4k (Veo 3.1 is resolution-priced) |
| -r, --ratio <ratio> | 16:9 | Aspect ratio: 16:9, 9:16, 1:1, 4:3, 3:4 |
| -q, --quality <level> | standard | Quality: draft, standard, high |
| -m, --model <model> | auto | Specific model ID |
| -o, --output <path> | auto | Output file (.mp4) |
| --json | false | Machine-readable JSON output |
| --no-wait | false | Submit and return immediately (recommended for video — avoids 10min blocking) |
| --extend | false | Extend the source video instead of I2V |
| --storyboard | false | Generate a still frame with Nano Banana first, then animate it. Opt-in for tight visual control over the opening frame (brand shots, hero compositions). Adds ~5s + one extra credit. Default text-to-video is single-step T2V. |

Models — Veo 3.1 (native, google-veo backend) — RECOMMENDED DEFAULT: veo-3.1-lite (default — cheapest Veo, native synchronized audio, handles BOTH t2v & i2v), veo-3.1-fast (balanced), veo-3.1 (highest quality, up to 4k, character consistency). One id covers t2v and i2v — the mode is picked from whether a --from image is present. Resolution-priced via --resolution (720p/1080p/4k). Native audio is fully prompt-driven (quoted dialogue, SFX:, Ambient noise:).

Models — fal backend: seedance-2-t2v / seedance-2-i2v (ByteDance Seedance 2 — cinematic motion, native audio via --audio; routes automatically when the prompt mentions "seedance"/"bytedance"/"doubao"), grok-imagine-i2v (xAI Grok Imagine — fast stylized i2v), kling-o3-pro-i2v / -t2v / -transition (cinematic), kling-o3-standard-i2v / -t2v, kling-v3-pro-i2v, pixverse-v6-i2v / -t2v / -transition / -extend (stylized presets, audio, multi-clip), wan-v2-i2v (cheap motion), minimax-i2v (fast, avoid for faces), ltx-t2v (budget T2V), ltx-extend-video / grok-extend-video / pixverse-v6-extend (extension), veo31-lite-i2v / -t2v / -transition (legacy fal Veo, fallback when native Veo is unavailable), veo3-i2v / -t2v (premium lipsync), heygen-v3-agent (avatar-led explainers). See references/seedance-playbook.md for the full video prompt playbook.

Opinionated approach: Always generate a still first with pixcli image, review it, then animate with pixcli video --from. This gives you control over the starting frame.

Logo animations (brand reveals / intros / bumpers): pass --from logo.png and mention both "logo"/"brand" AND an animation intent ("reveal", "intro", "bumper", "animate") in the prompt — the API auto-detects this and swaps in a specialist Motion Logo Director that emits a 6-stage timeline with sound design, music, and optional voiceover. See references/seedance-logo-motion.md for the full playbook.

Video prompting — the core formula

Every video prompt should follow this structure:

Subject → Action → Environment → Camera → Style → Constraints

Target 60–100 words. Shorter = vague. Longer = conflicting instructions that degrade coherence.

| # | Element | Rule | Good example |
| --- | --- | --- | --- |
| 1 | Subject | Describe visual features explicitly | A woman in her 30s, short black hair, red wool coat |
| 2 | Action | Concrete verbs + quantify intensity | walks briskly — not walks |
| 3 | Environment | Lighting + atmosphere + time of day | rain-slicked Tokyo street at night, neon reflections on wet pavement |
| 4 | Camera | One instruction only — never chain moves | slow push-in — never push then pan then orbit |
| 5 | Style | Specific aesthetics only | cinematic, shallow depth of field, film grain |
| 6 | Constraints | Say what you want, not what you don't | smooth motion, stable framing |
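
The formula lends itself to a small assembly helper. The sketch below is illustrative only: build_video_prompt is a hypothetical name, it simply joins the six elements in order with spaces, and it warns on stderr when the result falls outside the 60-100 word target.

```shell
# Hypothetical helper. Arguments, in order:
#   subject action environment camera style constraints
build_video_prompt() {
  prompt="$1 $2 $3 $4 $5 $6"
  words=$(printf '%s\n' "$prompt" | wc -w | tr -d ' ')
  if [ "$words" -lt 60 ] || [ "$words" -gt 100 ]; then
    echo "warning: prompt is $words words, target is 60-100" >&2
  fi
  printf '%s\n' "$prompt"
}
```

The printed prompt can be passed straight to the video command, for example npx --yes pixcli video "$(build_video_prompt ...)" --from still.png --no-wait. Passing each element as a full clause or sentence keeps the joined text readable.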

The 10 rules that always apply:

  1. One camera move per shot. Always. Combining causes jitter.
  2. Separate subject motion from camera motion. ✅ "The dancer spins. Camera holds fixed." ❌ "Spinning camera around a dancing person."
  3. For I2V, only describe what changes; the image carries composition and identity. Add "Preserve composition and colors."
  4. Use physical verbs: prefer melt, fracture, snap open over becomes / transforms.
  5. Lighting is your biggest quality lever — always name the light (golden hour, rim light, natural window light, neon, soft diffused, dramatic stage lighting).
  6. Write on a timeline for 10s+ clips — break into 3–5 time-coded beats: [0s–3s]:, [3s–7s]:, etc.
  7. Every asset gets a job — if a file has no role, it's noise. Be explicit about what each --from / --to does.
  8. Put negatives in --negative, not in the main prompt.
  9. For video extend, -d is the NEW duration, not the total.
  10. Draft before hero — always iterate with -q draft (lower resolution like --resolution 480p, or a budget model like ltx-t2v) before burning credits on a full render.
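
For rule 6, the time-coded markers can be generated rather than typed by hand. This is a sketch under simple assumptions: timeline_beats is a hypothetical helper, beats are equal-length in whole seconds, and the last beat absorbs any remainder.

```shell
# Hypothetical helper: timeline_beats DURATION COUNT
# Prints COUNT markers covering 0..DURATION seconds, one per line.
timeline_beats() {
  duration=$1
  count=$2
  step=$((duration / count))
  i=0
  while [ "$i" -lt "$count" ]; do
    start=$((i * step))
    if [ "$i" -eq $((count - 1)) ]; then
      end=$duration            # last beat absorbs the remainder
    else
      end=$((start + step))
    fi
    printf '[%ss–%ss]:\n' "$start" "$end"
    i=$((i + 1))
  done
}
```

timeline_beats 10 3 prints three markers, [0s–3s]:, [3s–6s]: and [6s–10s]:, each ready to be followed by the description of that beat.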

Read references/seedance-playbook.md for the complete playbook — camera movement catalog, lighting table, timeline prompting templates, multimodal role assignment, and 10+ ready-to-paste command recipes.

pixcli voice <text> — Text-to-speech

pixcli voice "Welcome to the future of productivity." -o voiceover.mp3 --json
pixcli voice "Bienvenidos al futuro." --voice Sarah --language es -o vo-spanish.mp3 --json
# Clone a voice from samples, then speak with it (ElevenLabs instant cloning)
pixcli voice "This is my cloned narrator voice." --clone "Narrator" --sample me1.mp3 --sample me2.mp3 -o vo.mp3 --json
| Option | Default | Description |
| --- | --- | --- |
| --voice <name> | Rachel | Voice preset (Rachel, Aria, Roger, Sarah, Laura, Charlie, George, Callum, River, Liam, Charlotte, Alice, Matilda, Will, Jessica, Eric, Chris, Brian, Daniel, Lily, Bill) or a cloned voice_id |
| --engine <engine> | elevenlabs | TTS engine: elevenlabs (default) or gemini (steerable, expressive) |
| --clone <name> | | Clone an ElevenLabs voice from --sample audio, then speak with it |
| --sample <path-or-url> | | Audio sample for --clone (repeatable, 1-5; local file or URL) |
| --language <code> | auto | ISO 639-1 language code (en, es, fr, de, ja, etc.) |
| -o, --output <path> | auto | Output file (.mp3) |
| --json | false | Machine-readable JSON output |

pixcli dialogue — Multi-speaker dialogue (Gemini TTS, max 2 speakers)

pixcli dialogue \
  --speaker "Host:Charon" --speaker "Guest:Kore" \
  --line "Host:Welcome to the show!" \
  --line "Guest:Thanks for having me." \
  -o episode.mp3 --json

# Or from a JSON script file: {"speakers":[{"name","voice"}],"turns":[{"speaker","text","note?"}]}
pixcli dialogue --script episode.json -o episode.mp3 --json
| Option | Description |
| --- | --- |
| --speaker <Name:Voice> | A speaker mapping (repeatable, max 2) |
| --line <Speaker:text> | A dialogue line (repeatable, in order) |
| --script <file.json> | Load speakers + turns from a JSON file instead of flags |
| --language <code> | Language code |
| -o, --output <path> | Output file (.mp3) |
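
For the --script form, a concrete file makes the schema easier to read. The sketch below writes one using the field names from the schema comment above and the same speakers and lines as the flag example; write_dialogue_script is a hypothetical helper, and the optional note field is left out.

```shell
# Hypothetical helper: write a two-speaker dialogue script to the given path.
write_dialogue_script() {
  cat > "$1" <<'EOF'
{
  "speakers": [
    { "name": "Host",  "voice": "Charon" },
    { "name": "Guest", "voice": "Kore" }
  ],
  "turns": [
    { "speaker": "Host",  "text": "Welcome to the show!" },
    { "speaker": "Guest", "text": "Thanks for having me." }
  ]
}
EOF
}
```

After write_dialogue_script episode.json, the file can be passed to pixcli dialogue --script episode.json as shown above.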

pixcli tryon — Virtual try-on (FLUX Pro VTO)

Render a person wearing a garment. Both images are uploaded automatically.

pixcli tryon --person model.jpg --garment jacket.png -o tryon.png --json
| Option | Description |
| --- | --- |
| --person <path-or-url> | required — image of the person |
| --garment <path-or-url> | required — image of the garment / clothing item |
| -p, --prompt <text> | Optional guidance prompt |
| -o, --output <path> | Output file or directory |

The opinionated path also auto-detects try-on: pixcli edit "put this jacket on the man" -i person.jpg -i jacket.png classifies as virtual_try_on and routes to flux-vto on its own.

pixcli music <prompt> — Generate music

pixcli music "Subtle ambient electronic, minimal beats, corporate technology feel" -d 45 -o bg-music.mp3 --json
| Option | Default | Description |
| --- | --- | --- |
| -d, --duration <seconds> | 30 | Duration: 3-120 seconds |
| -o, --output <path> | auto | Output file (.mp3) |
| --json | false | Machine-readable JSON output |

Model: lyria-3-pro (default — Google Lyria 3 Pro, prompt-driven genre/mood/instrumentation, up to ~3 min). elevenlabs-music available as an alternative.

pixcli sfx <prompt> — Generate sound effects

pixcli sfx "Smooth cinematic whoosh transition" -d 1.5 -o whoosh.mp3 --json
pixcli sfx "Soft digital click, subtle UI interaction" -d 0.5 -o click.mp3 --json
| Option | Default | Description |
| --- | --- | --- |
| -d, --duration <seconds> | 5 | Duration: 0.5-22 seconds |
| -o, --output <path> | auto | Output file (.mp3) |
| --json | false | Machine-readable JSON output |

Model: elevenlabs-sfx (ElevenLabs Sound Effects v2). Omit -d to let the model pick the optimal duration from the prompt.

pixcli models — List available models

# Every model, grouped by type
pixcli models --json

# Only Seedance
pixcli models --search seedance --json

# Only native Veo (google-veo backend)
pixcli models --type video --backend google-veo --json
| Option | Description |
| --- | --- |
| -t, --type <kind> | image, video, audio |
| -b, --backend <name> | fal, google-veo, google-direct, openai-direct, openrouter |
| -p, --provider <name> | Upstream provider: fal, google, bytedance, elevenlabs, openai, xai |
| -c, --capability <cap> | text-to-video, image-to-video, edit, upscale, bg-removal, lipsync, music, sound-effects, text-to-speech, video-extend, enhance |
| -s, --search <term> | Substring match on id, name, or strengths |
| --json | Machine-readable JSON output |

Backs onto GET /api/v1/models?type=...&backend=...&search=.... Use this to discover model ids before passing them to -m on any generation command.
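
To show how the flags map onto that endpoint, the sketch below builds the request URL only. models_url is a hypothetical helper, the base URL is the documented default, values are assumed to be URL-safe, and nothing here states how the request is authenticated.

```shell
# Hypothetical helper: models_url [TYPE] [BACKEND] [SEARCH]
# Empty arguments are skipped. Prints the URL without sending a request.
models_url() {
  base="https://pixcli.shellbot.sh"
  query=""
  if [ -n "${1:-}" ]; then query="${query}&type=$1"; fi
  if [ -n "${2:-}" ]; then query="${query}&backend=$2"; fi
  if [ -n "${3:-}" ]; then query="${query}&search=$3"; fi
  if [ -n "$query" ]; then
    echo "${base}/api/v1/models?${query#?}"   # drop the leading "&"
  else
    echo "${base}/api/v1/models"
  fi
}
```

models_url video google-veo corresponds to the pixcli models --type video --backend google-veo example above.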

pixcli job <id> — Check job status and download results

# Check status of a job
pixcli job abc123 --json

# Wait for completion and download
pixcli job abc123 --wait -o output.mp4 --json
| Option | Default | Description |
| --- | --- | --- |
| --wait | false | Wait for the job to complete before returning |
| -o, --output <path> | auto | Output file path for downloaded result |
| --json | false | Machine-readable JSON output |

Use case: Recover timed-out jobs. Video generation can take 1–10+ minutes; if the CLI times out, it prints the job ID and a recovery command. Run pixcli job <id> --wait to pick up where you left off.

Publishing results — shareable URLs (R2)

Any generation command (image, edit, video, voice, dialogue, music, sfx, tryon) can publish its output to a stable, shareable URL so you can drop it straight into another flow (a webpage, a Slack message, another agent). The asset is aliased, not copied.

# Public, anyone-with-the-link URL
pixcli image "hero banner for a launch" --publish --publish-name "launch-hero" --json
# → published_url: https://pixcli.shellbot.sh/p/launch-hero-<uuid>.png

# Private — fetching requires your API key (Authorization: Bearer <key>)
pixcli voice "internal memo" --publish-private --publish-ttl 14 -o memo.mp3 --json
| Flag | Description |
| --- | --- |
| --publish | Publish at a public shareable URL (no auth to fetch) |
| --publish-private | Publish at a URL that requires your API key to fetch |
| --publish-ttl <days> | Days until the URL expires (default 60, max 365) |
| --publish-name <name> | Friendly slug used in the URL |
  • URLs live at https://pixcli.shellbot.sh/p/<slug>-<uuid> and all expire (there is no "never"). Expired URLs return HTTP 410.
  • In --json mode the result includes a published[] array with url, scope, and expires_at per asset.
  • Publishing is best-effort and never fails the generation job.
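
Because publishing is best-effort, an out-of-range TTL may be easier to catch before the call than after. A minimal pre-flight sketch follows; valid_publish_ttl is a hypothetical helper, the upper bound of 365 comes from the table above, and the lower bound of 1 day is an assumption.

```shell
# Hypothetical helper: succeeds when the argument is a whole number of days
# between 1 (assumed minimum) and 365 (documented maximum).
valid_publish_ttl() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 365 ]
}
```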

Global options

| Option | Description |
| --- | --- |
| --key <api_key> | Override PIXCLI_API_KEY env var |
| --api-url <url> | Override API URL (default: https://pixcli.shellbot.sh) |
| --json | Machine-readable JSON output (or set PIXCLI_JSON=1 once for all commands) |
| --no-wait | Submit the job and return immediately with the job_id — don't poll for completion. Available on: image, edit, video, voice, dialogue, music, sfx, tryon |
| --version | Show CLI version |
| --help | Show help |

Read references/command-reference.md for the full parameter reference.


Opinionated creative workflow

The full production pipeline

  1. Generate scene stills with pixcli image — use -n 4 for variations, pick the best. Use --search for real-world accuracy (correct logos, current brands). Use --from with multiple images to blend references
  2. Edit heroes with pixcli edit — upscale, remove backgrounds, enhance
  3. Animate 2-3 hero stills with pixcli video --from — cinematic motion for key moments
  4. Generate voiceover with pixcli voice — one file per scene
  5. Generate background music with pixcli music — one track for the full composition
  6. Generate sound effects with pixcli sfx — transition whooshes, UI sounds (use sparingly)
  7. Assemble everything in Remotion — timing, text, transitions, branding, audio mix
  8. Render final video with npx remotion render

When to use AI video vs Remotion

Use pixcli video for:

  • Hero moments: product reveals, cinematic openings, emotional beats (3-8s clips)
  • Organic motion that's hard to code: water, fire, fabric, hair, camera orbits
  • Image-to-video: animate a still into a living scene
  • Transition inserts: short clips between Remotion scenes

Use Remotion for:

  • Text animations, captions, kinetic typography
  • Precise timing synced to voiceover
  • Brand overlays, logos, consistent color grading
  • Data visualizations, metric counters, charts
  • Scene transitions (cuts, wipes, dissolves — deterministic)
  • Final assembly: compositing AI video clips + stills + audio + text

The ideal combined workflow:

  1. Generate scene stills with pixcli image (consistency via shared style prompts)
  2. Animate 2-3 hero stills with pixcli video --from (cinematic motion)
  3. Generate voiceover + music + SFX
  4. Assemble everything in Remotion (timing, text, transitions, audio mix)

Audio layering strategy

  • Voiceover at volume 1.0 — clear, intelligible, primary channel
  • Music at 0.15-0.25 — duck under voiceover, never compete
  • SFX sparse and purposeful — only when they reinforce movement
  • Avoid dense music during problem framing

Quality tiers

  • draft — Fast iteration, concepting, throwaway tests
  • standard — Good for most production work (default)
  • high — Hero shots, final delivery assets

ffmpeg local editing

Use ffmpeg for quick video/audio edits without a full Remotion project. These run locally — no API calls needed.

Video operations

# Trim video (extract 5 seconds starting at 2s)
ffmpeg -i input.mp4 -ss 00:00:02 -to 00:00:07 -c copy trimmed.mp4

# Split video at a timestamp
ffmpeg -i input.mp4 -t 5 -c copy first-half.mp4
ffmpeg -i input.mp4 -ss 5 -c copy second-half.mp4

# Merge videos (create filelist.txt first)
echo "file 'clip1.mp4'" > filelist.txt
echo "file 'clip2.mp4'" >> filelist.txt
ffmpeg -f concat -safe 0 -i filelist.txt -c copy merged.mp4

# Scale video to 1080p
ffmpeg -i input.mp4 -vf "scale=1920:1080" -c:a copy scaled.mp4
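
When merging more than a couple of clips, the file list for the concat step can be generated instead of echoed line by line. make_concat_list is a hypothetical helper that prints one entry per argument in the same format as the echo lines above:

```shell
# Hypothetical helper: print a concat-demuxer file list for the given clips.
# Paths containing single quotes would need extra escaping.
make_concat_list() {
  for clip in "$@"; do
    printf "file '%s'\n" "$clip"
  done
}
```

Redirect the output to a file, for example make_concat_list clip1.mp4 clip2.mp4 > filelist.txt, then run the ffmpeg -f concat command shown above.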

Audio operations

# Add audio track to video
ffmpeg -i video.mp4 -i music.mp3 -c:v copy -c:a aac -shortest output.mp4

# Replace audio track
ffmpeg -i video.mp4 -i new-audio.mp3 -c:v copy -c:a aac -map 0:v -map 1:a output.mp4

# Mix voiceover + music (music ducked to 20%)
ffmpeg -i voiceover.mp3 -i music.mp3 -filter_complex "[1:a]volume=0.2[music];[0:a][music]amix=inputs=2:duration=longest" mixed.mp3

# Extract audio from video
ffmpeg -i video.mp4 -vn -acodec copy audio.aac

# Get media info
ffprobe -v quiet -print_format json -show_streams input.mp4

Remotion video production

Remotion is the source of truth for timing, layout, animation, and render. Use pixcli to generate the visual and audio assets, then assemble everything in Remotion.

Bootstrapping a Remotion project

cp -r assets/templates/cinematic-product-16x9 ./my-video
cd ./my-video && npm install
npx remotion studio     # Preview
npx remotion render src/index.ts MainComposition out/video.mp4  # Render

Templates

| Template | Best for | Resolution |
| --- | --- | --- |
| aida-classic-16x9 | Product marketing (AIDA framework) | 1920x1080 |
| cinematic-product-16x9 | Premium product launches | 1920x1080 |
| saas-metrics-16x9 | B2B SaaS, dashboard metrics | 1920x1080 |
| mobile-ugc-9x16 | Reels, TikTok, Stories | 1080x1920 |
| blank-16x9 | Custom projects | 1920x1080 |
| explainer-16x9 | How-it-works, tutorials | 1920x1080 |

Integrating AI video clips in Remotion

Use OffthreadVideo for AI-generated clips inside Remotion compositions:

import React from "react";
import { AbsoluteFill, Audio, Img, OffthreadVideo, Sequence, staticFile } from "remotion";

// HeroScene is an illustrative component name; register it in your composition root.
export const HeroScene: React.FC = () => (
  <AbsoluteFill>
    {/* AI video clip as a hero moment */}
    <Sequence from={0} durationInFrames={150}>
      <OffthreadVideo src={staticFile("assets/hero-reveal.mp4")} />
    </Sequence>

    {/* AI-generated still as background */}
    <Sequence from={150} durationInFrames={120}>
      <Img src={staticFile("assets/scene-solution.png")} style={{ width: "100%", height: "100%" }} />
    </Sequence>

    {/* Voiceover + music, with music ducked under the voice */}
    <Audio src={staticFile("audio/voiceover.mp3")} volume={1} />
    <Audio src={staticFile("audio/music.mp3")} volume={0.2} />
  </AbsoluteFill>
);

Remotion principles

  • Keep all Remotion packages on the same pinned version
  • Transitions: 8-18 frames, purposeful (not decorative)
  • Load fonts explicitly with @remotion/google-fonts
  • Always run npm run verify before npm run render
  • Load reference rules from references/remotion-rules/ as needed
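
Frame counts are easier to reason about once converted to time. The helpers below are a sketch with hypothetical names, and the 30 fps default is an assumption; use whatever fps the composition declares.

```shell
# Hypothetical helpers; fps defaults to 30 (an assumption).
seconds_to_frames() {
  seconds=$1
  fps=${2:-30}
  echo $((seconds * fps))
}

frames_to_ms() {
  frames=$1
  fps=${2:-30}
  echo $((frames * 1000 / fps))   # integer milliseconds, rounded down
}
```

At the assumed 30 fps, the 8-18 frame transition window works out to roughly 266-600 ms, and a 5 second clip is 150 frames.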

Read references/remotion-playbook.md for detailed Remotion implementation guidance.


Output convention

  • pixcli downloads generated files to the current directory (or path specified with -o)
  • Use --json for machine-readable output (pipe to jq or parse in scripts)
  • By default the CLI blocks until the job finishes (it handles async polling internally); pass --no-wait to submit and return immediately
  • Video jobs may take 1–10+ minutes (the CLI shows progress unless JSON mode is on). If a job times out, use pixcli job <id> --wait to recover

References

Creative guidance

  • references/command-reference.md — Full parameter docs for all pixcli commands
  • references/creative-guidelines.md — Quality standards for productions
  • references/prompt-cookbook.md — Proven prompt patterns for every task
  • references/seedance-playbook.md — Video prompting masterclass (Seedance 2 + all video models): 6-element formula, camera catalog, lighting table, timeline prompting, multimodal role assignment, 10+ ready-to-paste recipes
  • references/seedance-logo-motion.md — Logo animation specialist playbook. Use when the user provides a logo image + asks for a brand reveal / intro / bumper. Auto-activated by the API when --from logo.png is combined with logo+motion keywords.
  • references/workflow-recipes.md — End-to-end recipe examples
  • references/ancillary-assets.md — Asset generation strategy for Remotion scenes

Remotion

  • references/remotion-playbook.md — Remotion implementation guide
  • references/template-showcase.md — Template selector guide
  • references/remotion-rules-index.md — Index of 30+ Remotion rule files
  • references/remotion-rules/ — Detailed rules (animations, audio, text, transitions, etc.)

Templates

  • assets/templates/aida-classic-16x9/
  • assets/templates/cinematic-product-16x9/
  • assets/templates/saas-metrics-16x9/
  • assets/templates/mobile-ugc-9x16/
  • assets/templates/blank-16x9/
  • assets/templates/explainer-16x9/