Install
openclaw skills install anima
Anima Avatar - Interactive Video Generation Engine. Generates 16:9 videos with dynamic character sprites (Shutiao), synced audio (Fish Audio), and text overlay.
Generates high-quality interactive videos where Shutiao speaks the text with appropriate expressions, gestures, and voice.
- src/director.js: The core engine. Generates frames (sharp + SVG), audio (Fish Audio), and video (FFmpeg).
- src/send_video_pro.js: Delivery script. Handles transcoding, duration calculation, and Feishu upload.
- src/batch_generator.js: Batch sprite generator. Uses Gemini image generation to produce sprite variants.
- assets/sprites/: The sprite library (1920x1080 PNG files).
- assets/production_plan.csv: The asset registry (25 sprites).
- assets/manifest.json: Sprite metadata for reference.
- output/: Generated videos.
ClawHub only distributes text files. The sprite PNG images are not included in the published package.
After installing, follow the steps below in order to prepare your sprites before first use.
All image generation steps use Gemini API (Nano Banana) as the AI image generator. It works by "reference image + text prompt" — you give it an existing image and a text description of what to change, and it returns a new image with the changes applied. This is how both the base sprite (character + background fusion) and all expression variants are created.
Step 1: Prepare a character image
You need a standalone character illustration (transparent background PNG recommended).
Save it somewhere accessible (e.g. avatars/my_character.png).
Step 2: Prepare a background
You need a background scene for the character to stand in.
Save it in assets/backgrounds/ (e.g. assets/backgrounds/cherry_blossom_bg.png).
Step 3: Compose the base sprite
This step uses Gemini (Nano Banana) image generation to merge your character onto the background. The AI sees both images and creates a natural-looking composite. This is NOT a simple overlay/paste; it is an AI-generated fusion that handles lighting, shadows, and blending.
How to do it:
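Method A: Use Gemini directly (recommended)
Use any Gemini-compatible image generation tool (like Nano Banana, Google AI Studio, or the Gemini API) with your character image and your background image as inputs, plus a text prompt describing how the character should be placed in the scene.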
Save the output as: assets/sprites/shutiao_base.png
Method B: Use the built-in compose script (simple overlay)
If you just want a quick mechanical overlay (no AI blending), src/compose_base.js can paste your character onto the background using sharp:
1. Edit src/compose_base.js: update BG_PATH and AVATAR_PATH to point to your files.
2. Run: node src/compose_base.js
3. Output: assets/sprites/shutiao_base.png
Note: Method B is a plain image composite. Method A (Gemini) produces much better results because it handles lighting and integration naturally.
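For Method B, the only real decision in a mechanical overlay is where the character lands on the background. The sketch below computes a bottom-center placement. The function name and the bottom-center choice are illustrative assumptions, not taken from src/compose_base.js.

```javascript
// Hypothetical helper: top-left offset that centers the character
// horizontally and rests it on the bottom edge of the background.
function bottomCenterOffset(bg, fg) {
  return {
    left: Math.round((bg.width - fg.width) / 2),
    top: bg.height - fg.height,
  };
}

// Example: an 800x1000 character on a 1920x1080 background.
console.log(bottomCenterOffset({ width: 1920, height: 1080 }, { width: 800, height: 1000 }));
// → { left: 560, top: 80 }
```

sharp's composite() accepts left and top values per overlay, so an offset like this could be passed along with the character image.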
Step 4: Plan your sprite variants
Now that you have a base sprite, plan what expression/pose variants you want.
Open assets/production_plan.csv and customize it:
ID,Emotion,Variant,Description,Filename,Prompt,Status
001,Base,v1,Standard,shutiao_base.png,gentle smile looking at viewer,Done
003,Happy,v1,Smile,shutiao_happy.png,big happy smile eyes closed,Pending
007,Angry,v1,Pout,shutiao_angry.png,angry face pouting,Pending
...
Column meanings:
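- Filename: use the shutiao_<emotion>_<variant>.png format.
- Prompt: the expression description sent to Gemini along with the base sprite.
- Status: Pending = will be generated. Done = already exists, skip.
The default CSV has 25 entries. You can add, remove, or modify rows freely.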
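To see which rows are still waiting for generation, the CSV can be read with a few lines of Node. This is a minimal sketch, assuming no field contains a comma or quotes. parsePlan and pendingRows are illustrative names, not functions from the skill.

```javascript
// Minimal reader for production_plan.csv.
// Assumption: fields never contain commas or quoted values.
function parsePlan(csvText) {
  const lines = csvText.split(/\r?\n/).filter((line) => line.trim() !== '');
  const header = lines[0].split(',').map((name) => name.trim());
  return lines.slice(1).map((line) => {
    const cells = line.split(',');
    const row = {};
    header.forEach((name, i) => {
      row[name] = (cells[i] || '').trim();
    });
    return row;
  });
}

// Rows that still need to be generated.
function pendingRows(rows) {
  return rows.filter((row) => row.Status === 'Pending');
}

const sample = [
  'ID,Emotion,Variant,Description,Filename,Prompt,Status',
  '001,Base,v1,Standard,shutiao_base.png,gentle smile looking at viewer,Done',
  '003,Happy,v1,Smile,shutiao_happy.png,big happy smile eyes closed,Pending',
].join('\n');

console.log(pendingRows(parsePlan(sample)).map((row) => row.Filename));
// → [ 'shutiao_happy.png' ]
```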
Step 5: Generate the variants
This step uses Gemini (Nano Banana) image generation again. For each Pending row, the batch generator sends your base sprite + the prompt to Gemini, asking: "Same image, change facial expression to [prompt]. Keep clothes and background exactly same."
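The "reference image + text prompt" request can be pictured as a JSON body with one text part and one inline image part. The sketch below only builds that body and sends nothing. The helper names are hypothetical, and the payload actually assembled by src/batch_generator.js may differ.

```javascript
// Wraps a CSV prompt in the instruction quoted above.
function buildVariantPrompt(prompt) {
  return `Same image, change facial expression to ${prompt}. ` +
    'Keep clothes and background exactly same.';
}

// Builds a generateContent-style body: the instruction text plus the base
// sprite as base64 inline data. Assumption: the real script may use a
// different shape or extra fields.
function buildRequestBody(prompt, basePngBuffer) {
  return {
    contents: [{
      parts: [
        { text: buildVariantPrompt(prompt) },
        { inline_data: { mime_type: 'image/png', data: basePngBuffer.toString('base64') } },
      ],
    }],
  };
}

// Placeholder bytes stand in for the real shutiao_base.png contents.
const body = buildRequestBody('big happy smile eyes closed', Buffer.from('fake-png-bytes'));
console.log(body.contents[0].parts[0].text);
```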
Set your API key in skills/anima/.env:
GEMINI_API_KEY=your_key_here
Make sure assets/sprites/shutiao_base.png (or shutiao_base_1k.png) exists from Step 3.
Run the batch generator:
node skills/anima/src/batch_generator.js
What happens:
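- Reads production_plan.csv.
- Finds rows with Status=Pending.
- Sends the base sprite and each row's prompt to Gemini, then saves the result to assets/sprites/.
- Marks the row Status=Done.
Step 6: Verify
Check that assets/sprites/ now has a PNG file for every row in production_plan.csv: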
ls assets/sprites/*.png | wc -l
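Counting files does not say which sprite is missing. A slightly more specific check, sketched here with an illustrative helper name, compares the filenames from the plan against the sprite folder.

```javascript
const fs = require('fs');
const path = require('path');

// Returns the planned filenames that do not exist in spriteDir.
function missingSprites(filenames, spriteDir) {
  return filenames.filter((name) => !fs.existsSync(path.join(spriteDir, name)));
}

// Example with two filenames from the sample CSV; the result depends on
// what is actually present in assets/sprites/.
console.log(missingSprites(['shutiao_base.png', 'shutiao_happy.png'], 'assets/sprites'));
```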
Then do a quick test run:
node skills/anima/run.js --preview --script '[{"text":"Test","emotion":"Happy"}]'
Check the generated frame at temp/frame_0.png — you should see your character with the text overlay.
If a sprite is missing at runtime, the director will fall back to a white background with a warning in the console.
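The fallback behaviour can be pictured as a lookup that returns nothing when the file is absent. This is a sketch under assumptions: the shutiao_<emotion>.png mapping follows the sample CSV rows, and resolveSprite is a hypothetical name rather than the director's own function.

```javascript
// Maps an emotion to its sprite filename, or returns null (with a console
// warning) so the caller can render a white background instead.
function resolveSprite(emotion, availableFiles) {
  const filename = `shutiao_${emotion.toLowerCase()}.png`;
  if (availableFiles.has(filename)) return filename;
  console.warn(`Sprite missing for "${emotion}" (${filename}); using white background.`);
  return null;
}

const available = new Set(['shutiao_base.png', 'shutiao_happy.png']);
console.log(resolveSprite('Happy', available)); // → shutiao_happy.png
console.log(resolveSprite('Sad', available));   // → null (after a warning)
```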
FFmpeg must be installed on the system:
macOS: brew install ffmpeg
Debian/Ubuntu: sudo apt install ffmpeg
Install the Node dependencies inside the skill folder:
cd skills/anima
npm install
The only native dependency is sharp, which ships prebuilt binaries for all major platforms via N-API. Because N-API is ABI-stable, it does not need recompilation when the Node version changes; a reinstall is only needed when moving to a different OS or CPU architecture.
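This skill uses three external services: Fish Audio (required for TTS), Gemini (optional, for sprite generation), and Feishu/Lark (optional, for delivery). You need to provide your own API keys.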
Fish Audio (Required for TTS)
- Used in src/director.js (the generateAudio() function).
- FISH_AUDIO_KEY: Your API key (starts with sk-... or a hex string).
- FISH_AUDIO_REF_ID: The voice model reference ID. You can use Fish Audio's default models or clone your own voice.
Gemini (Optional, for sprite generation)
- Used in src/batch_generator.js (only needed if you want to create new sprite variants).
- batch_generator.js calls the Gemini API directly via curl.
- GEMINI_API_KEY: Your Gemini API key.
Feishu/Lark (Optional, for delivery)
- Used in src/send_video_pro.js.
- FEISHU_APP_ID: Your Feishu app ID.
- FEISHU_APP_SECRET: Your Feishu app secret.
- Not needed in --preview mode.
Create a .env file inside the skill folder (skills/anima/.env):
# Fish Audio (Required for TTS)
FISH_AUDIO_KEY=your_key_here
FISH_AUDIO_REF_ID=your_model_ref_id_here
# Gemini (Optional, for sprite generation)
GEMINI_API_KEY=your_key_here
# Feishu/Lark (Optional, for delivery)
FEISHU_APP_ID=cli_...
FEISHU_APP_SECRET=...
Important: The .env file is loaded from the skill folder first (least-privilege). Never commit .env files; the .clawignore already excludes them.
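A quick way to catch a missing key before a run is to parse the .env text and list what is absent. The sketch below is illustrative: parseEnv and missingKeys are hypothetical helpers, and treating the Feishu keys as unnecessary in preview mode is an assumption based on preview skipping the upload.

```javascript
// Parses KEY=value lines; blank lines and # comments are skipped.
function parseEnv(text) {
  const env = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq === -1) continue;
    env[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return env;
}

// Lists missing keys. Assumption: Feishu keys only matter when uploading.
function missingKeys(env, { preview = false } = {}) {
  const required = ['FISH_AUDIO_KEY', 'FISH_AUDIO_REF_ID'];
  if (!preview) required.push('FEISHU_APP_ID', 'FEISHU_APP_SECRET');
  return required.filter((key) => !env[key]);
}

const env = parseEnv('# Fish Audio\nFISH_AUDIO_KEY=abc\nFISH_AUDIO_REF_ID=ref123\n');
console.log(missingKeys(env, { preview: true })); // → []
console.log(missingKeys(env)); // → [ 'FEISHU_APP_ID', 'FEISHU_APP_SECRET' ]
```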
# Basic usage (Demo script)
node skills/anima/run.js --target "ou_..."
# With custom script (JSON string)
node skills/anima/run.js --target "ou_..." --script '[{"text":"Hello World","emotion":"Happy"}]'
# With custom script (File)
node skills/anima/run.js --target "ou_..." --script "path/to/script.json"
# Preview only (No upload)
node skills/anima/run.js --script '[{"text":"Test","emotion":"Happy"}]' --preview
node skills/anima/run.js --target "<open_id>" --script '[{"text":"Hello","emotion":"Happy"}]'
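The --script flag accepts either an inline JSON string or a file path. One plausible way to tell them apart is sketched below. loadScript is a hypothetical helper, and run.js may distinguish the two cases differently.

```javascript
const fs = require('fs');

// Treats an argument starting with "[" as inline JSON; anything else is
// read as a path to a JSON file. Assumption: this is not the logic in run.js.
function loadScript(arg) {
  const text = arg.trim().startsWith('[') ? arg : fs.readFileSync(arg, 'utf8');
  return JSON.parse(text);
}

console.log(loadScript('[{"text":"Test","emotion":"Happy"}]').length); // → 1
```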
Each scene in the script is a JSON object:
[
{ "text": "Hello boss!", "emotion": "Happy" },
{ "text": "Let me think...", "emotion": "Think" },
{ "text": "I got it!", "emotion": "Action" }
]
Available emotions: Base, Happy, Angry, Shy, Think, Sad, Action.
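Checking a script against this format before rendering avoids wasting a TTS call on a typo. The validator below is a minimal sketch. validateScript is a hypothetical helper, and requiring both fields on every scene is an assumption about what the director expects.

```javascript
const EMOTIONS = ['Base', 'Happy', 'Angry', 'Shy', 'Think', 'Sad', 'Action'];

// Returns a list of problems; an empty list means the script looks valid.
function validateScript(scenes) {
  if (!Array.isArray(scenes)) return ['script must be a JSON array of scenes'];
  const problems = [];
  scenes.forEach((scene, i) => {
    if (!scene || typeof scene.text !== 'string' || scene.text.trim() === '') {
      problems.push(`scene ${i}: "text" must be a non-empty string`);
    }
    if (!scene || !EMOTIONS.includes(scene.emotion)) {
      problems.push(`scene ${i}: "emotion" must be one of ${EMOTIONS.join(', ')}`);
    }
  });
  return problems;
}

console.log(validateScript([{ text: 'Hello boss!', emotion: 'Happy' }])); // → []
console.log(validateScript([{ text: 'Hmm', emotion: 'Confused' }]).length); // → 1
```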
To use a different TTS provider (e.g., OpenAI, ElevenLabs):
1. Open src/director.js.
2. Replace the generateAudio(text, filename) function.
3. Return { path: "/path/to/audio.wav", duration: 1.5 } (duration in seconds).
To add new expressions or poses after the initial setup:
1. Add a row to assets/production_plan.csv with Status=Pending.
2. Write a prompt for the new variant (e.g. angry expression, arms crossed, looking away).
3. Run node src/batch_generator.js; it will only process Pending rows.
4. Make sure the new sprite is picked up by loadSprites().
See ASSETS_PLAN.md for the full production matrix and design philosophy.
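For the custom TTS provider described earlier, the important part is the return shape of generateAudio(text, filename): a file path plus a duration in seconds. The stand-in below writes silence instead of calling a provider, so the rest of the pipeline can be exercised offline. The 0.25 seconds per word estimate and the async signature are assumptions; a real replacement would write the provider's audio and report its true length.

```javascript
const fs = require('fs');

// Builds a mono 16-bit PCM WAV of silence lasting `seconds`.
function silentWav(seconds, sampleRate = 16000) {
  const numSamples = Math.round(seconds * sampleRate);
  const dataSize = numSamples * 2;          // 2 bytes per 16-bit sample
  const buf = Buffer.alloc(44 + dataSize);  // zero-filled samples = silence
  buf.write('RIFF', 0, 'ascii');
  buf.writeUInt32LE(36 + dataSize, 4);      // remaining file size
  buf.write('WAVE', 8, 'ascii');
  buf.write('fmt ', 12, 'ascii');
  buf.writeUInt32LE(16, 16);                // fmt chunk size
  buf.writeUInt16LE(1, 20);                 // format: PCM
  buf.writeUInt16LE(1, 22);                 // channels: mono
  buf.writeUInt32LE(sampleRate, 24);
  buf.writeUInt32LE(sampleRate * 2, 28);    // byte rate
  buf.writeUInt16LE(2, 32);                 // block align
  buf.writeUInt16LE(16, 34);                // bits per sample
  buf.write('data', 36, 'ascii');
  buf.writeUInt32LE(dataSize, 40);
  return { buffer: buf, duration: numSamples / sampleRate };
}

// Stand-in for generateAudio: writes the file, returns { path, duration }.
// Assumption: 0.25 s per word is a placeholder for real speech length.
async function generateAudio(text, filename) {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  const { buffer, duration } = silentWav(Math.max(0.5, words * 0.25));
  await fs.promises.writeFile(filename, buffer);
  return { path: filename, duration };
}
```

Calling generateAudio("Hello World", "temp/audio_0.wav") with this stand-in resolves to an object of the form { path, duration }, which is the contract the steps above describe. The temp/audio_0.wav filename is only an example.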
- send_video_pro.js calculates duration in ms and passes it to both the upload and the message payload.
- Check the ffmpeg transcoding logs and verify the source frame images in temp/frame_*.png.
- Install CJK fonts if overlay text does not render correctly (sudo apt install fonts-noto-cjk).
- If FISH_AUDIO_KEY is missing, the skill falls back to the macOS say command (English only).