Anima

v3.3.2

Anima Avatar - Interactive Video Generation Engine. Generates 16:9 videos with dynamic character sprites (Shutiao), synced audio (Fish Audio), and text overlay.


by @hmyaoyuan

Security Scan

VirusTotal: Suspicious

OpenClaw: Suspicious (high confidence)

ℹ

Purpose & Capability

The code and SKILL.md align with an interactive video generator that uses Gemini (images), Fish Audio (TTS), FFmpeg (composition) and Feishu (delivery). However the registry metadata declares no required environment variables or binaries, while the implementation expects multiple external services and command-line tools. That mismatch is unexpected and disproportionate to the metadata provided.

Instruction Scope

SKILL.md explicitly instructs generation using Gemini and Fish Audio and to set GEMINI_API_KEY in the skill .env; code performs API calls and will upload generated videos to Feishu when credentials are present. The code reads/writes local files (assets/, temp/, output/) and updates production_plan.csv. It will transmit your base avatar/background images to Gemini (base64 upload) and will upload final videos to Feishu — both are privacy-sensitive actions. The runtime also uses shell commands (curl, ffmpeg, ffprobe, say) via execSync; these are not listed in the registry requirements.

ℹ

Install Mechanism

No install spec is provided (instruction-only distribution), which reduces automatic install risk. However there are shipped Node files and a package.json (sharp, dotenv). Using sharp may require native build steps. The code invokes external binaries via shell (ffmpeg/ffprobe/curl/say/sleep) which are not declared. There is no remote download/install of arbitrary archives (good), but running the included scripts will execute network calls and binaries on your machine.

Credentials

Registry metadata lists no required env vars, but the code requires/uses GEMINI_API_KEY, FISH_AUDIO_KEY, FISH_AUDIO_REF_ID, FEISHU_APP_ID, and FEISHU_APP_SECRET (and expects them in skills/anima/.env). That is a clear inconsistency. These credentials grant access to external services and (in Feishu's case) allow sending messages as the app — so they are high-value and must be justified and protected; the skill's metadata should have declared them but did not.

✓

Persistence & Privilege

The skill does not request always: true and does not modify other skills or system-wide agent settings. It reads/writes only within its own skill folder (temp/, output/, assets/) and updates its production_plan.csv when generating sprites. However, given it can use Feishu credentials to send media, it effectively can transmit data externally when run with credentials — so grant runtime privileges cautiously.

What to consider before installing

Key things to consider before installing/running:

- Metadata mismatch: The registry claims no environment variables or binaries are required, but the code needs GEMINI_API_KEY, FISH_AUDIO_KEY, FISH_AUDIO_REF_ID, FEISHU_APP_ID, FEISHU_APP_SECRET and command-line tools (ffmpeg, ffprobe, curl, sleep; macOS fallback 'say'). Treat this as a red flag: the publisher likely omitted required permissions.
- Sensitive uploads: The batch generator sends your base avatar/background images (base64) to Gemini (https://generativelanguage.googleapis.com). The send script uploads videos to Feishu using app credentials. If you supply real service credentials, personal or proprietary images and generated videos will be transmitted off your machine.
- Principle of least privilege: Only supply test credentials (not production) and keep the .env inside the skill folder as the code expects. Do not put highly privileged tokens in this skill unless you fully trust it.
- Run in an isolated environment: If you decide to test, run npm install inside a disposable container/VM, ensure ffmpeg and curl are installed in that environment, and avoid using your real work accounts. Inspect the .env file before running and verify endpoint URLs in code.
- Code review items: Confirm the Gemini and Fish Audio API usage meets your privacy expectations (what is sent), and verify the Feishu upload logic if you don't want automatic outbound messages. Also be aware that execSync calls run external commands and may expose secrets if logged elsewhere.
- If you want to proceed safely: 1) audit and edit the code to suit your needs, 2) provide only restricted/test API keys, 3) validate the asset paths and ensure sprites/backgrounds are local and acceptable, and 4) run in an isolated environment.
Given the clear mismatches between metadata and implementation (missing declared envs, undeclared binary dependencies, external uploads), treat this skill as suspicious until you confirm the credentials and external endpoints are acceptable and the code is safe for your data.

Like a lobster shell, security has layers — review code before you run it.

latest: vk97c35z7v91h7zae9b1efdvys5811wyj

1.5k downloads · 0 stars · 10 versions · Updated 21h ago

MIT-0

Anima Avatar (Project Anima)

Generates high-quality interactive videos where Shutiao speaks the text with appropriate expressions, gestures, and voice.

Capabilities

True Voice: Uses Fish Audio API for realistic speech synthesis.
Dynamic Sprites: Auto-selects from the sprite library (25 entries in the default production plan) based on emotion tags (Base, Happy, Angry, Shy, Think, Sad, Action).
Smart Director: Handles parallel rendering, audio-sync, and video composition (FFmpeg).
Pro Delivery: Uploads as native stream to Feishu for direct playback (with correct duration).

Structure

src/director.js: The core engine. Generates frames (sharp + SVG), audio (Fish Audio), and video (FFmpeg).
src/send_video_pro.js: Delivery script. Handles transcoding, duration calculation, and Feishu upload.
src/batch_generator.js: Batch sprite generator. Uses Gemini image generation to produce sprite variants.
assets/sprites/: The sprite library (1920x1080 PNG files).
assets/production_plan.csv: The asset registry (25 sprites).
assets/manifest.json: Sprite metadata for reference.
output/: Generated videos.

IMPORTANT: Sprites Not Included

ClawHub only distributes text files. The sprite PNG images are not included in the published package.

After installing, follow the steps below in order to prepare your sprites before first use.

All image generation steps use Gemini API (Nano Banana) as the AI image generator. It works by "reference image + text prompt" — you give it an existing image and a text description of what to change, and it returns a new image with the changes applied. This is how both the base sprite (character + background fusion) and all expression variants are created.

Step 1: Prepare your character image

You need a standalone character illustration (transparent background PNG recommended).

This is your character's "identity" — it defines the look for all sprites.
Resolution: at least 1920x1080. Full-body is best.
Example: a full-body anime character PNG with transparent background.

Save it somewhere accessible (e.g. avatars/my_character.png).

Step 2: Prepare your background image

You need a background scene for the character to stand in.

This is the environment that appears behind the character in every video frame.
Resolution: at least 1920x1080.
Example: a cherry blossom garden, a classroom, a city street.

Save it at: assets/backgrounds/ (e.g. assets/backgrounds/cherry_blossom_bg.png).

Step 3: Fuse character + background into base sprite

This step uses Gemini (Nano Banana) image generation to merge your character onto the background. The AI sees both images and creates a natural-looking composite — this is NOT a simple overlay/paste, but an AI-generated fusion that handles lighting, shadows, and blending.

How to do it:

Method A: Use Gemini directly (recommended)

Use any Gemini-compatible image generation tool (like Nano Banana, Google AI Studio, or the Gemini API) with:

Input image: Your background image
Reference/overlay: Your character image
Prompt: e.g. "Place this character naturally in the center of this background scene, full body visible, gentle smile"

Save the output as: assets/sprites/shutiao_base.png

Method B: Use the built-in compose script (simple overlay)

If you just want a quick mechanical overlay (no AI blending), src/compose_base.js can paste your character onto the background using sharp:

Edit src/compose_base.js — update BG_PATH and AVATAR_PATH to point to your files.
Run: node src/compose_base.js
Output: assets/sprites/shutiao_base.png

Note: Method B is a plain image composite. Method A (Gemini) produces much better results because it handles lighting and integration naturally.

Step 4: Plan your sprite variants

Now that you have a base sprite, plan what expression/pose variants you want.

Open assets/production_plan.csv and customize it:

ID,Emotion,Variant,Description,Filename,Prompt,Status
001,Base,v1,Standard,shutiao_base.png,gentle smile looking at viewer,Done
003,Happy,v1,Smile,shutiao_happy.png,big happy smile eyes closed,Pending
007,Angry,v1,Pout,shutiao_angry.png,angry face pouting,Pending
...

Column meanings:

Emotion: Category used by the video director to pick sprites (Happy, Angry, Shy, Think, Sad, Action, Base).
Filename: Output filename. Must follow shutiao_<emotion>_<variant>.png format.
Prompt: Describes how this variant differs from the base. The generator sends the base image + this prompt to Gemini, asking it to change only the expression/pose while keeping everything else the same.
Status: Pending = will be generated. Done = already exists, skip.

The default CSV has 25 entries. You can add, remove, or modify rows freely.
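To make the column layout concrete, here is a minimal sketch of reading the plan and listing the rows that still need generating. The function names parsePlan and pendingRows are hypothetical, not taken from the skill's code, and the parser assumes no field contains a comma or a quote. A prompt such as "angry expression, arms crossed, looking away" would need a proper CSV parser or quoting that the real generator understands.

```javascript
// Parses the plan text into row objects keyed by the header names.
// Assumption: fields never contain commas or quotes (true for the sample rows).
function parsePlan(csvText) {
  const lines = csvText.split(/\r?\n/).filter((line) => line.trim() !== '');
  const header = lines[0].split(',').map((h) => h.trim());
  return lines.slice(1).map((line) => {
    const cells = line.split(',');
    return Object.fromEntries(header.map((h, i) => [h, (cells[i] || '').trim()]));
  });
}

// Rows the batch generator would still have to produce.
function pendingRows(rows) {
  return rows.filter((row) => row.Status === 'Pending');
}
```

Calling pendingRows(parsePlan(text)) on the three sample rows would leave only the Happy and Angry entries, since the Base row is already marked Done.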

Step 5: Generate sprite variants

This step uses Gemini (Nano Banana) image generation again. For each Pending row, the batch generator sends your base sprite + the prompt to Gemini, asking: "Same image, change facial expression to [prompt]. Keep clothes and background exactly same."

Set your Gemini API key in skills/anima/.env:

GEMINI_API_KEY=your_key_here

Make sure assets/sprites/shutiao_base.png (or shutiao_base_1k.png) exists from Step 3.
Run the batch generator:

node skills/anima/src/batch_generator.js

What happens:

Reads production_plan.csv
Finds all rows with Status=Pending
For each: sends the base sprite + prompt to Gemini API
Saves the generated image as a PNG in assets/sprites/
Updates the CSV row to Status=Done
Waits 10 seconds between generations (API rate limit cooldown)
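The loop described above can be sketched as follows. This is an illustration under stated assumptions, not the code in src/batch_generator.js: generate and wait are hypothetical injected helpers standing in for the Gemini call and the cooldown, and the sketch is synchronous only because the scan notes that the skill shells out via execSync. Only the prompt wording is quoted from this step.

```javascript
// Prompt wording quoted from Step 5; variantPrompt is the CSV "Prompt" column.
function buildPrompt(variantPrompt) {
  return `Same image, change facial expression to ${variantPrompt}. Keep clothes and background exactly same.`;
}

// rows: objects with Filename, Prompt and Status fields.
// generate(prompt, filename) and wait(ms) are hypothetical helpers supplied by the caller.
// Returns the number of rows that were processed.
function processPending(rows, generate, wait, cooldownMs = 10000) {
  const pending = rows.filter((row) => row.Status === 'Pending');
  pending.forEach((row, index) => {
    generate(buildPrompt(row.Prompt), row.Filename);
    row.Status = 'Done'; // mirrors the CSV update after a successful generation
    if (index < pending.length - 1) wait(cooldownMs); // cooldown between generations
  });
  return pending.length;
}
```

Injecting generate and wait keeps the control flow easy to dry-run without an API key, which is useful when auditing what would be sent before supplying real credentials.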

Step 6: Verify

Check that assets/sprites/ now has a PNG file for every row in production_plan.csv:

ls assets/sprites/*.png | wc -l
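A file count alone does not say which sprites are missing. As a small sketch (missingSprites is a hypothetical helper, not part of the skill), the filenames from the plan can be compared against the directory listing:

```javascript
// Returns the planned filenames that are not present in the sprite directory.
// plannedFilenames: the "Filename" column from production_plan.csv.
// existingFilenames: e.g. the result of fs.readdirSync('assets/sprites').
function missingSprites(plannedFilenames, existingFilenames) {
  const have = new Set(existingFilenames);
  return plannedFilenames.filter((name) => !have.has(name));
}
```

An empty result means every planned sprite exists. Any names returned are the ones that would trigger the white-background fallback at runtime.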

Then do a quick test run:

node skills/anima/run.js --preview --script '[{"text":"Test","emotion":"Happy"}]'

Check the generated frame at temp/frame_0.png — you should see your character with the text overlay.

If a sprite is missing at runtime, the director will fall back to a white background with a warning in the console.

Setup & Requirements

1. System Dependencies

ffmpeg (required for video processing):
- macOS: brew install ffmpeg
- Linux: sudo apt install ffmpeg
- Windows: Download/Install FFmpeg and add to PATH.
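Since the security scan above notes that the scripts also shell out to ffprobe and curl, a pre-flight check can save a failed run. The following is a minimal sketch, not part of the skill: missingBinaries and REQUIRED_BINARIES are hypothetical names, and the Windows extension list is an assumption.

```javascript
const fs = require('fs');
const path = require('path');

// Binaries named in this listing; 'say' is left out because it is only the macOS TTS fallback.
const REQUIRED_BINARIES = ['ffmpeg', 'ffprobe', 'curl'];

// Returns the names that cannot be found in any directory of the given PATH string.
// `exists` is injectable so the lookup logic can be checked without touching the disk.
function missingBinaries(names, pathVar, exists = fs.existsSync) {
  const dirs = (pathVar || '').split(path.delimiter).filter(Boolean);
  const exts = process.platform === 'win32' ? ['.exe', '.cmd', ''] : [''];
  return names.filter(
    (name) => !dirs.some((dir) => exts.some((ext) => exists(path.join(dir, name + ext))))
  );
}
```

Calling missingBinaries(REQUIRED_BINARIES, process.env.PATH) before a run returns an empty array when everything is installed.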

2. Node Dependencies

Install inside the skill folder:

cd skills/anima
npm install

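The only native dependency is sharp, which ships prebuilt binaries for all major platforms and uses N-API, so it does not need recompilation when the Node version changes. The prebuilt binaries are platform-specific, so run npm install on each machine or container where the skill will run rather than copying node_modules between operating systems.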

3. External Services (API Keys Required)

This skill uses up to three external services: Fish Audio for speech, plus Gemini and Feishu, which are optional. You need to provide your own API keys.

Fish Audio (TTS - Text to Speech)

What: Generates realistic voice audio from text.
Used by: src/director.js (the generateAudio() function).
Get a key: https://fish.audio/dashboard/api
Env vars needed:
- FISH_AUDIO_KEY — Your API key (starts with sk-... or a hex string).
- FISH_AUDIO_REF_ID — The voice model reference ID. You can use Fish Audio's default models or clone your own voice.

Gemini API (Image Generation - Optional)

What: Generates sprite variants using Google Gemini image generation.
Used by: src/batch_generator.js (only needed if you want to create new sprite variants).
Self-contained: No external skills needed. batch_generator.js calls the Gemini API directly via curl.
Get a key: https://aistudio.google.com/apikey
Env var needed: GEMINI_API_KEY
Not needed for normal video generation — only for creating new character sprites.

Feishu / Lark (Delivery - Optional)

What: Uploads videos to Feishu as native media messages.
Used by: src/send_video_pro.js.
Env vars needed:
- FEISHU_APP_ID — Your Feishu app ID.
- FEISHU_APP_SECRET — Your Feishu app secret.
Not needed if you only use --preview mode.

4. Environment Configuration

Create a .env file inside the skill folder (skills/anima/.env):

# Fish Audio (Required for TTS)
FISH_AUDIO_KEY=your_key_here
FISH_AUDIO_REF_ID=your_model_ref_id_here

# Gemini (Optional, for sprite generation)
GEMINI_API_KEY=your_key_here

# Feishu/Lark (Optional, for delivery)
FEISHU_APP_ID=cli_...
FEISHU_APP_SECRET=...

Important: The .env file is loaded from the skill folder first (least-privilege). Never commit .env files — the .clawignore already excludes it.
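Because the variables are split across optional features, it can help to see which features are fully configured before running. This is a hedged sketch: the variable names come from this README, while the grouping and the name missingEnvByFeature are assumptions made for illustration.

```javascript
// Variable names are from this README; the feature grouping is an assumption.
const ENV_BY_FEATURE = {
  tts: ['FISH_AUDIO_KEY', 'FISH_AUDIO_REF_ID'],
  sprites: ['GEMINI_API_KEY'],
  delivery: ['FEISHU_APP_ID', 'FEISHU_APP_SECRET'],
};

// Returns { feature: [missing variable names] } for every feature that is not fully configured.
function missingEnvByFeature(env) {
  const report = {};
  for (const [feature, names] of Object.entries(ENV_BY_FEATURE)) {
    const missing = names.filter((name) => !env[name] || env[name].trim() === '');
    if (missing.length > 0) report[feature] = missing;
  }
  return report;
}
```

Passing process.env after the .env file has been loaded yields an empty object when all three features are configured. A result containing only sprites and delivery is still enough for --preview runs.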

Usage

Generate & Send

# Basic usage (Demo script)
node skills/anima/run.js --target "ou_..."

# With custom script (JSON string)
node skills/anima/run.js --target "ou_..." --script '[{"text":"Hello World","emotion":"Happy"}]'

# With custom script (File)
node skills/anima/run.js --target "ou_..." --script "path/to/script.json"

# Preview only (No upload)
node skills/anima/run.js --script '[{"text":"Test","emotion":"Happy"}]' --preview

One-Liner (for agent use)

node skills/anima/run.js --target "<open_id>" --script '[{"text":"Hello","emotion":"Happy"}]'

Script Format

Each scene in the script is a JSON object:

[
  { "text": "Hello boss!", "emotion": "Happy" },
  { "text": "Let me think...", "emotion": "Think" },
  { "text": "I got it!", "emotion": "Action" }
]

Available emotions: Base, Happy, Angry, Shy, Think, Sad, Action.
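A script can be checked against this format before it is handed to run.js. The sketch below is illustrative only: validateScript is a hypothetical helper, and rejecting unknown emotions is an assumption, since the real director may handle them differently.

```javascript
// Emotion names listed in this README.
const EMOTIONS = ['Base', 'Happy', 'Angry', 'Shy', 'Think', 'Sad', 'Action'];

// Returns a list of human-readable problems; an empty list means the script looks valid.
function validateScript(scenes) {
  if (!Array.isArray(scenes)) return ['script must be a JSON array of scenes'];
  const problems = [];
  scenes.forEach((scene, i) => {
    if (!scene || typeof scene.text !== 'string' || scene.text.trim() === '') {
      problems.push(`scene ${i}: "text" must be a non-empty string`);
    }
    if (!scene || !EMOTIONS.includes(scene.emotion)) {
      problems.push(`scene ${i}: "emotion" must be one of ${EMOTIONS.join(', ')}`);
    }
  });
  return problems;
}
```

For example, validateScript(JSON.parse(scriptText)) could be run on the --script value, and the run aborted if the returned list is not empty.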

Extension: Custom TTS

To use a different TTS provider (e.g., OpenAI, ElevenLabs):

Open src/director.js.
Locate the generateAudio(text, filename) function.
Replace the Fish Audio API call with your provider's logic.
Contract: The function must return: { path: "/path/to/audio.wav", duration: 1.5 } (duration in seconds).
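Whatever provider is used, the return value needs an accurate duration. As one way to satisfy the contract, the sketch below derives the duration from a WAV header. It assumes a canonical PCM WAV with a 44-byte header (byte rate at offset 28, data size at offset 40); files with extra chunks would need a real parser. Both function names are hypothetical.

```javascript
// Reads the duration of a canonical 44-byte-header PCM WAV held in a Buffer.
// Assumption: "fmt " is immediately followed by "data", with no extra chunks in between.
function wavDurationSeconds(buf) {
  const isWav =
    buf.length >= 44 &&
    buf.toString('ascii', 0, 4) === 'RIFF' &&
    buf.toString('ascii', 8, 12) === 'WAVE';
  if (!isWav) throw new Error('not a canonical WAV file');
  const byteRate = buf.readUInt32LE(28); // bytes of audio per second
  const dataBytes = buf.readUInt32LE(40); // size of the audio payload
  return dataBytes / byteRate;
}

// Shapes the result the way the contract above requires: { path, duration }.
function toAudioResult(path, wavBuffer) {
  return { path, duration: wavDurationSeconds(wavBuffer) };
}
```

A custom generateAudio could write the provider's audio to disk, read it back with fs.readFileSync, and return toAudioResult(path, buffer).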

Advanced: Adding More Sprite Variants

To add new expressions or poses after the initial setup:

Add a new row to assets/production_plan.csv with Status=Pending.
Write a clear prompt describing the change from the base (e.g. angry expression, arms crossed, looking away).
Run node src/batch_generator.js — it will only process Pending rows.
The new sprite will auto-register in the director's emotion pool via loadSprites().
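To illustrate what an emotion pool might look like, here is a minimal sketch built from plan rows. It is not the skill's loadSprites() implementation, which may scan the sprite directory instead. The names buildEmotionPool and pickSprite are hypothetical, and falling back to the Base pool is an assumption.

```javascript
// Groups sprite filenames by the Emotion column, using only rows already generated.
function buildEmotionPool(rows) {
  const pool = {};
  for (const { Emotion, Filename, Status } of rows) {
    if (Status !== 'Done') continue; // skip sprites that do not exist yet
    (pool[Emotion] = pool[Emotion] || []).push(Filename);
  }
  return pool;
}

// Picks a sprite for an emotion, falling back to the Base pool (an assumption).
// `rand` is injectable so the choice can be made deterministic.
function pickSprite(pool, emotion, rand = Math.random) {
  const candidates = pool[emotion] || pool.Base || [];
  if (candidates.length === 0) return null;
  return candidates[Math.floor(rand() * candidates.length)];
}
```

With several Done rows sharing one Emotion, pickSprite varies the sprite between scenes, which is the effect the added variants are meant to achieve.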

See ASSETS_PLAN.md for the full production matrix and design philosophy.

Troubleshooting

Duration 00:00: Ensure send_video_pro.js calculates duration in ms and passes it to both upload and message payload.
Fish Audio 400: Check that your Ref ID matches the API Key owner's model.
Video Black: Check ffmpeg transcoding logs and verify source frame images in temp/frame_*.png.
SVG text not rendering: Ensure the system has CJK fonts installed (macOS has them by default; on Linux: sudo apt install fonts-noto-cjk).
Missing Fish Audio key: If FISH_AUDIO_KEY is not set, the skill falls back to the macOS say command (English only).
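For the "Duration 00:00" case, one way to obtain a duration in milliseconds is to ask ffprobe and convert its output. This is a sketch rather than the logic in send_video_pro.js. The ffprobe flags are standard, but parseDurationMs and probeDurationMs are hypothetical names, and how the value is placed in the Feishu payload is not shown.

```javascript
const { execFileSync } = require('child_process');

// Converts ffprobe's plain-text duration in seconds (e.g. "3.500000\n") to whole milliseconds.
function parseDurationMs(ffprobeStdout) {
  const seconds = Number.parseFloat(ffprobeStdout.trim());
  if (!Number.isFinite(seconds) || seconds <= 0) {
    throw new Error(`unexpected ffprobe output: ${JSON.stringify(ffprobeStdout)}`);
  }
  return Math.round(seconds * 1000);
}

// Runs ffprobe on a media file and returns its container duration in milliseconds.
function probeDurationMs(file) {
  const out = execFileSync(
    'ffprobe',
    ['-v', 'error', '-show_entries', 'format=duration',
      '-of', 'default=noprint_wrappers=1:nokey=1', file],
    { encoding: 'utf8' }
  );
  return parseDurationMs(out);
}
```

Using execFileSync with an argument array avoids shell interpolation of the file path, which matters if output filenames ever contain spaces or quotes.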
