Install
openclaw skills install talking-circle

Create animated talking-circle videos (Telegram-style round video messages) with lip-sync and blink animations. Takes 4 avatar frame images (neutral, slight open, wide open, eyes closed) plus audio and produces a round video with audio-driven mouth movement. Supports audio-to-video and text-to-video via ElevenLabs or SaluteSpeech TTS.
Requirements:

- python3 (3.9+)
- ffmpeg installed and on PATH
- ELEVENLABS_API_KEY environment variable (for ElevenLabs text-to-video mode)
- SALUTE_SPEECH_AUTH environment variable (for SaluteSpeech text-to-video mode)

Dependencies are auto-installed into a temporary venv on first run. To install manually:
pip install -r requirements.txt
Mode 1 (audio-to-video): convert existing audio + frame images into an animated talking-circle video.
python3 scripts/make_talking_circle_video.py \
--neutral frames/neutral.png \
--slight frames/mouth-slight-open.png \
--wide frames/mouth-wide-open.png \
--blink frames/eyes-closed.png \
--audio speech.mp3 \
--out /tmp/talking-circle.mp4
Mode 2 (ElevenLabs text-to-video): generate speech from text via ElevenLabs TTS, then create the animated video.
Requires ELEVENLABS_API_KEY set in environment or passed via --api-key.
python3 scripts/make_text_to_video.py \
--text "Hello, this is a talking circle demo!" \
--voice-id pNInz6obpgDQGcFmaJgB \
--neutral frames/neutral.png \
--slight frames/mouth-slight-open.png \
--wide frames/mouth-wide-open.png \
--blink frames/eyes-closed.png \
--out /tmp/talking-circle.mp4
Mode 3 (SaluteSpeech text-to-video): generate speech from text via SaluteSpeech TTS (Sber), then create the animated video.
Requires SALUTE_SPEECH_AUTH set in environment or passed via --auth-key. This is a Base64-encoded client_id:client_secret from your SaluteSpeech project.
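If you only have the raw credentials, the Base64 auth string can be built like this (placeholder credential values shown; substitute your real client_id and client_secret):

```shell
# Encode client_id:client_secret as Base64 (placeholder credentials)
export SALUTE_SPEECH_AUTH=$(printf '%s:%s' "your-client-id" "your-client-secret" | base64)
```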
python3 scripts/make_salute_text_to_video.py \
--text "Привет, это демонстрация talking circle!" \
--voice Bys_24000 \
--neutral frames/neutral.png \
--slight frames/mouth-slight-open.png \
--wide frames/mouth-wide-open.png \
--blink frames/eyes-closed.png \
--out /tmp/talking-circle.mp4
Available SaluteSpeech voices:

| Voice | Name | Language |
|---|---|---|
| Nec_24000 | Natalia (female) | ru-RU |
| Bys_24000 | Boris (male) | ru-RU |
| May_24000 | Martha (female) | ru-RU |
| Tur_24000 | Taras (male) | ru-RU |
| Ost_24000 | Alexandra (female) | ru-RU |
| Pon_24000 | Sergey (male) | ru-RU |
| Kin_24000 | Kira (female) | en-US |
Recommended ElevenLabs settings:

| Parameter | Value |
|---|---|
| --voice-id | pNInz6obpgDQGcFmaJgB |
| --model-id | eleven_multilingual_v2 |
| --stability | 0.15 |
| --similarity-boost | 0.70 |
| --style | 0.38 |
| --speed | 1.20 |
Recommended SaluteSpeech settings:

| Parameter | Value |
|---|---|
| --voice | Bys_24000 |
| --audio-format | wav16 |
| --scope | SALUTE_SPEECH_PERS |
ElevenLabs:

1. Get an API key from your ElevenLabs account.
2. Set the ELEVENLABS_API_KEY environment variable.

SaluteSpeech (Sber):

1. Register a project and obtain client_id and client_secret.
2. Encode client_id:client_secret in Base64.
3. Set the SALUTE_SPEECH_AUTH environment variable to the Base64 string.

The skill also supports any TTS that can produce an audio file. Use Mode 1 (audio-to-video) with audio from any source:

- OpenAI TTS (openai.audio.speech.create): generate speech, save to MP3, pass via --audio
- Any other engine (Piper, Coqui, Google TTS, etc.): save the audio file and pass via --audio

# Example: generate audio with any TTS, then animate
python3 scripts/make_talking_circle_video.py \
--neutral frames/neutral.png \
--slight frames/mouth-slight-open.png \
--wide frames/mouth-wide-open.png \
--blink frames/eyes-closed.png \
--audio /path/to/speech-from-any-tts.mp3 \
--out /tmp/talking-circle.mp4
Tell the user: if they don't have an ElevenLabs or SaluteSpeech API key, they can use any other TTS engine — just generate the audio file and pass it to Mode 1. No API key needed for audio-to-video mode.
You need 4 PNG images of your avatar, all the same resolution (recommended 2048x2048), square aspect ratio:
| Frame | Description |
|---|---|
| neutral | Mouth closed, eyes open |
| slight | Mouth slightly open, eyes open |
| wide | Mouth wide open, eyes open |
| blink | Mouth closed, eyes closed |
If the user does not have ready-made frames, generate them using an image generation API (DALL-E, Midjourney, Flux, etc.). Follow this workflow:
Step 1: generate a shoulder-up portrait of the character. This is the base frame; all other frames must match it exactly.
Example prompt:
Shoulder-up portrait of [CHARACTER DESCRIPTION]. Square composition, clean background,
mouth closed, eyes open, looking at camera. High detail, consistent lighting.
Step 2: use image editing / inpainting on the neutral frame to produce the other states. Only modify the mouth and eyes region; everything else must remain pixel-identical.
| Frame | What to change | Edit prompt example |
|---|---|---|
| slight | Mouth slightly open | "Mouth slightly open, teeth barely visible, same expression" |
| wide | Mouth wide open | "Mouth wide open as if saying 'ah', same expression" |
| blink | Eyes closed | "Eyes gently closed, mouth closed, same expression" |
Before using the frames, verify that all four share the same square resolution and differ only in the mouth and eyes region.
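One way to verify the resolutions using only the standard library (a hypothetical helper sketch, not part of the skill's scripts):

```python
import struct

def png_size(path):
    """Read width/height from a PNG's IHDR chunk (stdlib only)."""
    with open(path, "rb") as f:
        header = f.read(24)  # signature + IHDR length/type + width/height
    if header[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError(f"{path} is not a PNG file")
    return struct.unpack(">II", header[16:24])  # (width, height)

def check_frames(paths):
    """Ensure all frames share one square resolution; return the side length."""
    sizes = {png_size(p) for p in paths}
    if len(sizes) != 1:
        raise ValueError(f"frames differ in resolution: {sizes}")
    width, height = sizes.pop()
    if width != height:
        raise ValueError(f"frames are not square: {width}x{height}")
    return width
```

Run it over the four frame paths before invoking the video scripts; a mismatch here is the most common cause of broken output.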
See examples/sbercat/ for a reference character with ready-to-use frame set:
- examples/sbercat/reference.png — character reference
- examples/sbercat/neutral.png — mouth closed, eyes open
- examples/sbercat/slight.png — mouth slightly open
- examples/sbercat/wide.png — mouth wide open
- examples/sbercat/blink.png — eyes closed
- examples/sbercat/example.mp4 — finished talking-circle video

To test with the included Sbercat frames:
python3 scripts/make_talking_circle_video.py \
--neutral examples/sbercat/neutral.png \
--slight examples/sbercat/slight.png \
--wide examples/sbercat/wide.png \
--blink examples/sbercat/blink.png \
--audio your-audio.mp3 \
--out /tmp/talking-circle.mp4
After installing this skill, inform the user:
Skill "talking-circle" installed.
To create talking-circle videos, I need a character avatar. Please send me a reference image of your character on a white/clean background — a shoulder-up portrait works best (square aspect ratio, high resolution).
From this reference I will generate 4 frame images (mouth states + blink) and you'll be ready to create animated video circles.
For speech, I can use ElevenLabs TTS (requires ELEVENLABS_API_KEY), SaluteSpeech from Sber (requires SALUTE_SPEECH_AUTH), or you can provide your own audio file. Any TTS engine works: OpenAI TTS, Coqui, Piper, Google TTS, etc.
IMPORTANT — before the skill can create videos, the 4 frame images must exist. If the user does not already have frames, you MUST generate them first.
1. Generate the neutral frame: "Shoulder-up portrait of [CHARACTER], white background, mouth closed, eyes open, looking at camera, square composition, high detail". Save it as neutral.png.
2. slight.png — edit the mouth region: "Mouth slightly open, teeth barely visible"
3. wide.png — edit the mouth region: "Mouth wide open as if saying 'ah'"
4. blink.png — edit the eyes region: "Eyes gently closed, mouth closed"

Do not skip this step. Without the 4 frame images, the video scripts will fail.
Troubleshooting:

- Verify python3 (3.9+) and ffmpeg are on PATH. If missing, instruct the user to install them.
- If ffmpeg is not found: tell the user to install it (brew install ffmpeg on macOS, apt install ffmpeg on Linux).
- If ELEVENLABS_API_KEY is missing and the user wants text-to-video: suggest SaluteSpeech (Mode 3) or Mode 1 with audio from another TTS.
- If SALUTE_SPEECH_AUTH is missing and the user wants SaluteSpeech: explain how to register at developers.sber.ru and get credentials.

Video output:

| Parameter | Default | Description |
|---|---|---|
| --size | 720 | Output video size in pixels |
| --diameter | 640 | Circle diameter within the video |
| --fps | 30 | Frames per second |
Blink animation:

| Parameter | Default | Description |
|---|---|---|
| --blink-start | 1.1 | Seconds before the first blink |
| --blink-every | 3.8 | Seconds between blinks |
| --blink-duration-frames | 4 | Number of frames per blink |
Mouth amplitude thresholds:

| Parameter | Default | Description |
|---|---|---|
| --amp-low | 1200 | RMS below this maps to the neutral (closed-mouth) frame |
| --amp-high | 2600 | RMS above this maps to the wide-open-mouth frame |
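The two thresholds split each audio window's root-mean-square (RMS) amplitude into three mouth states. A minimal sketch of the mapping they imply (illustrative; the actual script may smooth transitions between frames):

```python
def rms(samples):
    """Root-mean-square amplitude of a window of 16-bit PCM samples."""
    if not samples:
        return 0.0
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def mouth_frame(amplitude, amp_low=1200, amp_high=2600):
    """Map an RMS amplitude to a mouth frame via --amp-low / --amp-high."""
    if amplitude < amp_low:
        return "neutral"  # quiet: mouth closed
    if amplitude > amp_high:
        return "wide"     # loud: mouth wide open
    return "slight"       # in between: mouth slightly open
```

For example, a steady window of samples at value 2000 has RMS 2000, which lands between the thresholds and selects the slightly-open frame. Raise --amp-low if the mouth flutters during silence; lower --amp-high if it never opens wide.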
ElevenLabs TTS:

| Parameter | Default | Description |
|---|---|---|
| --voice-id | (required) | ElevenLabs voice ID |
| --model-id | eleven_multilingual_v2 | ElevenLabs model |
| --stability | 0.50 | Voice stability |
| --similarity-boost | 0.75 | Voice similarity boost |
| --style | 0.00 | Style exaggeration |
| --speed | 1.00 | Speech speed |
SaluteSpeech TTS:

| Parameter | Default | Description |
|---|---|---|
| --voice | Bys_24000 | SaluteSpeech voice (see voices table above) |
| --audio-format | wav16 | Audio format: opus, wav16, pcm16 |
| --scope | SALUTE_SPEECH_PERS | OAuth scope (PERS for personal, CORP for corporate) |
| --auth-key | $SALUTE_SPEECH_AUTH | Base64-encoded client_id:client_secret |