Volcengine Digital Human Video Generator
⚠️ First-Time Setup Required
This skill requires Volcengine Access Key (AK) and Secret Key (SK).
Get AK/SK
- Register Volcengine account: https://console.volcengine.com/
- Enable "Digital Human Video Generation" service
- Create Access Key: https://console.volcengine.com/iam/keymanage/
Configuration (choose one)
Option 1: Config file (recommended)
Create config.json (in the skill directory):
{
"ak": "your_access_key_here",
"sk": "your_secret_key_here"
}
Option 2: Environment variables
export VOLC_AK="your_access_key_here"
export VOLC_SK="your_secret_key_here"
⚠️ Security: Never hardcode AK/SK in scripts or commit to public repos!
⚠️ Privacy Notice
This skill uploads images and generated audio/video to third-party file hosts (catbox.moe, 0x0.st) to create publicly accessible URLs required by the Volcengine API.
- Do not use with sensitive/private images you don't want uploaded to public hosts
- User images and generated content will be publicly accessible during the video generation process
- Download and use videos promptly; URLs may expire
Core Flow
- Get image: Fetch from
/root/.openclaw/media/inbound/
- Gender detection: OpenCV Haar cascade for eyes/nose features
- Upload image: Upload to catbox.moe for public URL
- Create avatar: Call
realman_avatar_picture_create_role API
- TTS audio: Auto-match voice by gender + edge-tts → upload to catbox.moe
- Video synthesis: Call
realman_avatar_picture_v2 API, poll for result
- Download video: Save locally, generate thumbnail preview
- Deliver: Send thumbnail + video via
message tool
Quick Run
cd /root/.openclaw/workspace-employee-xiaozhua
python3 skills/volc-digital-human/scripts/volc_digital_human.py "$image_path" "$dialog_text" [gender]
Parameters:
image_path: Image path, None=auto-fetch latest image
dialog_text: Script/dialog content
gender: Optional, male|female|None (auto-detect)
Voice Matching Rules
| Detected Gender | Human Voice | Cartoon Voice |
|---|
| female | zh-CN-XiaoxiaoNeural (natural female) | zh-CN-XiaoyiNeural (lively female) |
| male | zh-CN-YunxiNeural (sunny male) | zh-CN-YunxiaNeural (cute male) |
| unknown | zh-CN-XiaoxiaoNeural (default female) | zh-CN-XiaoyiNeural |
Manual override:
- Say "male"/"男生"/"男的" → force male voice
- Say "female"/"女生"/"女的" → force female voice
- Say "cartoon"/"卡通角色"/"动物" → use cartoon voice
Detailed API Reference
See references/volc_api.md
Key Parameters
| Parameter | Description |
|---|
image_url | Public URL (required), uploaded to file host |
audio_url | Public URL for audio MP3 (required) |
resource_id | Avatar ID returned after creation, can be reused |
req_key | create=realman_avatar_picture_create_role, synthesize=realman_avatar_picture_v2 |
Notes
- Image tips: Closed-mouth photos work better; WeChat thumbnails also work
- Gender detection: Heuristic based on Haar eye/nose features, not 100% accurate; confirm with user if needed
- Cartoon/animal: Use lively female voice
zh-CN-XiaoyiNeural as default
- Video URL expiry: ~1 hour, download promptly
- Generation time: Usually 30 sec ~ 3 min
- Rate limit: Volcengine has request frequency limits; wait 1-5 min if 50430 error
- TTS: edge-tts (Microsoft free), no API key needed
Error Handling
| Error Code | Meaning | Solution |
|---|
50430 | Rate limit | Wait 1-5 min, retry |
50207 | Image decode error | Use jpg/png format |
401 | AK/SK error | Check credentials |