Volcengine Digital Human Video Generator

v1.0.4

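Volcengine digital human video generation skill. Triggered when a user sends a photo together with dialogue or voiceover text and asks for a digital human talking-head video. Fully automated: image upload, avatar creation, TTS voiceover (automatic gender detection, multi-voice matching), video synthesis, and finally sending the video back to the user. Trigger words include 数字人 (digital human), 视频合成 (video synthesis), 口播视频 (talking-head video), and 数字人视频 (digital human video).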


by @xiaoxiaole2025

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for xiaoxiaole2025/volc-digital-human.


Install the skill "Volcengine Digital Human Video Generator" (xiaoxiaole2025/volc-digital-human) from ClawHub.
Skill page: https://clawhub.ai/xiaoxiaole2025/volc-digital-human
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI


openclaw skills install volc-digital-human

ClawHub CLI


npx clawhub@latest install volc-digital-human

Security Scan

VirusTotal: Suspicious

OpenClaw: Suspicious (high confidence)

Purpose & Capability

The name and description (Volcengine digital human video generator) match the code and instructions: image upload → create avatar → TTS → synthesize video. Requiring Volcengine AK/SK, TTS (edge-tts), and ffmpeg is coherent. However, the registry metadata at the top claims no required env vars or credentials, while SKILL.md and the script explicitly require VOLC_AK/VOLC_SK and the package even includes a config.json with AK/SK. The author should explain this unexpected metadata mismatch.

Instruction Scope

The SKILL.md and script instruct the agent to read images from /root/.openclaw/media/inbound and to upload user images/audio/video to public file hosts (catbox.moe, 0x0.st; references also mention uguu.se). Reading inbound media and calling external APIs is necessary for the task, but automatically hosting user-supplied images and audio publicly is a significant privacy risk. SKILL.md warns about this, yet the automation still exposes content publicly during processing, so make sure users understand this before use.


Install Mechanism

No install spec (instruction-only), so nothing is written by an installer. The script has heavy runtime dependencies (opencv, deepface/retinaface, numpy, edge-tts, ffmpeg) and deepface may download models at runtime. Lack of an install spec means dependency installation/behavior (and model downloads) will happen outside the package and should be managed explicitly.

Credentials

Requesting VOLC_AK and VOLC_SK is appropriate for calling Volcengine. However, the config.json included in the package contains ak/sk values (hard-coded credentials). Shipping credentials in a skill package is a serious red flag: they may be a leaked or shared key, or intentionally embedded account credentials. The script reads config.json from its own directory when the env vars are not set, so those embedded credentials can be used by accident. This is disproportionate and may give the package author (or whoever controls that account) access to usage data and uploaded content.


Persistence & Privilege

always:false and normal autonomous invocation are fine. The skill reads from the agent's inbound media directory and writes temporary files under /tmp and its own workspace; it does not modify other skills or system-wide configs. Still, the combination of autonomous invocation plus public uploads means the agent could automatically expose user media when invoked — be cautious about enabling it for unattended runs.

What to consider before installing

Key things to consider before installing or using this skill:

- Do not upload sensitive or private images/audio. The skill uploads user-provided media to public file hosts (catbox.moe, 0x0.st; references also mention uguu.se), so anyone with the URL can access them during processing.
- The package contains a config.json with hard-coded AK/SK credentials. Treat it as insecure: remove that file, replace the credentials with your own, or set VOLC_AK/VOLC_SK as environment variables. If you cannot verify who owns those keys, do not rely on them; they may be leaked or abused.
- Consider rotating any Volcengine keys you plan to use for this skill, and use a minimal-permission RAM user scoped to the Digital Human service only.
- The script can download ML models at runtime (deepface/retinaface) and calls external services; run it in an isolated environment (such as a container) if you need to limit network or file-system exposure.
- Verify and pin dependency installation (edge-tts, ffmpeg, OpenCV, deepface) in a controlled environment; the package does not provide an install step.

If you need this capability but are uncomfortable with public uploads or embedded credentials, ask the skill author to remove the bundled config.json, provide clear metadata declaring the required env vars, and offer an option to use private storage (your own S3/MinIO) instead of public file hosts.



104 downloads · 0 stars · 5 versions · Updated 1mo ago · License: MIT-0

Volcengine Digital Human Video Generator

⚠️ First-Time Setup Required

This skill requires Volcengine Access Key (AK) and Secret Key (SK).

Get AK/SK

Register a Volcengine account: https://console.volcengine.com/
Enable the "Digital Human Video Generation" service
Create an Access Key: https://console.volcengine.com/iam/keymanage/

Configuration (choose one)

Option 1: Config file (recommended)

Create config.json (in the skill directory):

{
  "ak": "your_access_key_here",
  "sk": "your_secret_key_here"
}

Option 2: Environment variables

export VOLC_AK="your_access_key_here"
export VOLC_SK="your_secret_key_here"

⚠️ Security: Never hardcode AK/SK in scripts or commit them (including config.json) to public repos!
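To make the lookup order between the two options concrete, here is a minimal sketch of a credential loader. It assumes environment variables take precedence over config.json (the order the security scan above attributes to the script); `load_volc_credentials` and `PLACEHOLDERS` are hypothetical names, not the skill's actual code.

```python
import json
import os
from pathlib import Path
from typing import Tuple

# Template values from the config.json example; treated as "not configured".
PLACEHOLDERS = {"your_access_key_here", "your_secret_key_here"}


def load_volc_credentials(skill_dir: str) -> Tuple[str, str]:
    """Hypothetical helper: return (ak, sk), VOLC_AK/VOLC_SK first, then config.json."""
    ak = os.environ.get("VOLC_AK", "")
    sk = os.environ.get("VOLC_SK", "")
    if ak and sk:
        return ak, sk

    config_path = Path(skill_dir) / "config.json"
    if config_path.is_file():
        data = json.loads(config_path.read_text(encoding="utf-8"))
        ak, sk = str(data.get("ak", "")), str(data.get("sk", ""))
        # Reject empty values and the unedited template placeholders.
        if ak and sk and ak not in PLACEHOLDERS and sk not in PLACEHOLDERS:
            return ak, sk

    raise RuntimeError(
        "Volcengine AK/SK not found: set VOLC_AK/VOLC_SK or fill in config.json"
    )
```

Failing loudly on the template placeholders avoids sending obviously invalid keys and getting a less helpful `401` from the API later.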

⚠️ Privacy Notice

This skill uploads images and generated audio/video to third-party file hosts (catbox.moe, 0x0.st) to create publicly accessible URLs required by the Volcengine API.

Do not use with sensitive/private images you don't want uploaded to public hosts
User images and generated content will be publicly accessible during the video generation process
Download and use videos promptly; URLs may expire

Core Flow

Get image: Fetch the latest image from /root/.openclaw/media/inbound/
Gender detection: OpenCV Haar cascade on eye/nose features
Upload image: Upload to catbox.moe to get a public URL
Create avatar: Call the realman_avatar_picture_create_role API
TTS audio: Auto-match a voice by gender, synthesize with edge-tts, then upload the audio to catbox.moe
Video synthesis: Call the realman_avatar_picture_v2 API and poll for the result
Download video: Save the video locally and generate a thumbnail preview
Deliver: Send the thumbnail and video back to the user via the message tool
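The two purely local steps in this flow (picking the newest inbound image and making a thumbnail) can be sketched as below. This is an illustration, not the script's code: "latest" is assumed to mean most recent modification time, only jpg/png are accepted because of the `50207` decode error listed under Error Handling, and `latest_inbound_image` / `thumbnail_command` are hypothetical names. The ffmpeg flags used (`-y`, `-ss`, `-i`, `-frames:v`) are standard ffmpeg options.

```python
from pathlib import Path
from typing import List, Optional

# jpg/png only, since other formats may trigger error 50207 (image decode error).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def latest_inbound_image(inbound_dir: str = "/root/.openclaw/media/inbound") -> Optional[Path]:
    """Return the most recently modified jpg/png in inbound_dir, or None."""
    directory = Path(inbound_dir)
    if not directory.is_dir():
        return None
    images = [
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    ]
    return max(images, key=lambda p: p.stat().st_mtime, default=None)


def thumbnail_command(video: str, thumb: str, at_seconds: float = 1.0) -> List[str]:
    """Build an ffmpeg argv that saves a single frame of `video` as `thumb`."""
    return [
        "ffmpeg", "-y",           # overwrite an existing thumbnail
        "-ss", str(at_seconds),   # seek to this timestamp before decoding
        "-i", video,
        "-frames:v", "1",         # write exactly one video frame
        thumb,
    ]
```

With ffmpeg installed, the thumbnail is produced by `subprocess.run(thumbnail_command("out.mp4", "thumb.jpg"), check=True)`.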

Quick Run

cd /root/.openclaw/workspace-employee-xiaozhua
python3 skills/volc-digital-human/scripts/volc_digital_human.py "$image_path" "$dialog_text" [gender]

Parameters:

image_path: Path to the image; pass None to auto-fetch the latest inbound image
dialog_text: Script/dialog text to be spoken
gender: Optional; male, female, or None (auto-detect)
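One way to read the positional arguments above is sketched below. It assumes the literal string `None` on the command line means "unset", as the parameter descriptions suggest; `parse_args` is a hypothetical helper, not necessarily how volc_digital_human.py handles its arguments.

```python
from typing import List, Optional, Tuple

USAGE = "usage: volc_digital_human.py IMAGE_PATH DIALOG_TEXT [male|female|None]"


def _none_if_literal(value: str) -> Optional[str]:
    """Map the literal string "None" (or an empty string) to Python None."""
    return None if value in ("", "None") else value


def parse_args(argv: List[str]) -> Tuple[Optional[str], str, Optional[str]]:
    """Return (image_path, dialog_text, gender) from argv without the program name."""
    if len(argv) < 2 or not argv[1].strip():
        raise SystemExit(USAGE)
    image_path = _none_if_literal(argv[0])      # None -> auto-fetch latest image
    dialog_text = argv[1]
    gender = _none_if_literal(argv[2]) if len(argv) > 2 else None
    if gender not in (None, "male", "female"):
        raise SystemExit(f"gender must be male, female or None, got {gender!r}")
    return image_path, dialog_text, gender
```

In a script entry point this would be called as `image_path, dialog_text, gender = parse_args(sys.argv[1:])`.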

Voice Matching Rules

Detected Gender	Human Voice	Cartoon Voice
female	`zh-CN-XiaoxiaoNeural` (natural female)	`zh-CN-XiaoyiNeural` (lively female)
male	`zh-CN-YunxiNeural` (sunny male)	`zh-CN-YunxiaNeural` (cute male)
unknown	`zh-CN-XiaoxiaoNeural` (default female)	`zh-CN-XiaoyiNeural` (lively female)

Manual override:

Say "male"/"男生"/"男的" → force male voice
Say "female"/"女生"/"女的" → force female voice
Say "cartoon"/"卡通角色"/"动物" → use cartoon voice
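The voice table and the override keywords combine into a single lookup. Here is a minimal sketch, assuming overrides are plain substring matches on the user's message; `pick_voice` and the keyword tuples are hypothetical names, and the voice IDs are copied from the table.

```python
from typing import Optional

# gender -> (human voice, cartoon voice), copied from the table above
VOICES = {
    "female": ("zh-CN-XiaoxiaoNeural", "zh-CN-XiaoyiNeural"),
    "male": ("zh-CN-YunxiNeural", "zh-CN-YunxiaNeural"),
}
FEMALE_WORDS = ("female", "女生", "女的")
MALE_WORDS = ("male", "男生", "男的")
CARTOON_WORDS = ("cartoon", "卡通角色", "动物")


def pick_voice(detected_gender: Optional[str], user_message: str = "") -> str:
    """Return an edge-tts voice name; explicit keywords override detection."""
    text = user_message.lower()
    gender = detected_gender
    # "female" contains "male" as a substring, so female keywords are checked first.
    if any(word in text for word in FEMALE_WORDS):
        gender = "female"
    elif any(word in text for word in MALE_WORDS):
        gender = "male"
    human_voice, cartoon_voice = VOICES.get(gender, VOICES["female"])  # unknown -> female
    if any(word in text for word in CARTOON_WORDS):
        return cartoon_voice
    return human_voice
```

The returned name can be passed to the edge-tts CLI, for example `edge-tts --voice zh-CN-YunxiNeural --text "..." --write-media voice.mp3`.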

Detailed API Reference

See references/volc_api.md

Key Parameters

Parameter	Description
`image_url`	Public URL of the portrait image (required), obtained by uploading to a file host
`audio_url`	Public URL of the MP3 audio (required)
`resource_id`	Avatar ID returned by the create call; can be reused for later videos
`req_key`	create=`realman_avatar_picture_create_role`, synthesize=`realman_avatar_picture_v2`
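The table implies two calls that share `resource_id`. The sketch below shows which parameters belong to which call, assuming each request body is a flat JSON object holding only the fields listed above. The real request schema, signing, and endpoint are documented in references/volc_api.md and are not reproduced here; the function names are hypothetical.

```python
from typing import Any, Dict

CREATE_ROLE_REQ_KEY = "realman_avatar_picture_create_role"
SYNTHESIZE_REQ_KEY = "realman_avatar_picture_v2"


def _require_public_url(url: str, field: str) -> str:
    # Volcengine fetches these URLs itself, so local file paths cannot work.
    if not url.startswith(("https://", "http://")):
        raise ValueError(f"{field} must be a public http(s) URL, got {url!r}")
    return url


def create_role_body(image_url: str) -> Dict[str, Any]:
    """Step 1 (assumed shape): register the uploaded photo as an avatar."""
    return {
        "req_key": CREATE_ROLE_REQ_KEY,
        "image_url": _require_public_url(image_url, "image_url"),
    }


def synthesize_body(resource_id: str, audio_url: str) -> Dict[str, Any]:
    """Step 2 (assumed shape): drive an existing avatar with an MP3 voiceover."""
    return {
        "req_key": SYNTHESIZE_REQ_KEY,
        "resource_id": resource_id,  # returned by step 1, reusable
        "audio_url": _require_public_url(audio_url, "audio_url"),
    }
```

Keeping `resource_id` from step 1 lets later videos with the same face skip the avatar-creation call.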

Notes

Image tips: Closed-mouth photos work better; WeChat thumbnails also work
Gender detection: Heuristic based on Haar eye/nose features, not 100% accurate; confirm with the user if needed
Cartoon/animal: Default to the lively female voice zh-CN-XiaoyiNeural
Video URL expiry: URLs expire after ~1 hour; download promptly
Generation time: Usually 30 s to 3 min
Rate limit: Volcengine limits request frequency; on error 50430, wait 1-5 min and retry
TTS: edge-tts (uses Microsoft Edge's free online TTS service); no API key needed
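Given the 30 s to 3 min generation time, the "poll for result" step needs a deadline. Here is a minimal polling sketch with a 5-minute default budget. `fetch_status` is a hypothetical callable standing in for the real result query, and the status strings (`done`, `failed`, and so on) are assumptions, not confirmed Volcengine values. `sleep` and `clock` are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict[str, str]],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status() until it reports "done", then return the video URL.

    Assumed (hypothetical) result shapes:
    {"status": "generating"} or {"status": "done", "video_url": "..."}
    """
    deadline = clock() + timeout_s
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "done":
            return result["video_url"]  # expires after ~1 hour: download right away
        if status in ("failed", "not_found", "expired"):
            raise RuntimeError(f"task ended with status {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"task not finished after {timeout_s} s")
        sleep(interval_s)
```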

Error Handling

Error Code	Meaning	Solution
`50430`	Rate limit	Wait 1-5 min, retry
`50207`	Image decode error	Use jpg/png format
`401`	AK/SK error	Check credentials
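Only `50430` in this table is worth retrying automatically; `50207` and `401` need a different image or corrected credentials. Here is a minimal sketch of that policy. `VolcApiError` is a hypothetical exception type carrying the numeric code, and waiting 1, 2, then 5 minutes is one arbitrary reading of the "wait 1-5 min" advice.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

ERROR_HINTS = {
    50430: "Rate limit: wait 1-5 min, retry",
    50207: "Image decode error: use jpg/png format",
    401: "AK/SK error: check credentials",
}


class VolcApiError(Exception):
    """Hypothetical exception type carrying the numeric error code."""

    def __init__(self, code: int, message: str = "") -> None:
        super().__init__(f"{code}: {ERROR_HINTS.get(code, message or 'unknown error')}")
        self.code = code


def call_with_rate_limit_retry(
    call: Callable[[], T],
    waits_s: Sequence[float] = (60, 120, 300),  # 1, 2, then 5 minutes
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` only on 50430; any other error propagates immediately."""
    for wait in waits_s:
        try:
            return call()
        except VolcApiError as err:
            if err.code != 50430:
                raise  # 50207 / 401 need a fix, not a retry
            sleep(wait)
    return call()  # final attempt; a remaining 50430 propagates to the caller
```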
