Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Volcengine Digital Human Video Generator

v1.0.4

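Volcengine digital human video generation skill. It triggers when a user sends a photo together with dialog or voice-over copy and asks for a talking-head digital human video. The whole pipeline is automated: image upload, avatar creation, TTS voice-over (automatic gender detection, multi-voice matching), video synthesis, and delivery of the result back to the user. Trigger words include 数字人 (digital human), 视频合成 (video synthesis), 口播视频 (talking-head video), and 数字人视频 (digital human video).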


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for xiaoxiaole2025/volc-digital-human.

Install the skill "Volcengine Digital Human Video Generator" (xiaoxiaole2025/volc-digital-human) from ClawHub.
Skill page: https://clawhub.ai/xiaoxiaole2025/volc-digital-human
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install volc-digital-human

ClawHub CLI


npx clawhub@latest install volc-digital-human

Security Scan

VirusTotal: Suspicious
OpenClaw: Suspicious (high confidence)

Purpose & Capability
The name/description (Volcengine digital human video generator) match the code and instructions: image upload → create avatar → TTS → synthesize video. Requiring Volcengine AK/SK, TTS (edge-tts) and ffmpeg is coherent. However the registry metadata at the top claimed no required env vars/credentials while SKILL.md and the script explicitly require VOLC_AK/VOLC_SK and even include a config.json with AK/SK — that metadata mismatch is unexpected and should be explained by the author.
⚠️ Instruction Scope
The SKILL.md and script instruct the agent to read images from /root/.openclaw/media/inbound and to upload user images/audio/video to public file hosts (catbox.moe, 0x0.st; references also mention uguu.se). Reading inbound media and calling external APIs is necessary for the task, but automatic public hosting of user-supplied images/audio is a significant privacy risk. The SKILL.md warns about this, but the automation will still expose content publicly during processing — verify users understand this before use.
Install Mechanism
No install spec (instruction-only), so nothing is written by an installer. The script has heavy runtime dependencies (opencv, deepface/retinaface, numpy, edge-tts, ffmpeg) and deepface may download models at runtime. Lack of an install spec means dependency installation/behavior (and model downloads) will happen outside the package and should be managed explicitly.
⚠️ Credentials
Requesting VOLC_AK and VOLC_SK is appropriate for calling Volcengine. However the included config.json in the package contains ak/sk values (hard-coded credentials). Shipping credentials in a skill package is a serious red flag: it may be a leaked/shared key or intentionally embedded account credentials. The script will read a config.json in its directory if env vars are not set, causing accidental use of those embedded credentials. This is disproportionate and may grant the package author (or whoever controls that account) access to usage and uploaded content.
Persistence & Privilege
always:false and normal autonomous invocation are fine. The skill reads from the agent's inbound media directory and writes temporary files under /tmp and its own workspace; it does not modify other skills or system-wide configs. Still, the combination of autonomous invocation plus public uploads means the agent could automatically expose user media when invoked — be cautious about enabling it for unattended runs.
What to consider before installing
Key things to consider before installing or using this skill:

  • Do not upload sensitive or private images/audio. The skill uploads user-provided media to public file hosts (catbox.moe, 0x0.st; references also mention uguu.se), so anyone with the URL can access them during processing.
  • The package contains a config.json with hard-coded AK/SK credentials. Treat this as insecure: either remove that file, replace the credentials with your own, or set VOLC_AK/VOLC_SK in environment variables. If you cannot verify those keys' ownership, do not rely on them; they may be leaked or abused.
  • Consider rotating any Volcengine keys you plan to use for this skill, and use a minimal-permission IAM user for the Digital Human service only.
  • The script can download ML models at runtime (deepface/retinaface) and calls external services; run it in an isolated environment (container) if you need to limit network/file-system exposure.
  • Verify and/or pin dependency installation (edge-tts, ffmpeg, OpenCV, deepface) in a controlled environment; the package does not provide an install step.

If you need this capability but are uncomfortable with public uploads or embedded credentials, ask the skill author to remove the bundled config.json, provide clear metadata declaring required env vars, and offer an option to use private storage (your own S3/minio) instead of public file hosts.

Like a lobster shell, security has layers — review code before you run it.

latest vk979de72dkqe1vsbw4jdbdndnh83kqp4
104 downloads
0 stars
5 versions
Updated 1mo ago
v1.0.4
MIT-0

Volcengine Digital Human Video Generator

⚠️ First-Time Setup Required

This skill requires Volcengine Access Key (AK) and Secret Key (SK).

Get AK/SK

  1. Register Volcengine account: https://console.volcengine.com/
  2. Enable "Digital Human Video Generation" service
  3. Create Access Key: https://console.volcengine.com/iam/keymanage/

Configuration (choose one)

Option 1: Config file (recommended)

Create config.json (in the skill directory):

{
  "ak": "your_access_key_here",
  "sk": "your_secret_key_here"
}

Option 2: Environment variables

export VOLC_AK="your_access_key_here"
export VOLC_SK="your_secret_key_here"

⚠️ Security: Never hardcode AK/SK in scripts or commit to public repos!
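
The two configuration options can be combined into one lookup. The following is a minimal sketch, not the skill's actual code: it assumes environment variables take precedence and config.json is only a fallback (the security scan above describes that order), and the function name `load_credentials` and the placeholder check are hypothetical additions for illustration.

```python
import json
import os
from pathlib import Path

# Placeholder strings from the config.json template; never treat them as real keys.
PLACEHOLDERS = {"your_access_key_here", "your_secret_key_here"}


def load_credentials(config_path="config.json", env=None):
    """Return (ak, sk), preferring VOLC_AK/VOLC_SK over the config file."""
    env = os.environ if env is None else env
    ak, sk = env.get("VOLC_AK"), env.get("VOLC_SK")
    if ak and sk:
        return ak, sk

    path = Path(config_path)
    if path.is_file():
        data = json.loads(path.read_text(encoding="utf-8"))
        ak, sk = data.get("ak"), data.get("sk")
        if ak and sk and ak not in PLACEHOLDERS and sk not in PLACEHOLDERS:
            return ak, sk

    raise RuntimeError("Set VOLC_AK/VOLC_SK or provide ak/sk in config.json")
```

Failing loudly when neither source is configured avoids silently running with keys you did not supply yourself.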


⚠️ Privacy Notice

This skill uploads images and generated audio/video to third-party file hosts (catbox.moe, 0x0.st) to create publicly accessible URLs required by the Volcengine API.

  • Do not use with sensitive/private images you don't want uploaded to public hosts
  • User images and generated content will be publicly accessible during the video generation process
  • Download and use videos promptly; URLs may expire

Core Flow

  1. Get image: Fetch from /root/.openclaw/media/inbound/
  2. Gender detection: OpenCV Haar cascade for eyes/nose features
  3. Upload image: Upload to catbox.moe for public URL
  4. Create avatar: Call realman_avatar_picture_create_role API
  5. TTS audio: Auto-match voice by gender + edge-tts → upload to catbox.moe
  6. Video synthesis: Call realman_avatar_picture_v2 API, poll for result
  7. Download video: Save locally, generate thumbnail preview
  8. Deliver: Send thumbnail + video via message tool
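
Step 6 polls for the synthesis result. The sketch below shows one way such a polling loop could look; it is an illustration under stated assumptions, not the script's implementation. `poll_until_done` and its `check_status` callback are hypothetical names, and the callback stands in for whatever query the Volcengine API actually requires (see references/volc_api.md). The default timeout of 300 seconds is chosen only because generation usually takes 30 seconds to 3 minutes.

```python
import time


def poll_until_done(check_status, interval=5.0, timeout=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call check_status() until it returns a non-None result or time runs out.

    check_status is a caller-supplied function (hypothetical here) that
    returns the finished video URL, or None while the job is still running.
    """
    deadline = clock() + timeout
    while True:
        result = check_status()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"video not ready after {timeout} seconds")
        sleep(interval)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting.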

Quick Run

cd /root/.openclaw/workspace-employee-xiaozhua
python3 skills/volc-digital-human/scripts/volc_digital_human.py "$image_path" "$dialog_text" [gender]

Parameters:

  • image_path: Path to the source image; pass None to auto-fetch the latest image
  • dialog_text: Script/dialog content to be spoken
  • gender: Optional; male, female, or None (auto-detect)
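
As an illustration of the image_path behaviour, here is a minimal sketch of how "auto-fetch latest image" could be resolved. It assumes "latest" means the most recently modified jpg/png file in the inbound directory; the function name `resolve_image_path` and the suffix list are hypothetical, not taken from the script.

```python
from pathlib import Path

# Assumed set of accepted image types (the error table recommends jpg/png).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def resolve_image_path(image_path, inbound_dir="/root/.openclaw/media/inbound"):
    """Return the given path, or the newest image in inbound_dir if it is None."""
    # The CLI passes arguments as strings, so also accept the literal "None".
    if image_path and image_path != "None":
        return Path(image_path)

    candidates = [
        p for p in Path(inbound_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    ]
    if not candidates:
        raise FileNotFoundError(f"no jpg/png images found in {inbound_dir}")
    # Newest file by modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```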

Voice Matching Rules

| Detected Gender | Human Voice | Cartoon Voice |
|---|---|---|
| female | zh-CN-XiaoxiaoNeural (natural female) | zh-CN-XiaoyiNeural (lively female) |
| male | zh-CN-YunxiNeural (sunny male) | zh-CN-YunxiaNeural (cute male) |
| unknown | zh-CN-XiaoxiaoNeural (default female) | zh-CN-XiaoyiNeural |

Manual override:

  • Say "male"/"男生"/"男的" → force male voice
  • Say "female"/"女生"/"女的" → force female voice
  • Say "cartoon"/"卡通角色"/"动物" → use cartoon voice
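
The matching rules and overrides can be expressed as a small lookup. This is a sketch of the rules as documented here, not the script's code: the voice names and keywords come from this section, while `pick_voice` and the idea of scanning the user's request text for keywords are assumptions made for illustration.

```python
VOICES = {
    "female": {"human": "zh-CN-XiaoxiaoNeural", "cartoon": "zh-CN-XiaoyiNeural"},
    "male": {"human": "zh-CN-YunxiNeural", "cartoon": "zh-CN-YunxiaNeural"},
    "unknown": {"human": "zh-CN-XiaoxiaoNeural", "cartoon": "zh-CN-XiaoyiNeural"},
}

FEMALE_WORDS = ("female", "女生", "女的")
MALE_WORDS = ("male", "男生", "男的")
CARTOON_WORDS = ("cartoon", "卡通角色", "动物")


def pick_voice(detected_gender, user_request=""):
    """Choose an edge-tts voice from the detected gender and manual overrides."""
    text = user_request.lower()
    gender = detected_gender if detected_gender in ("male", "female") else "unknown"

    # Check female first: the string "female" contains "male".
    if any(word in text for word in FEMALE_WORDS):
        gender = "female"
    elif any(word in text for word in MALE_WORDS):
        gender = "male"

    style = "cartoon" if any(word in text for word in CARTOON_WORDS) else "human"
    return VOICES[gender][style]
```

For example, `pick_voice("male", "用女生的声音")` selects the female human voice because the manual override wins over detection.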

Detailed API Reference

See references/volc_api.md

Key Parameters

| Parameter | Description |
|---|---|
| image_url | Public URL (required), uploaded to file host |
| audio_url | Public URL for audio MP3 (required) |
| resource_id | Avatar ID returned after creation, can be reused |
| req_key | create=realman_avatar_picture_create_role, synthesize=realman_avatar_picture_v2 |
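
To show how these parameters relate across the two calls, here is a rough sketch that assembles the fields as plain dictionaries. The field names and req_key values come from the table above; the exact request body shape, any additional required fields, and request signing are NOT shown and are assumptions here, so check references/volc_api.md before relying on it. The helper names are hypothetical.

```python
from urllib.parse import urlparse

CREATE_REQ_KEY = "realman_avatar_picture_create_role"
SYNTH_REQ_KEY = "realman_avatar_picture_v2"


def _require_http_url(name, url):
    """Reject local paths early: the API needs a publicly reachable URL."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"{name} must be an http(s) URL, got {url!r}")
    return url


def build_create_payload(image_url):
    """Fields for the avatar-creation call (assumed body layout)."""
    return {
        "req_key": CREATE_REQ_KEY,
        "image_url": _require_http_url("image_url", image_url),
    }


def build_synthesis_payload(resource_id, audio_url):
    """Fields for the video-synthesis call (assumed body layout)."""
    if not resource_id:
        raise ValueError("resource_id is required; create the avatar first")
    return {
        "req_key": SYNTH_REQ_KEY,
        "resource_id": resource_id,
        "audio_url": _require_http_url("audio_url", audio_url),
    }
```

Because resource_id can be reused, a caller could store it and skip the creation call for later videos of the same avatar.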

Notes

  • Image tips: Closed-mouth photos work better; WeChat thumbnails also work
  • Gender detection: Heuristic based on Haar eye/nose features, not 100% accurate; confirm with user if needed
  • Cartoon/animal: Use lively female voice zh-CN-XiaoyiNeural as default
  • Video URL expiry: ~1 hour, download promptly
  • Generation time: Usually 30 seconds to 3 minutes
  • Rate limit: Volcengine limits request frequency; if you get error 50430, wait 1-5 minutes before retrying
  • TTS: edge-tts (Microsoft free), no API key needed

Error Handling

| Error Code | Meaning | Solution |
|---|---|---|
| 50430 | Rate limit | Wait 1-5 min, retry |
| 50207 | Image decode error | Use jpg/png format |
| 401 | AK/SK error | Check credentials |
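
The error table suggests that only 50430 is worth retrying automatically. Below is a minimal sketch of that policy, assuming the caller wraps API failures in an exception carrying the numeric code. `VolcError`, `call_with_retry`, and the wait schedule (60 seconds, then 120) are hypothetical choices within the 1-5 minute guidance, not the script's behaviour.

```python
import time

RETRYABLE_CODES = {50430}  # rate limit
HINTS = {
    50430: "Rate limit: wait 1-5 minutes and retry",
    50207: "Image decode error: use jpg/png format",
    401: "AK/SK error: check credentials",
}


class VolcError(Exception):
    """Hypothetical wrapper for an API error code."""

    def __init__(self, code):
        super().__init__(HINTS.get(code, f"Unknown error {code}"))
        self.code = code


def call_with_retry(call, max_attempts=3, wait_seconds=60, sleep=time.sleep):
    """Run call(); retry only rate-limit errors, waiting longer each time."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except VolcError as err:
            if err.code not in RETRYABLE_CODES or attempt == max_attempts:
                raise
            sleep(wait_seconds * attempt)
```

Errors 50207 and 401 are raised immediately, since retrying cannot fix a bad image or wrong credentials.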
