Video Messages from your openclaw
v0.1.2

Generate and send video messages with a lip-syncing VRM avatar. Use when the user asks for a video message, avatar video, or video reply, or when TTS should be delivered as video instead of audio.
Security Scan
OpenClaw
Verdict: Benign (medium confidence)

Purpose & Capability
The name and description (avatar video messages) match what the skill provides: an 'avatarcam' binary to render VRM avatars and ffmpeg to post-process video. The declared npm/brew/apt installers align with providing those binaries.
Instruction Scope
Runtime instructions are focused on the task (generate TTS, run avatarcam, send via the message tool). They reference reading TOOLS.md for configuration and checking $DISPLAY to decide whether xvfb is needed; the registry metadata did not declare TOOLS.md as a required config path or list environment checks, which is a minor mismatch but not a functional red flag. The instructions run local binaries and send the generated video via the agent's message tool, as expected for this capability.
Install Mechanism
Install spec uses a third-party npm package (@thewulf7/openclaw-avatarcam) to provide avatarcam and uses standard brew/apt packages for ffmpeg and xvfb. This is proportionate to the functionality but carries normal supply-chain risk because an unreviewed global npm package executes code on install and provides the avatarcam binary.
Credentials
The skill does not request secrets or credentials and only needs local binaries and (optionally) access to TOOLS.md and temporary files. It references $DISPLAY and Docker env var names in docs but does not require sensitive environment variables — access is proportionate to the task.
Persistence & Privilege
The skill does not request always:true or other elevated persistent privileges. Its install steps create a global npm binary, which is normal for a tool; it does not modify other skills or system-wide OpenClaw configs beyond installing its own binary.
Assessment
This skill appears to do what it says: render lip-synced avatar videos and send them via the agent. Before installing, consider:

1. Inspect the source of the npm package (@thewulf7/openclaw-avatarcam) or run it in a sandbox, because global npm installs can execute arbitrary code.
2. Ensure TOOLS.md (the skill's config) is present and does not contain sensitive data you wouldn't want the skill to read.
3. Be aware that generated videos are sent using the agent's messaging tool; verify that sending such media meets your privacy requirements.
4. Prefer installing in a controlled environment (container, VM) rather than directly on a sensitive host.

If you want higher assurance, ask the skill author for a vetted release URL or review the npm package contents before global installation.
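For point 1 above, one way to review the package before a global install, sketched as a dry run: each step is echoed rather than executed (remove the echo to run them; the tarball name follows npm's scoped-package naming convention).

```shell
# Dry run: collect the review commands and print them instead of executing.
set -eu
pkg="@thewulf7/openclaw-avatarcam"
review_cmds="npm pack $pkg
tar -tzf thewulf7-openclaw-avatarcam-*.tgz
tar -xzf thewulf7-openclaw-avatarcam-*.tgz
cat package/package.json"
echo "$review_cmds"
```

In particular, check package.json for preinstall/postinstall scripts, which run automatically during `npm install -g`.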
Runtime requirements
🎥 Clawdis
Bins: ffmpeg, avatarcam

Install
Install ffmpeg (brew)
Bins: ffmpeg
brew install ffmpeg (latest)
Video Message
Generate avatar video messages from text or audio. Outputs as Telegram video notes (circular format).
Installation
npm install -g openclaw-avatarcam
Configuration
Configure in TOOLS.md:
### Video Message (avatarcam)
- avatar: default.vrm
- background: #00FF00
Settings Reference
| Setting | Default | Description |
|---|---|---|
| avatar | default.vrm | VRM avatar file path |
| background | #00FF00 | Color (hex) or image path |
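At runtime the agent has to pull these values out of TOOLS.md; a minimal sed sketch, assuming the exact `- key: value` list format shown above (a sample file is created inline so the snippet is self-contained).

```shell
set -eu
# Sample config in the documented format.
cat > /tmp/TOOLS.md <<'EOF'
### Video Message (avatarcam)
- avatar: default.vrm
- background: #00FF00
EOF
# sed prints only the value part of each matching line.
avatar=$(sed -n 's/^- avatar: *//p' /tmp/TOOLS.md)
background=$(sed -n 's/^- background: *//p' /tmp/TOOLS.md)
echo "avatar=$avatar background=$background"
```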
Prerequisites
System Dependencies
| Platform | Command |
|---|---|
| macOS | brew install ffmpeg |
| Linux | sudo apt-get install -y xvfb xauth ffmpeg |
| Windows | Install ffmpeg and add to PATH |
| Docker | See Docker section below |
Note: macOS and Windows don't need xvfb — they have native display support.
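The table above can be dispatched on `uname`; a small sketch that picks the install command for the current platform (the command is echoed, not run — the commands themselves come from the table, only the dispatch is new).

```shell
set -eu
case "$(uname -s)" in
  Darwin) install_cmd="brew install ffmpeg" ;;                       # macOS
  Linux)  install_cmd="sudo apt-get install -y xvfb xauth ffmpeg" ;; # Debian/Ubuntu
  *)      install_cmd="install ffmpeg manually and add it to PATH" ;;
esac
echo "$install_cmd"
```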
Docker Users
Add to OPENCLAW_DOCKER_APT_PACKAGES:
build-essential procps curl file git ca-certificates xvfb xauth libgbm1 libxss1 libatk1.0-0 libatk-bridge2.0-0 libgdk-pixbuf2.0-0 libgtk-3-0 libasound2 libnss3 ffmpeg
Usage
# With color background
avatarcam --audio voice.mp3 --output video.mp4 --background "#00FF00"
# With image background
avatarcam --audio voice.mp3 --output video.mp4 --background "./bg.png"
# With custom avatar
avatarcam --audio voice.mp3 --output video.mp4 --avatar "./custom.vrm"
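Since --background accepts either a hex color or an image path, a small helper can tell the two forms apart before invoking avatarcam (the `bg_kind` name is made up for this sketch):

```shell
# bg_kind: classify a --background value as "color" (#RRGGBB) or "image" (path).
bg_kind() {
  case "$1" in
    "#"[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]) echo color ;;
    *) echo image ;;
  esac
}
bg_kind "#00FF00"  # → color
bg_kind ./bg.png   # → image
```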
Sending as Video Note
Use OpenClaw's message tool with asVideoNote:
message action=send filePath=/tmp/video.mp4 asVideoNote=true
Workflow
- Read config from TOOLS.md (avatar, background)
- Generate TTS if given text: tts text="..." → audio path
- Run avatarcam with audio + settings → MP4 output
- Send as video note via message action=send filePath=... asVideoNote=true
- Return NO_REPLY after sending
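The steps above, strung together as a dry-run sketch: tts, avatarcam, and message are agent tools, so each invocation is echoed rather than executed.

```shell
set -eu
audio=/tmp/voice.mp3
video=/tmp/video.mp4
# Each tool call is echoed as a dry run; the agent would run them in order.
{
  echo "tts text=\"Hello! How are you today?\" → $audio"
  echo "avatarcam --audio $audio --output $video --background \"#00FF00\""
  echo "message action=send filePath=$video asVideoNote=true"
  echo "NO_REPLY"
} > /tmp/workflow.txt
cat /tmp/workflow.txt
```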
Example Flow
User: "Send me a video message saying hello"
# 1. TTS
tts text="Hello! How are you today?" → /tmp/voice.mp3
# 2. Generate video
avatarcam --audio /tmp/voice.mp3 --output /tmp/video.mp4 --background "#00FF00"
# 3. Send as video note
message action=send filePath=/tmp/video.mp4 asVideoNote=true
# 4. Reply
NO_REPLY
Technical Details
| Setting | Value |
|---|---|
| Resolution | 384x384 (square) |
| Frame rate | 30fps constant |
| Max duration | 60 seconds |
| Video codec | H.264 (libx264) |
| Audio codec | AAC |
| Quality | CRF 18 (high quality) |
| Container | MP4 |
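Given the 60-second cap in the table, a guard like this is worth running before rendering. In a real run the duration would come from ffprobe; here it is passed as an argument so the check itself is runnable anywhere.

```shell
# check_duration: reject audio longer than the 60 s maximum (whole seconds).
check_duration() {
  if [ "$1" -gt 60 ]; then echo "too long"; else echo "ok"; fi
}
check_duration 20  # → ok
check_duration 75  # → too long
```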
Processing Pipeline
- Electron renders the VRM avatar with lip sync at 1280x720
- WebM captured via canvas.captureStream(30)
- FFmpeg processes: crop → fps normalize → scale → encode
- Message tool sends via Telegram's sendVideoNote API
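A plausible shape for the FFmpeg stage, assembled from the Technical Details table (centered square crop of the 1280x720 capture, 30 fps, 384x384, libx264 CRF 18, AAC). The exact flags avatarcam uses are not published, so this is an assumption; the command is echoed as a dry run.

```shell
set -eu
# crop=ih:ih takes a centered square from the 1280x720 capture, then
# normalize to 30 fps, scale to 384x384, and encode H.264/AAC at CRF 18.
encode_cmd='ffmpeg -i capture.webm -i voice.mp3 -vf "crop=ih:ih,fps=30,scale=384:384" -c:v libx264 -crf 18 -pix_fmt yuv420p -c:a aac -shortest video.mp4'
echo "$encode_cmd"
```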
Platform Support
| Platform | Display | Notes |
|---|---|---|
| macOS | Native Quartz | No extra deps |
| Linux | xvfb (headless) | apt install xvfb |
| Windows | Native | No extra deps |
Headless Rendering
Avatarcam auto-detects headless environments:
- Uses xvfb-run when $DISPLAY is not set (Linux only)
- macOS/Windows use native display
- GPU stall warnings are safe to ignore
- Generation time: ~1.5x realtime (20s audio ≈ 30s processing)
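The auto-detection described above amounts to a check like this (a sketch; the final command is echoed rather than executed):

```shell
set -eu
cmd="avatarcam --audio voice.mp3 --output video.mp4"
# Headless Linux: no $DISPLAY means wrapping the renderer in xvfb-run.
if [ "$(uname -s)" = "Linux" ] && [ -z "${DISPLAY:-}" ]; then
  cmd="xvfb-run -a $cmd"
fi
echo "$cmd"
```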
Notes
- Config is read from TOOLS.md
- Clean up temp files after sending: rm /tmp/video*.mp4
- For regular video (not circular), omit asVideoNote=true