Video Messages from your openclaw

v0.1.2

Generate and send video messages with a lip-syncing VRM avatar. Use when user asks for video message, avatar video, video reply, or when TTS should be delivered as video instead of audio.

4 stars · 3k downloads · 8 current · 8 all-time
Security Scan

VirusTotal: Benign
OpenClaw: Benign (medium confidence)
Purpose & Capability
The name and description (avatar video messages) match what the skill asks for: an 'avatarcam' binary to render VRM avatars and ffmpeg to post-process video. The declared npm/brew/apt installers align with providing those binaries.
Instruction Scope
Runtime instructions are focused on the task (generate TTS, run avatarcam, post via message). They reference reading TOOLS.md for configuration and checking $DISPLAY to decide whether xvfb is needed; the registry metadata did not declare TOOLS.md as a required config path or list environment checks, a minor mismatch but not a functional red flag. The instructions run local binaries and send the generated video via the agent's message tool, as expected for this capability.
Install Mechanism
Install spec uses a third-party npm package (@thewulf7/openclaw-avatarcam) to provide avatarcam and uses standard brew/apt packages for ffmpeg and xvfb. This is proportionate to the functionality but carries normal supply-chain risk because an unreviewed global npm package executes code on install and provides the avatarcam binary.
Credentials
The skill does not request secrets or credentials and only needs local binaries and (optionally) access to TOOLS.md and temporary files. It references $DISPLAY and Docker env var names in docs but does not require sensitive environment variables — access is proportionate to the task.
Persistence & Privilege
Skill does not request always:true or other elevated persistent privileges. Its install steps create a global npm binary if installed, which is normal for a tool; it does not modify other skills or system-wide OpenClaw configs beyond installing its own binary.
Assessment
This skill appears to do what it says: render lip-synced avatar videos and send them via the agent. Before installing, consider:

  1. Inspect the npm package (@thewulf7/openclaw-avatarcam) source or run it in a sandbox, because global npm installs can execute arbitrary code.
  2. Ensure TOOLS.md (the skill's config) is present and does not contain sensitive data you wouldn't want the skill to read.
  3. Be aware generated videos are sent using the agent's messaging tool; verify that sending such media is appropriate for your privacy requirements.
  4. Prefer installing in a controlled environment (container or VM) rather than directly on a sensitive host.

If you want higher assurance, ask the skill author for a vetted release URL or review the npm package contents before global installation.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

🎥 Clawdis
Bins: ffmpeg, avatarcam

Install

Install ffmpeg (brew)
Bins: ffmpeg
brew install ffmpeg
latest: vk97cbe73mx9h12v4ck5pgxq6v580es5t
3 versions · updated 1mo ago
v0.1.2 · MIT-0

Video Message

Generate avatar video messages from text or audio. Outputs as Telegram video notes (circular format).

Installation

npm install -g openclaw-avatarcam

Configuration

Configure in TOOLS.md:

### Video Message (avatarcam)
- avatar: default.vrm
- background: #00FF00

Settings Reference

| Setting | Default | Description |
|---|---|---|
| avatar | default.vrm | VRM avatar file path |
| background | #00FF00 | Color (hex) or image path |
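
When the workflow later reads these values, a parser can be as small as one sed call. A minimal sketch, assuming TOOLS.md uses exactly the `- key: value` lines shown above; the `get_setting` helper is hypothetical and not part of the skill:

```shell
# Hypothetical TOOLS.md parser: extracts "- key: value" settings.
# The sample file mirrors the Configuration block above.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
### Video Message (avatarcam)
- avatar: default.vrm
- background: #00FF00
EOF
get_setting() { sed -n "s/^- $1:[[:space:]]*//p" "$cfg" | head -1; }
get_setting avatar       # -> default.vrm
get_setting background   # -> #00FF00
```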

Prerequisites

System Dependencies

| Platform | Command |
|---|---|
| macOS | brew install ffmpeg |
| Linux | sudo apt-get install -y xvfb xauth ffmpeg |
| Windows | Install ffmpeg and add to PATH |
| Docker | See Docker section below |

Note: macOS and Windows don't need xvfb — they have native display support.
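
The per-platform commands above can be folded into a small helper. A sketch using the package names from the table; `deps_cmd` is a hypothetical convenience, not something the skill ships:

```shell
# Hypothetical helper: print the install command for the current platform,
# using the package names from the table above.
deps_cmd() {
  case "$(uname -s)" in
    Darwin) echo "brew install ffmpeg" ;;
    Linux)  echo "sudo apt-get install -y xvfb xauth ffmpeg" ;;
    *)      echo "install ffmpeg manually and add it to PATH" ;;
  esac
}
deps_cmd
```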

Docker Users

Add to OPENCLAW_DOCKER_APT_PACKAGES:

build-essential procps curl file git ca-certificates xvfb xauth libgbm1 libxss1 libatk1.0-0 libatk-bridge2.0-0 libgdk-pixbuf2.0-0 libgtk-3-0 libasound2 libnss3 ffmpeg

Usage

# With color background
avatarcam --audio voice.mp3 --output video.mp4 --background "#00FF00"

# With image background
avatarcam --audio voice.mp3 --output video.mp4 --background "./bg.png"

# With custom avatar
avatarcam --audio voice.mp3 --output video.mp4 --avatar "./custom.vrm"

Sending as Video Note

Use OpenClaw's message tool with asVideoNote:

message action=send filePath=/tmp/video.mp4 asVideoNote=true

Workflow

  1. Read config from TOOLS.md (avatar, background)
  2. Generate TTS if given text: tts text="..." → audio path
  3. Run avatarcam with audio + settings → MP4 output
  4. Send as video note via message action=send filePath=... asVideoNote=true
  5. Return NO_REPLY after sending
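
The steps above can be sketched as a dry-run script. Since `tts`, `avatarcam`, and `message` are agent tools rather than ordinary shell commands, this only echoes each command; `send_video_message` is a hypothetical wrapper:

```shell
# Dry-run sketch of the workflow above. tts/avatarcam/message are agent
# tools, so this echoes the commands instead of executing them.
send_video_message() {
  text="$1"; voice=/tmp/voice.mp3; video=/tmp/video.mp4
  echo "tts text=\"$text\" -> $voice"
  echo "avatarcam --audio $voice --output $video --background \"#00FF00\""
  echo "message action=send filePath=$video asVideoNote=true"
  echo "NO_REPLY"
}
send_video_message "Hello! How are you today?"
```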

Example Flow

User: "Send me a video message saying hello"

# 1. TTS
tts text="Hello! How are you today?" → /tmp/voice.mp3

# 2. Generate video
avatarcam --audio /tmp/voice.mp3 --output /tmp/video.mp4 --background "#00FF00"

# 3. Send as video note
message action=send filePath=/tmp/video.mp4 asVideoNote=true

# 4. Reply
NO_REPLY

Technical Details

| Setting | Value |
|---|---|
| Resolution | 384x384 (square) |
| Frame rate | 30fps constant |
| Max duration | 60 seconds |
| Video codec | H.264 (libx264) |
| Audio codec | AAC |
| Quality | CRF 18 (high quality) |
| Container | MP4 |

Processing Pipeline

  1. Electron renders VRM avatar with lip sync at 1280x720
  2. WebM captured via canvas.captureStream(30)
  3. FFmpeg processes: crop → fps normalize → scale → encode
  4. Message tool sends via Telegram sendVideoNote API
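
Step 3 of the pipeline might correspond to an ffmpeg invocation like the following. This is an illustrative reconstruction from the Technical Details table, not avatarcam's actual command; `build_encode_cmd` and the exact flags are hypothetical:

```shell
# Illustrative reconstruction of pipeline step 3: center-crop the 1280x720
# capture to a square, normalize to 30fps, scale to 384x384, and encode
# H.264/AAC at CRF 18 with a 60s cap (assumed flags, only echoed here).
build_encode_cmd() {
  in="$1"; out="$2"
  echo "ffmpeg -y -i $in -vf crop=720:720,fps=30,scale=384:384" \
       "-c:v libx264 -crf 18 -c:a aac -t 60 $out"
}
build_encode_cmd /tmp/raw.webm /tmp/video.mp4
```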

Platform Support

| Platform | Display | Notes |
|---|---|---|
| macOS | Native Quartz | No extra deps |
| Linux | xvfb (headless) | apt install xvfb |
| Windows | Native | No extra deps |

Headless Rendering

Avatarcam auto-detects headless environments:

  • Uses xvfb-run when $DISPLAY is not set (Linux only)
  • macOS/Windows use native display
  • GPU stall warnings are safe to ignore
  • Generation time: ~1.5x realtime (20s audio ≈ 30s processing)
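
The detection logic can be approximated in a few lines of shell. This is an assumed re-sketch of the behavior described above, not avatarcam's source; `avatarcam_cmd` is a hypothetical wrapper that only prints the command it would run:

```shell
# Assumed sketch of the auto-detection above: prepend xvfb-run only on
# Linux when no display is attached (echoed, not executed).
avatarcam_cmd() {
  if [ "$(uname -s)" = "Linux" ] && [ -z "${DISPLAY-}" ]; then
    echo "xvfb-run -a avatarcam $*"
  else
    echo "avatarcam $*"
  fi
}
avatarcam_cmd --audio voice.mp3 --output video.mp4
```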

Notes

  • Config is read from TOOLS.md
  • Clean up temp files after sending: rm /tmp/video*.mp4
  • For regular video (not circular), omit asVideoNote=true
