Feishu Voice (ElevenLabs)

Send and receive voice messages on Feishu (Lark) using ElevenLabs TTS and STT. Activate when user asks to send a voice message on Feishu, or when receiving a Feishu audio message (media attachment with .ogg/.opus file) that needs transcription. Supports smart reply mode — auto voice-reply to voice messages, text-reply to text messages.

DongDong@dongdongbear

Install

openclaw skills install @dongdongbear/feishu-voice-elevenlabs

Feishu Voice (TTS + STT)

Send voice messages and transcribe received voice messages on Feishu using ElevenLabs.

Prerequisites

sag CLI (ElevenLabs TTS): npm i -g sag or go install
ffmpeg / ffprobe: brew install ffmpeg
ElevenLabs paid plan (required for library voices)
Feishu app with im:message:send_as_bot and im:file permissions

Environment Variables

Variable	Required	Description
`ELEVENLABS_API_KEY`	✅	ElevenLabs API key
`FEISHU_APP_ID`	✅ (TTS)	Feishu app ID
`FEISHU_APP_SECRET`	✅ (TTS)	Feishu app secret
`ELEVENLABS_VOICE_ID`	✅ (TTS)	Voice ID (browse at elevenlabs.io/voice-library)
`ELEVENLABS_MODEL_ID`	✅ (TTS)	Model ID (e.g. `eleven_multilingual_v2`, `eleven_v3`)
`ELEVENLABS_SPEED`	❌	Speech speed 0.5-2.0 (default: 1.0)

If FEISHU_APP_ID / FEISHU_APP_SECRET are not in env, extract from openclaw config:

bash

export FEISHU_APP_ID=$(python3 -c "import json; print(json.load(open('$HOME/.openclaw/openclaw.json'))['channels']['feishu']['appId'])")
export FEISHU_APP_SECRET=$(python3 -c "import json; print(json.load(open('$HOME/.openclaw/openclaw.json'))['channels']['feishu']['appSecret'])")

Voice Selection

See config/voice-config.example.json for a curated voice list. Browse all voices at https://elevenlabs.io/voice-library or run sag voices.

Recommended models:

eleven_multilingual_v2 — best for Chinese and multilingual content
eleven_v3 — latest English-optimized model

Sending Voice Messages (TTS)

bash

scripts/feishu-voice-send.sh <text> <receive_id> [receive_id_type] [speed]

receive_id: target user open_id or chat_id
receive_id_type: open_id (default) or chat_id
speed: speech speed multiplier, 0.5-2.0 (default: 1.0)

Receiving Voice Messages (STT)

When OpenClaw delivers a Feishu voice message, it arrives as a media attachment (.ogg file). Transcribe with:

bash

scripts/feishu-voice-stt.sh /path/to/audio.ogg

Returns recognized text to stdout. Uses ElevenLabs scribe_v1 model with automatic language detection.

Fallback: Download via Feishu API

If the audio file is not delivered as an attachment (only file_key available):

List recent messages: GET /im/v1/messages?container_id_type=chat&container_id=CHAT_ID&page_size=5&sort_type=ByCreateTimeDesc
Download audio: GET /im/v1/messages/{message_id}/resources/{file_key}?type=file
Run STT script on the downloaded file

Smart Reply Mode

When receiving messages, follow this pattern for natural conversation:

Voice message received → transcribe with STT → understand → reply with voice (TTS)
Text message received → understand → reply with text
Override: user can request voice/text reply explicitly

Important Notes

Feishu msg_type must be "audio" — not "media" or "file"
OpenClaw's message tool asVoice does not work correctly for Feishu — use this script instead
STT uses ElevenLabs scribe_v1 model, supports Chinese, English, and 90+ languages
For free ElevenLabs accounts, only premade voices work; library voices require a paid plan

Feishu Voice (ElevenLabs)

Install

Feishu Voice (TTS + STT)

Prerequisites

Environment Variables

Voice Selection

Sending Voice Messages (TTS)

Receiving Voice Messages (STT)

Fallback: Download via Feishu API

Smart Reply Mode

Important Notes

Related skills