Feishu Voice Loop
Accept text or voice input, transcribe if needed, generate natural OpenAI TTS speech, and send audio output to Feishu chat or web player.
Like a lobster shell, security has layers — review code before you run it.
License
SKILL.md
Feishu Voice Loop
Provide a reusable three-step voice loop for OpenClaw:
- accept text or voice input
- generate speech with OpenAI TTS
- return the audio to Feishu or a web player
When the input is voice, transcribe it to text first, then continue through the same output pipeline.
Quick start
Prerequisites:
OPENAI_API_KEYis set for TTS- Feishu app credentials exist in
~/.openclaw/openclaw.jsonunderchannels.feishu.appId/appSecret, or are passed explicitly ffmpegandffprobeare installed and available- local audio transcription is configured in
~/.openclaw/openclaw.jsonundertools.media.audio.models
Main scripts:
scripts/openai_tts_feishu.pyscripts/transcribe_audio.py
Tasks
1. Transcribe voice input
Use this when you have a local .ogg, .opus, .wav, or similar file and want text.
python3 scripts/transcribe_audio.py /path/to/input.ogg
This script reuses the existing Whisper CLI configuration from ~/.openclaw/openclaw.json.
2. Generate and send voice output
Use this when you already have text and want to send a Feishu voice message.
python3 scripts/openai_tts_feishu.py \
--to <feishu_open_id> \
--text "这条是语音测试。" \
--voice alloy \
--model gpt-4o-mini-tts
The script will:
- call OpenAI
audio/speech - save WAV audio temporarily
- convert to Feishu-friendly Opus via
ffmpeg - upload the file to Feishu
- send an
audiomessage to the targetopen_id
3. Run the full voice loop
Use this skill when the goal is a reusable voice interaction pipeline:
- transcribe input audio to text
- decide or generate the reply text
- synthesize reply audio with OpenAI TTS
- send the reply back to Feishu
Read references/input-output-workflow.md when building or explaining the end-to-end loop.
Default output style
Default preset is stored in references/presets.md.
Unless the user asks otherwise, use:
- model:
gpt-4o-mini-tts - voice:
alloy - default style: 年轻日系男声感、温柔里带一点撩、贴耳边私聊感、自然、不播音腔
When the user asks for a different flavor, either:
- pass a custom
--instructions - or adapt one of the presets in
references/presets.md
Handle failures
Common failure cases:
Missing OPENAI_API_KEY→ ask for API key / env setup- HTTP 429 from OpenAI → billing or quota issue
- missing Feishu app credentials → configure
channels.feishu.appId/appSecret - missing
ffmpegorffprobe→ install locally before retrying - missing transcription model config → configure
tools.media.audio.models
When OpenAI billing is not enabled, say so directly instead of pretending the voice was generated.
Packaging and sharing
Package with:
python3 /Users/zoepeng/.openclaw/lib/node_modules/openclaw/skills/skill-creator/scripts/package_skill.py \
/Users/zoepeng/.openclaw/workspace/skills/openai-feishu-voice
The resulting .skill file can be shared or uploaded wherever the user distributes skills.
Resources
scripts/openai_tts_feishu.py
Use for deterministic TTS generation and Feishu delivery.
scripts/transcribe_audio.py
Use for deterministic local audio transcription via the configured Whisper CLI.
references/presets.md
Read when the user asks for a different voice direction or wants named presets.
references/input-output-workflow.md
Read when packaging or explaining the complete voice-in / voice-out solution.
Files
5 totalComments
Loading comments…
