Install
```
openclaw skills install feishu-voice-loop
```

Provide a reusable three-step voice loop for OpenClaw:

- Accept text or voice input, transcribing voice to text when needed
- Generate natural speech with OpenAI TTS
- Send the audio output to a Feishu chat or web player
When the input is voice, transcribe it to text first, then continue through the same output pipeline.
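The routing above can be sketched as follows. This is a minimal illustration, not the skill's actual code: `transcribe` and `speak_and_send` are hypothetical stand-ins for the two scripts, and the extension set is an assumption.

```python
from pathlib import Path

# Hypothetical stand-ins for the real scripts; names are illustrative only.
def transcribe(audio_path: str) -> str:
    """Placeholder for scripts/transcribe_audio.py."""
    return f"transcript of {audio_path}"

def speak_and_send(text: str, to: str) -> str:
    """Placeholder for scripts/openai_tts_feishu.py."""
    return f"sent voice for {text!r} to {to}"

# Assumed audio extensions; the real skill may accept a different set.
AUDIO_EXTS = {".ogg", ".opus", ".wav", ".mp3", ".m4a"}

def voice_loop(user_input: str, to: str) -> str:
    """Route voice input through transcription, then share one output path."""
    if Path(user_input).suffix.lower() in AUDIO_EXTS:
        text = transcribe(user_input)   # step 1: voice -> text
    else:
        text = user_input               # text input skips transcription
    return speak_and_send(text, to)     # steps 2-3: TTS + Feishu delivery
```

Both input kinds converge on the same `speak_and_send` call, which is the point of the loop: only the entry step differs.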
Prerequisites:
- OPENAI_API_KEY is set for TTS
- Feishu credentials are configured in ~/.openclaw/openclaw.json under channels.feishu.appId/appSecret, or are passed explicitly
- ffmpeg and ffprobe are installed and available
- A Whisper CLI is configured in ~/.openclaw/openclaw.json under tools.media.audio.models

Main scripts:
- scripts/openai_tts_feishu.py
- scripts/transcribe_audio.py

Use scripts/transcribe_audio.py when you have a local .ogg, .opus, .wav, or similar file and want text.
```
python3 scripts/transcribe_audio.py /path/to/input.ogg
```
This script reuses the existing Whisper CLI configuration from ~/.openclaw/openclaw.json.
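As a hedged sketch of how such a wrapper might build its Whisper CLI call — the config keys under tools.media.audio.models and the whisper flags shown here are assumptions, not the script's actual code:

```python
def build_whisper_cmd(audio_path: str, config: dict) -> list[str]:
    # Assumed openclaw.json shape: tools.media.audio.models.default names
    # the Whisper model; the real layout may differ.
    model = (
        config.get("tools", {})
        .get("media", {})
        .get("audio", {})
        .get("models", {})
        .get("default", "base")
    )
    # openai-whisper style invocation; the configured CLI may take other flags.
    return ["whisper", audio_path, "--model", model, "--output_format", "txt"]
```

Reading the model from the shared config is what lets this script stay consistent with the rest of OpenClaw's audio tooling.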
Use scripts/openai_tts_feishu.py when you already have text and want to send a Feishu voice message.
```
python3 scripts/openai_tts_feishu.py \
  --to <feishu_open_id> \
  --text "This is a voice test." \
  --voice alloy \
  --model gpt-4o-mini-tts
```
The script will:
- Call the OpenAI audio/speech endpoint to generate the speech
- Convert the audio with ffmpeg
- Send the resulting audio message to the target open_id

Use this skill when the goal is a reusable voice interaction pipeline:
Read references/input-output-workflow.md when building or explaining the end-to-end loop.
Default preset is stored in references/presets.md.
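The three script steps (generate speech, convert with ffmpeg, deliver) can be outlined as below. The OpenAI and Feishu calls are abstracted as injected stand-ins, and the libopus codec and file names are assumptions, not necessarily what the script does:

```python
import subprocess

def build_ffmpeg_cmd(src: str, dst: str) -> list[str]:
    # Codec choice is an assumption: Feishu voice messages commonly use opus,
    # but the real script's ffmpeg flags may differ.
    return ["ffmpeg", "-y", "-i", src, "-acodec", "libopus", dst]

def tts_to_feishu(text: str, open_id: str, speech_api, send_api) -> None:
    """Outline of the three steps; speech_api and send_api are hypothetical
    stand-ins for the OpenAI audio/speech call and the Feishu message send."""
    raw = speech_api(text)                  # 1. generate speech bytes
    with open("speech.mp3", "wb") as f:     # file names are illustrative
        f.write(raw)
    subprocess.run(                         # 2. convert for delivery
        build_ffmpeg_cmd("speech.mp3", "speech.opus"), check=True
    )
    send_api(open_id, "speech.opus")        # 3. deliver to the open_id
```

Keeping the conversion as a separate ffmpeg pass is what makes the ffmpeg/ffprobe prerequisite above load-bearing.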
Unless the user asks otherwise, use:
- model: gpt-4o-mini-tts
- voice: alloy

When the user asks for a different flavor, either:
- pass --instructions to adjust the delivery, or
- pick a named preset from references/presets.md

Common failure cases:
- Missing OPENAI_API_KEY → ask for API key / env setup
- Missing channels.feishu.appId/appSecret → ask for Feishu credentials before retrying
- Missing ffmpeg or ffprobe → install locally before retrying
- Missing tools.media.audio.models → fix the Whisper CLI configuration

When OpenAI billing is not enabled, say so directly instead of pretending the voice was generated.
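These failure cases can be checked up front. A minimal preflight sketch — the config shape mirrors the assumed openclaw.json layout, and env/which are injected so the check can be exercised without touching the real environment:

```python
import shutil

def preflight(config: dict, env: dict, which=shutil.which) -> list[str]:
    """Return human-readable problems to surface before running the loop."""
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    feishu = config.get("channels", {}).get("feishu", {})
    if not (feishu.get("appId") and feishu.get("appSecret")):
        problems.append("channels.feishu.appId/appSecret missing")
    for tool in ("ffmpeg", "ffprobe"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    if not config.get("tools", {}).get("media", {}).get("audio", {}).get("models"):
        problems.append("tools.media.audio.models not configured")
    return problems
```

Reporting every problem at once, rather than failing on the first, matches the skill's guidance to tell the user plainly what is missing.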
Package with:
```
python3 /Users/zoepeng/.openclaw/lib/node_modules/openclaw/skills/skill-creator/scripts/package_skill.py \
  /Users/zoepeng/.openclaw/workspace/skills/openai-feishu-voice
```
The resulting .skill file can be shared or uploaded wherever the user distributes skills.
- scripts/openai_tts_feishu.py: use for deterministic TTS generation and Feishu delivery.
- scripts/transcribe_audio.py: use for deterministic local audio transcription via the configured Whisper CLI.
- references/presets.md: read when the user asks for a different voice direction or wants named presets.
- references/input-output-workflow.md: read when packaging or explaining the complete voice-in / voice-out solution.