Install
```
openclaw skills install feishu-voice-loop
```

Provide a reusable three-step voice loop for OpenClaw:

- Accept text or voice input, transcribing voice to text when needed
- Generate natural speech with OpenAI TTS
- Send the audio output to a Feishu chat or web player
When the input is voice, transcribe it to text first, then continue through the same output pipeline.
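The routing above can be sketched as follows. This is a minimal illustration, not the skill's actual code: `transcribe` and `speak_and_send` are hypothetical stand-ins for the two scripts, and the extension set is an assumption.

```python
from pathlib import Path

# Hypothetical stand-ins for the real scripts; names are illustrative only.
def transcribe(audio_path: str) -> str:
    """Placeholder for scripts/transcribe_audio.py."""
    return f"transcript of {audio_path}"

def speak_and_send(text: str, to: str) -> str:
    """Placeholder for scripts/openai_tts_feishu.py."""
    return f"sent voice for {text!r} to {to}"

# Assumed audio extensions; the real skill may accept a different set.
AUDIO_EXTS = {".ogg", ".opus", ".wav", ".mp3", ".m4a"}

def voice_loop(user_input: str, to: str) -> str:
    """Route voice input through transcription, then share one output path."""
    if Path(user_input).suffix.lower() in AUDIO_EXTS:
        text = transcribe(user_input)   # step 1: voice -> text
    else:
        text = user_input               # text input skips transcription
    return speak_and_send(text, to)     # steps 2-3: TTS + Feishu delivery
```

Both input kinds converge on the same `speak_and_send` call, which is the point of the loop: only the entry step differs.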
Prerequisites:
- OPENAI_API_KEY is set for TTS
- Feishu credentials are configured in ~/.openclaw/openclaw.json under channels.feishu.appId/appSecret, or are passed explicitly
- ffmpeg and ffprobe are installed and available
- A Whisper CLI is configured in ~/.openclaw/openclaw.json under tools.media.audio.models

Main scripts:
- scripts/openai_tts_feishu.py
- scripts/transcribe_audio.py

Use scripts/transcribe_audio.py when you have a local .ogg, .opus, .wav, or similar file and want text.
```
python3 scripts/transcribe_audio.py /path/to/input.ogg
```
This script reuses the existing Whisper CLI configuration from ~/.openclaw/openclaw.json.
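As a hedged sketch of how such a wrapper might build its Whisper CLI call — the config keys under tools.media.audio.models and the whisper flags shown here are assumptions, not the script's actual code:

```python
def build_whisper_cmd(audio_path: str, config: dict) -> list[str]:
    # Assumed openclaw.json shape: tools.media.audio.models.default names
    # the Whisper model; the real layout may differ.
    model = (
        config.get("tools", {})
        .get("media", {})
        .get("audio", {})
        .get("models", {})
        .get("default", "base")
    )
    # openai-whisper style invocation; the configured CLI may take other flags.
    return ["whisper", audio_path, "--model", model, "--output_format", "txt"]
```

Reading the model from the shared config is what lets this script stay consistent with the rest of OpenClaw's audio tooling.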
Use scripts/openai_tts_feishu.py when you already have text and want to send a Feishu voice message.
```
python3 scripts/openai_tts_feishu.py \
  --to <feishu_open_id> \
  --text "This is a voice test." \
  --voice alloy \
  --model gpt-4o-mini-tts
```
The script will:
- Call the OpenAI audio/speech endpoint to generate the speech
- Convert the audio with ffmpeg
- Send the resulting audio message to the target open_id

Use this skill when the goal is a reusable voice interaction pipeline:
Read references/input-output-workflow.md when building or explaining the end-to-end loop.
Default preset is stored in references/presets.md.
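The three script steps (generate speech, convert with ffmpeg, deliver) can be outlined as below. The OpenAI and Feishu calls are abstracted as injected stand-ins, and the libopus codec and file names are assumptions, not necessarily what the script does:

```python
import subprocess

def build_ffmpeg_cmd(src: str, dst: str) -> list[str]:
    # Codec choice is an assumption: Feishu voice messages commonly use opus,
    # but the real script's ffmpeg flags may differ.
    return ["ffmpeg", "-y", "-i", src, "-acodec", "libopus", dst]

def tts_to_feishu(text: str, open_id: str, speech_api, send_api) -> None:
    """Outline of the three steps; speech_api and send_api are hypothetical
    stand-ins for the OpenAI audio/speech call and the Feishu message send."""
    raw = speech_api(text)                  # 1. generate speech bytes
    with open("speech.mp3", "wb") as f:     # file names are illustrative
        f.write(raw)
    subprocess.run(                         # 2. convert for delivery
        build_ffmpeg_cmd("speech.mp3", "speech.opus"), check=True
    )
    send_api(open_id, "speech.opus")        # 3. deliver to the open_id
```

Keeping the conversion as a separate ffmpeg pass is what makes the ffmpeg/ffprobe prerequisite above load-bearing.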
Unless the user asks otherwise, use:
- model: gpt-4o-mini-tts
- voice: alloy

When the user asks for a different flavor, either:
- pass --instructions to adjust the delivery, or
- pick a named preset from references/presets.md

Common failure cases:
- Missing OPENAI_API_KEY → ask for API key / env setup
- Missing channels.feishu.appId/appSecret → ask for Feishu credentials before retrying
- Missing ffmpeg or ffprobe → install locally before retrying
- Missing tools.media.audio.models → fix the Whisper CLI configuration

When OpenAI billing is not enabled, say so directly instead of pretending the voice was generated.
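These failure cases can be checked up front. A minimal preflight sketch — the config shape mirrors the assumed openclaw.json layout, and env/which are injected so the check can be exercised without touching the real environment:

```python
import shutil

def preflight(config: dict, env: dict, which=shutil.which) -> list[str]:
    """Return human-readable problems to surface before running the loop."""
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    feishu = config.get("channels", {}).get("feishu", {})
    if not (feishu.get("appId") and feishu.get("appSecret")):
        problems.append("channels.feishu.appId/appSecret missing")
    for tool in ("ffmpeg", "ffprobe"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    if not config.get("tools", {}).get("media", {}).get("audio", {}).get("models"):
        problems.append("tools.media.audio.models not configured")
    return problems
```

Reporting every problem at once, rather than failing on the first, matches the skill's guidance to tell the user plainly what is missing.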
Package with:
```
python3 /Users/zoepeng/.openclaw/lib/node_modules/openclaw/skills/skill-creator/scripts/package_skill.py \
  /Users/zoepeng/.openclaw/workspace/skills/openai-feishu-voice
```
The resulting .skill file can be shared or uploaded wherever the user distributes skills.
- scripts/openai_tts_feishu.py: use for deterministic TTS generation and Feishu delivery.
- scripts/transcribe_audio.py: use for deterministic local audio transcription via the configured Whisper CLI.
- references/presets.md: read when the user asks for a different voice direction or wants named presets.
- references/input-output-workflow.md: read when packaging or explaining the complete voice-in / voice-out solution.