Feishu Voice Send

API key required

Feishu native voice message skill (no ffmpeg needed). Supports multi-language TTS/STT (Chinese, English, etc.) via MiniMax and Edge TTS, with Whisper for receiving.

Install

openclaw skills install feishu-voice-send

Send audio as native Feishu voice messages. Supports multi-language TTS (Chinese, English, etc.) and STT via Whisper.

Features

  • 🎙️ Receive Voice: receive .ogg voice messages, transcribe to text
  • 🔊 Send Voice: prefer MiniMax TTS, auto-fallback to Edge TTS when quota is insufficient
  • Native Format: sent voice appears as a voice bubble in Feishu (not as a file attachment)
  • 🌍 Multi-language: Chinese, English, etc. via MiniMax and Edge TTS

TTS Engine Selection Logic

Send voice request
    ↓
Check MiniMax speech_generation quota (current_interval_total_count - current_interval_usage_count)
    ↓
Quota > 0 → MiniMax TTS (speech-2.8-hd) ✅
Quota ≤ 0 → Edge TTS (zh-CN-XiaoxiaoNeural) ✅

Quota check: run mmx quota show --output json, find the entry whose category is speech_generation, and compute the remaining count as current_interval_total_count - current_interval_usage_count.
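
The selection rule above can be sketched as a pure function over the CLI output, which makes it easy to check without calling mmx. The JSON shape (a category_remains list with category, current_interval_total_count and current_interval_usage_count) is inferred from the field names used in this skill's implementation; the sample numbers and the function names are illustrative assumptions.

```python
import json


def remaining_speech_quota(quota_json: str) -> int:
    """Return the remaining speech_generation quota, or 0 if it cannot be read."""
    try:
        data = json.loads(quota_json)
    except json.JSONDecodeError:
        return 0
    if not isinstance(data, dict):
        return 0
    for entry in data.get("category_remains", []):
        if entry.get("category") == "speech_generation":
            total = entry.get("current_interval_total_count", 0)
            used = entry.get("current_interval_usage_count", 0)
            return max(total - used, 0)
    return 0


def choose_engine(quota_json: str) -> str:
    """Pick 'minimax' while quota remains, otherwise fall back to 'edge'."""
    return "minimax" if remaining_speech_quota(quota_json) > 0 else "edge"


# Illustrative payload; real output may contain more fields.
SAMPLE = json.dumps({
    "category_remains": [
        {
            "category": "speech_generation",
            "current_interval_total_count": 100,
            "current_interval_usage_count": 40,
        }
    ]
})

if __name__ == "__main__":
    print(remaining_speech_quota(SAMPLE), choose_engine(SAMPLE))
```

Treating unreadable output as zero quota keeps the sender on the Edge TTS path instead of failing outright.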

Architecture

User voice → .ogg received → Whisper STT → understand → reply content
                                                              ↓
User ← Feishu voice bubble ← Ogg/Opus convert ← MP3 TTS ← text
                                   ↑                ↑
                           PyAV convert      MiniMax / Edge
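
The receiving half of the diagram is not shown in this document, so the following is only a sketch of how it might look. Passing a file path to Whisper makes it call ffmpeg, so this version decodes the .ogg with soundfile and hands Whisper a NumPy array instead. It assumes the installed libsndfile can read Ogg/Opus, uses plain linear interpolation as a crude resampler, and the names transcribe_ogg and resampled_length are hypothetical.

```python
WHISPER_SAMPLE_RATE = 16000  # Whisper expects 16 kHz mono float32 samples


def resampled_length(n_samples: int, src_rate: int,
                     dst_rate: int = WHISPER_SAMPLE_RATE) -> int:
    """Number of samples after resampling n_samples from src_rate to dst_rate."""
    return max(1, round(n_samples * dst_rate / src_rate))


def transcribe_ogg(path: str, model_name: str = "base") -> str:
    """Decode an .ogg voice message and return Whisper's transcript."""
    # Third-party imports are local so the helper above works without them.
    import numpy as np
    import soundfile as sf
    import whisper

    frames, src_rate = sf.read(path, dtype="float32", always_2d=True)
    mono = frames.mean(axis=1)  # downmix to a single channel
    if len(mono) == 0:
        return ""
    if src_rate != WHISPER_SAMPLE_RATE:
        n_out = resampled_length(len(mono), src_rate)
        src_t = np.arange(len(mono)) / src_rate
        dst_t = np.arange(n_out) / WHISPER_SAMPLE_RATE
        # Linear interpolation: simple, but not an anti-aliased resampler.
        mono = np.interp(dst_t, src_t, mono).astype(np.float32)

    model = whisper.load_model(model_name)
    result = model.transcribe(mono, fp16=False)  # fp16=False for CPU inference
    return result["text"].strip()
```

A production version would probably cache the loaded model between messages and use a proper resampling filter.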

Implementation

Sending Voice (text → Ogg/Opus)

Main entry: send_feishu_voice_unified.py

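import subprocess, av, os, re, sys, json, tempfile

EDGE_TTS_SCRIPT = "/home/node/.openclaw/plugin-skills/edge-tts/scripts/tts-converter.js"

def check_minimax_quota() -> int:
    """Return the remaining MiniMax speech_generation quota, or 0 if it cannot be read."""
    try:
        result = subprocess.run(['mmx', 'quota', 'show', '--output', 'json'],
                                capture_output=True, text=True, check=True)
        data = json.loads(result.stdout)
    except (OSError, subprocess.CalledProcessError, json.JSONDecodeError):
        return 0  # unreadable quota counts as exhausted, so Edge TTS is used
    for cat in data.get('category_remains', []):
        if cat.get('category') == 'speech_generation':
            return cat.get('current_interval_total_count', 0) - cat.get('current_interval_usage_count', 0)
    return 0

def generate_minimax_tts(text: str) -> str:
    """Synthesize text with MiniMax and return the path of the MP3 file."""
    # A private temp directory avoids the race condition of the deprecated tempfile.mktemp.
    tmp = os.path.join(tempfile.mkdtemp(), 'tts.mp3')
    subprocess.run(['mmx', 'speech', 'synthesize', '--text', text, '--out', tmp], check=True)
    return tmp

def generate_edge_tts(text: str) -> str:
    """Synthesize text with Edge TTS and return the audio path reported by the script."""
    # Remove TTS-related keywords from the text before synthesis.
    text_clean = re.sub(r'\b(TTS|语音|文字转语音|text-to-speech)\b', '', text, flags=re.IGNORECASE).strip()
    result = subprocess.run(['node', EDGE_TTS_SCRIPT, text_clean, '--voice', 'zh-CN-XiaoxiaoNeural'], capture_output=True, text=True, check=True)
    match = re.search(r'Audio saved to: (.+)', result.stdout)
    if match is None:
        raise RuntimeError(f"Edge TTS did not report an output path: {result.stdout!r}")
    return match.group(1).strip()

def send_voice(text: str) -> str:
    """Generate speech for text and return the path of the Ogg/Opus file to send."""
    quota = check_minimax_quota()
    mp3_path = generate_minimax_tts(text) if quota > 0 else generate_edge_tts(text)
    return convert_to_ogg(mp3_path)  # MP3 → Ogg/Opus, see Format Conversion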
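
The entry point above switches engines only on the quota number. One possible hardening, shown here as a sketch and not as part of the shipped script, is to fall back to Edge TTS as well when the MiniMax call itself fails. The function name synthesize_with_fallback and its parameters are hypothetical; primary and fallback stand in for generate_minimax_tts and generate_edge_tts.

```python
import subprocess
import sys
from typing import Callable


def synthesize_with_fallback(
    text: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    quota_remaining: int,
) -> str:
    """Use the primary engine while quota remains; use the fallback otherwise
    or when the primary engine's CLI call fails."""
    if quota_remaining > 0:
        try:
            return primary(text)
        except (subprocess.CalledProcessError, OSError) as exc:
            # A failed or missing CLI should not block the voice reply.
            print(f"primary TTS failed, falling back: {exc}", file=sys.stderr)
    return fallback(text)
```

With the functions from this section, the call would be synthesize_with_fallback(text, generate_minimax_tts, generate_edge_tts, check_minimax_quota()).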

Format Conversion (MP3 → Ogg/Opus)

Use PyAV to convert TTS MP3 to Feishu native format:

  • Container: Ogg
  • Codec: libopus
  • Sample rate: 16000Hz
  • Channels: mono
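
The body of convert_to_ogg is not included in this document. A minimal sketch of what such a function could look like with PyAV follows, using the parameters listed above. It assumes a recent PyAV release (AudioResampler.resample returns a list of frames and accepts None to flush) and an FFmpeg build with libopus; ogg_path_for is a hypothetical helper.

```python
import os

SAMPLE_RATE = 16000  # Hz, as listed above
LAYOUT = "mono"


def ogg_path_for(mp3_path: str) -> str:
    """Place the .ogg next to the source file, with the same base name."""
    return os.path.splitext(mp3_path)[0] + ".ogg"


def convert_to_ogg(mp3_path: str) -> str:
    """Transcode an MP3 file to Ogg/Opus, 16 kHz mono, and return the new path."""
    import av  # PyAV; imported here so ogg_path_for works without it

    ogg_path = ogg_path_for(mp3_path)
    with av.open(mp3_path) as src, av.open(ogg_path, mode="w", format="ogg") as dst:
        out_stream = dst.add_stream("libopus", rate=SAMPLE_RATE)
        out_stream.codec_context.layout = LAYOUT
        resampler = av.AudioResampler(format="s16", layout=LAYOUT, rate=SAMPLE_RATE)

        for frame in src.decode(audio=0):
            for resampled in resampler.resample(frame):
                for packet in out_stream.encode(resampled):
                    dst.mux(packet)

        # Flush samples still buffered in the resampler, then in the encoder.
        for resampled in resampler.resample(None):
            for packet in out_stream.encode(resampled):
                dst.mux(packet)
        for packet in out_stream.encode(None):
            dst.mux(packet)
    return ogg_path
```

Because PyAV bundles the FFmpeg libraries, this path needs no separate ffmpeg binary, which matches the "no ffmpeg needed" claim of the skill.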

Dependencies

| Dependency | Purpose | Install |
|---|---|---|
| mmx CLI | MiniMax TTS | Installed, API Key in ~/.mmx/config.json |
| edge-tts (node) | Edge TTS fallback | Installed at /home/node/.openclaw/plugin-skills/edge-tts/ |
| PyAV | Audio format conversion | pip install av |
| Whisper | Speech recognition | pip install openai-whisper |
| soundfile | Audio file reading | pip install soundfile |
| openclaw message tool | Feishu message sending | Built into OpenClaw |
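
A quick preflight check can confirm that these dependencies are present before the skill is used. This is a sketch under the assumption that the CLIs are on PATH as mmx and node and that the Python packages import as av, whisper and soundfile; find_missing is a hypothetical helper name.

```python
import importlib.util
import os
import shutil

REQUIRED_COMMANDS = ["mmx", "node"]
REQUIRED_MODULES = ["av", "whisper", "soundfile"]
REQUIRED_FILES = [
    "/home/node/.openclaw/plugin-skills/edge-tts/scripts/tts-converter.js",
]


def find_missing(commands=REQUIRED_COMMANDS, modules=REQUIRED_MODULES,
                 files=REQUIRED_FILES) -> list:
    """Return a list of human-readable entries for every missing dependency."""
    missing = [f"command: {c}" for c in commands if shutil.which(c) is None]
    missing += [f"module: {m}" for m in modules
                if importlib.util.find_spec(m) is None]
    missing += [f"file: {p}" for p in files if not os.path.exists(p)]
    return missing


if __name__ == "__main__":
    for item in find_missing():
        print("missing", item)
```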

Files

| File | Description |
|---|---|
| send_feishu_voice_unified.py | Unified TTS sender (recommended) |
| send_feishu_voice.py | Legacy Edge TTS only version |

Limitations

  • ElevenLabs and other cloud TTS services are not supported (each would need its own API key)
  • Long audio (>30 s) should be split into segments
  • Feishu Ogg requirements: Ogg container + Opus codec + 16kHz + mono
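
One way to respect the 30-second guideline is to split the reply text before synthesis, so that each chunk yields a short clip. The sketch below splits at Chinese and English sentence punctuation and packs sentences into chunks. The max_chars default of 120 is a hypothetical stand-in for "about 30 seconds of speech" and would need calibrating per voice and language; split_for_tts is not part of the shipped scripts.

```python
import re

# Zero-width split points: after CJK/English sentence punctuation, or after
# a period that is followed by whitespace (so decimals like 3.14 stay whole).
_SENTENCE_BOUNDARY = re.compile(r"(?<=[。！？!?；;])|(?<=\.)(?=\s)")


def split_for_tts(text: str, max_chars: int = 120) -> list:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        # A single over-long sentence is cut hard at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return [c.strip() for c in chunks if c.strip()]
```

Each chunk can then be passed through send_voice in order, producing one voice bubble per chunk.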

Changelog

  • 2026-05-28: Added unified version (MiniMax TTS first, auto-fallback to Edge TTS on quota exhaustion)
  • 2026-05-17: Initial version