Voice Message
Send voice messages across chat channels (Telegram, Discord, Feishu/Lark, Signal, WhatsApp, and others) using edge-tts for text-to-speech and ffmpeg for audi...
MIT-0 · Free to use, modify, and redistribute. No attribution required.
⭐ 1 · 498 · 3 current installs · 4 all-time installs
by @xmanrui
Security Scan
OpenClaw
Benign
high confidence
Purpose & Capability
Name/description (send voice messages via edge-tts + ffmpeg to multiple chat platforms) matches the included scripts and SKILL.md: gen_voice.sh creates OGG/OPUS using edge-tts and ffmpeg, gen_waveform.py computes waveform/duration for Discord, and send_feishu_voice.sh uploads and sends audio via Feishu API. The required tools (edge-tts, ffmpeg/ffprobe, curl, python3) are appropriate and proportionate to the stated purpose.
Instruction Scope
Runtime instructions stay within purpose: they call local conversion tools and platform APIs. Two operational/privacy notes: (1) edge-tts sends the text to be synthesized to an external TTS service (expected, but relevant for the privacy of message contents); (2) the Feishu tenant_access_token is passed as a CLI argument in send_feishu_voice.sh, which can expose it via process listings or shell history, and SKILL.md does not warn about this. The scripts do not read unrelated files or environment variables.
Install Mechanism
This is instruction-only with bundled scripts and no install spec — no downloads or archives are performed by the skill itself. That lowers install-time risk; required third-party tools are standard (edge-tts, ffmpeg).
Credentials
The skill declares no required environment variables or credentials and instead expects tokens/IDs to be provided at runtime (e.g., tenant_access_token argument for Feishu). That is proportionate, but passing secrets on the command line is risky (process-list exposure and shell history). Users should avoid supplying long-lived secrets as plain CLI args and prefer ephemeral tokens or safer injection mechanisms (stdin/env with proper protection).
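As a rough illustration of the safer injection mechanisms mentioned above, the sketch below reads a token from an environment variable and falls back to stdin, so the secret never appears in the argument list. The variable name `FEISHU_TENANT_ACCESS_TOKEN` and the `read_token` helper are hypothetical and not part of the skill's bundled scripts.

```python
import os
import sys


def read_token(env_var="FEISHU_TENANT_ACCESS_TOKEN", stream=None):
    """Return a token from the environment, falling back to one line of stdin.

    The environment variable name is an assumption made for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if token:
        return token
    # Fall back to stdin so the secret is never part of argv or shell history.
    stream = sys.stdin if stream is None else stream
    token = stream.readline().strip()
    if not token:
        raise ValueError("no token supplied via environment or stdin")
    return token
```

A wrapper built this way could be invoked as `printf '%s\n' "$TOKEN" | python3 wrapper.py`, where `wrapper.py` is likewise hypothetical.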
Persistence & Privilege
The skill does not request persistent/system-wide privileges, does not set always:true, and does not modify other skills or global agent settings. It runs as-needed and requires explicit invocation.
Assessment
This skill appears to do what it says, but consider these operational cautions before installing: (1) The scripts call external services: edge-tts will send the text you convert to a remote TTS service, and send_feishu_voice.sh calls Feishu APIs, so message contents and tokens travel over the network. (2) Avoid passing long-lived tokens as plain command-line arguments (they can be visible via ps and may be stored in shell history); prefer ephemeral tokens or supplying tokens via a protected environment variable or stdin if you adapt the scripts. (3) Ensure you trust the source (no homepage provided) before running bundled shell scripts; inspect and, if needed, run them in a restricted environment. (4) Confirm required tools (edge-tts, ffmpeg/ffprobe, curl, python3) are installed from official sources. If you want higher assurance, request the skill author to accept tokens via stdin/env and to document any data retention or telemetry from the TTS provider.
Like a lobster shell, security has layers: review code before you run it.
Current version: v1.0.4
Runtime requirements
🎤 Clawdis
SKILL.md
Voice Message
Send text as voice messages to any chat channel.
Prerequisites
- edge-tts: Microsoft Edge TTS (pip install edge-tts)
- ffmpeg/ffprobe: audio conversion and duration detection
Default Voices
- Chinese: zh-CN-XiaoxiaoNeural
- English: en-US-JennyNeural
- Other languages: see references/voices.md
Step 1: Generate Voice File
Use scripts/gen_voice.sh to convert text to an ogg/opus file:
scripts/gen_voice.sh "你好" /tmp/voice.ogg
scripts/gen_voice.sh "Hello" /tmp/voice.ogg en-US-JennyNeural
Arguments: <text> <output.ogg> [voice]
- If voice is omitted, defaults to zh-CN-XiaoxiaoNeural.
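The contents of scripts/gen_voice.sh are not shown here, so the following is only a sketch of the two-stage pipeline this step describes: synthesize speech with the edge-tts CLI, then re-encode to Ogg/Opus with ffmpeg. The function names, the intermediate MP3 path, and the 32k mono encoding settings are assumptions.

```python
import subprocess
import tempfile

DEFAULT_VOICE = "zh-CN-XiaoxiaoNeural"  # default voice stated in this document


def build_voice_commands(text, out_path, voice=DEFAULT_VOICE,
                         tmp_mp3="/tmp/voice_tmp.mp3"):
    """Return two argv lists: edge-tts synthesis, then ffmpeg conversion."""
    tts_cmd = ["edge-tts", "--voice", voice, "--text", text,
               "--write-media", tmp_mp3]
    # Re-encode to Opus in an Ogg container; bitrate and mono are assumed values.
    convert_cmd = ["ffmpeg", "-y", "-i", tmp_mp3,
                   "-c:a", "libopus", "-b:a", "32k", "-ac", "1", out_path]
    return tts_cmd, convert_cmd


def generate_voice(text, out_path, voice=DEFAULT_VOICE):
    """Run both commands; requires edge-tts and ffmpeg on PATH."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        tmp_mp3 = f"{tmp_dir}/tts.mp3"
        for cmd in build_voice_commands(text, out_path, voice, tmp_mp3):
            subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the message text contains spaces or quotes.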
Step 2: Send by Channel
Generic (Telegram, Signal, WhatsApp, etc.)
Use the message tool directly:
action=send, asVoice=true, filePath=/tmp/voice.ogg
This works for most channels. Telegram confirmed working.
Feishu/Lark
⚠️ Feishu does NOT support asVoice=true via the message tool. You must use the dedicated script.
Use scripts/send_feishu_voice.sh:
scripts/send_feishu_voice.sh /tmp/voice.ogg <receive_id> <tenant_access_token> [receive_id_type]
- receive_id_type: open_id (default), chat_id, user_id, union_id, email
- The script handles upload (as opus with duration) and sends as audio message type to produce a voice bubble.
- To get tenant_access_token, use the Feishu tenant token API with your app credentials.
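For orientation, the upload-then-send flow can be sketched as two request descriptions. This is not the code in send_feishu_voice.sh: the endpoint paths and field names below follow the Feishu Open Platform IM API as best understood and should be checked against the official documentation, and the helper names are hypothetical. The sketch only builds the requests; it does not send them.

```python
import json
from urllib.parse import urlencode

FEISHU_BASE = "https://open.feishu.cn/open-apis"  # assumed base URL
UPLOAD_URL = f"{FEISHU_BASE}/im/v1/files"         # multipart upload endpoint (assumed)
RECEIVE_ID_TYPES = {"open_id", "chat_id", "user_id", "union_id", "email"}


def upload_form_fields(file_name, duration_secs):
    """Form fields sent alongside the file; duration is in milliseconds."""
    return {
        "file_type": "opus",
        "file_name": file_name,
        "duration": str(round(duration_secs * 1000)),
    }


def build_audio_message(receive_id, file_key, receive_id_type="open_id"):
    """Return (url, json_body) for sending an uploaded file as an audio message."""
    if receive_id_type not in RECEIVE_ID_TYPES:
        raise ValueError(f"unsupported receive_id_type: {receive_id_type}")
    url = f"{FEISHU_BASE}/im/v1/messages?" + urlencode(
        {"receive_id_type": receive_id_type})
    body = {
        "receive_id": receive_id,
        "msg_type": "audio",
        # The content field is itself a JSON-encoded string.
        "content": json.dumps({"file_key": file_key}),
    }
    return url, body
```

Both requests would carry an `Authorization: Bearer <tenant_access_token>` header; the `file_key` comes from the upload response.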
Discord
Discord voice messages require a waveform and special flags.
- Generate ogg with scripts/gen_voice.sh
- Generate waveform: python3 scripts/gen_waveform.py /tmp/voice.ogg
  - Outputs JSON: {"duration_secs": 4.2, "waveform": "base64..."}
- Send via Discord API with flags: 8192 (IS_VOICE_MESSAGE) and the waveform/duration in attachments metadata.
  - Missing waveform/duration causes error 50161.
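The Discord steps above can be sketched as follows. How scripts/gen_waveform.py actually computes the waveform is not shown, so the peak-per-bucket approach, the 256-point cap, and the helper names are assumptions; the input is taken to be decoded 16-bit PCM samples. The file upload itself is left out, and only the message metadata is built.

```python
import base64

IS_VOICE_MESSAGE = 1 << 13  # 8192, the flag value given in this document


def encode_waveform(samples, max_points=256):
    """Downsample 16-bit PCM samples to at most max_points bytes, base64-encoded."""
    if not samples:
        return base64.b64encode(b"\x00").decode("ascii")
    n = min(max_points, len(samples))
    points = bytearray()
    for i in range(n):
        start = i * len(samples) // n
        end = (i + 1) * len(samples) // n
        peak = max(abs(s) for s in samples[start:end])
        points.append(min(255, peak * 255 // 32767))  # scale to 0..255
    return base64.b64encode(bytes(points)).decode("ascii")


def build_voice_payload(filename, duration_secs, waveform_b64):
    """Message JSON carrying the voice flag plus the attachment metadata."""
    return {
        "flags": IS_VOICE_MESSAGE,
        "attachments": [{
            "id": "0",
            "filename": filename,
            "duration_secs": duration_secs,
            "waveform": waveform_b64,
        }],
    }
```

Leaving `duration_secs` or `waveform` out of the attachment entry is the condition that the section above associates with error 50161.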
Fallback
If asVoice=true does not produce a voice bubble on a channel:
- Try sending via the platform's native API
- If native API unavailable, send as audio file attachment
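The fallback order above amounts to trying a list of strategies until one succeeds. A minimal sketch, assuming each strategy is a callable that returns True on success; the strategy names and the `send_with_fallback` helper are hypothetical:

```python
def send_with_fallback(strategies):
    """Try each (name, send_fn) pair in order; return the first name that succeeds.

    Returns None if every strategy fails or raises.
    """
    for name, send in strategies:
        try:
            if send():
                return name
        except Exception:
            # Treat an error like a failed attempt and move to the next option.
            continue
    return None
```

A caller might pass `[("as_voice", ...), ("native_api", ...), ("attachment", ...)]`, mirroring the order in the list above.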
