Voice Message

Send voice messages across chat channels (Telegram, Discord, Feishu/Lark, Signal, WhatsApp, and others) using edge-tts for text-to-speech and ffmpeg for audi...

MIT-0 · Free to use, modify, and redistribute. No attribution required.

by @xmanrui

Security Scan

VirusTotal: Suspicious

OpenClaw: Benign (high confidence)

✓

Purpose & Capability

Name/description (send voice messages via edge-tts + ffmpeg to multiple chat platforms) matches the included scripts and SKILL.md: gen_voice.sh creates OGG/OPUS using edge-tts and ffmpeg, gen_waveform.py computes waveform/duration for Discord, and send_feishu_voice.sh uploads and sends audio via Feishu API. The required tools (edge-tts, ffmpeg/ffprobe, curl, python3) are appropriate and proportionate to the stated purpose.

ℹ

Instruction Scope

Runtime instructions stay within purpose: they call local conversion tools and platform APIs. Two operational/privacy notes: (1) edge-tts sends the text to be synthesized to an external TTS service (expected, but relevant to the privacy of message contents); (2) send_feishu_voice.sh takes the Feishu tenant_access_token as a CLI argument, which can expose it through process listings or shell history, and SKILL.md does not warn about this. The scripts do not read unrelated files or environment variables.

✓

Install Mechanism

This is an instruction-only skill with bundled scripts and no install spec, so the skill itself downloads nothing and unpacks no archives. That lowers install-time risk; the required third-party tools (edge-tts, ffmpeg) are standard.

ℹ

Credentials

The skill declares no required environment variables or credentials and instead expects tokens/IDs to be provided at runtime (e.g., tenant_access_token argument for Feishu). That is proportionate, but passing secrets on the command line is risky (process-list exposure and shell history). Users should avoid supplying long-lived secrets as plain CLI args and prefer ephemeral tokens or safer injection mechanisms (stdin/env with proper protection).
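
As a minimal sketch of the safer injection the paragraph above recommends, a wrapper could take the token from an environment variable and fall back to stdin, so it never appears in the argument list. The variable name FEISHU_TENANT_ACCESS_TOKEN and the helper read_token are hypothetical; the bundled scripts do not define them and would need adapting to use this.

```python
import os
import sys
from typing import Mapping, Optional, TextIO

# Hypothetical variable name; the skill itself does not define it.
TOKEN_ENV_VAR = "FEISHU_TENANT_ACCESS_TOKEN"


def read_token(env: Optional[Mapping[str, str]] = None,
               stdin: Optional[TextIO] = None) -> str:
    """Return the token from the environment, else from the first line of stdin."""
    env = os.environ if env is None else env
    stdin = sys.stdin if stdin is None else stdin
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        # Fall back to stdin so the secret never shows up in `ps` output.
        token = stdin.readline().strip()
    if not token:
        raise ValueError(f"no token in ${TOKEN_ENV_VAR} or on stdin")
    return token
```

The env and stdin parameters exist only so the function can be exercised without touching the real process environment.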

✓

Persistence & Privilege

The skill does not request persistent/system-wide privileges, does not set always:true, and does not modify other skills or global agent settings. It runs as-needed and requires explicit invocation.

Assessment

This skill appears to do what it says, but consider these operational cautions before installing: (1) The scripts call external services: edge-tts sends the text you convert to a remote TTS service, and send_feishu_voice.sh calls Feishu APIs, so message contents and tokens travel over the network. (2) Avoid passing long-lived tokens as plain command-line arguments (they can be visible via ps and may be stored in shell history); prefer ephemeral tokens, or supply tokens via a protected environment variable or stdin if you adapt the scripts. (3) Make sure you trust the source (no homepage is provided) before running the bundled shell scripts; inspect them and, if needed, run them in a restricted environment. (4) Confirm that the required tools (edge-tts, ffmpeg/ffprobe, curl, python3) are installed from official sources. For higher assurance, ask the skill author to accept tokens via stdin or an environment variable and to document any data retention or telemetry by the TTS provider.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.4

latest: vk972bt0kk76w8a4daev4egk92h81ysbb

License

MIT-0

Terms: https://spdx.org/licenses/MIT-0.html

Runtime requirements

🎤 Clawdis

SKILL.md

Voice Message

Send text as voice messages to any chat channel.

Prerequisites

edge-tts — Microsoft Edge TTS (pip install edge-tts)
ffmpeg / ffprobe — audio conversion and duration detection

Default Voices

Chinese: zh-CN-XiaoxiaoNeural
English: en-US-JennyNeural
Other languages: see references/voices.md

Step 1: Generate Voice File

Use scripts/gen_voice.sh to convert text to an ogg/opus file:

scripts/gen_voice.sh "你好" /tmp/voice.ogg
scripts/gen_voice.sh "Hello" /tmp/voice.ogg en-US-JennyNeural

Arguments: <text> <output.ogg> [voice]

If voice is omitted, defaults to zh-CN-XiaoxiaoNeural.
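
The source of gen_voice.sh is not shown here, so the following is only a sketch of what such a step could look like: edge-tts writes an intermediate MP3, then ffmpeg re-encodes it to Opus in an Ogg container. The intermediate file name, the 32k bitrate, and the function names are assumptions, not taken from the script.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple

DEFAULT_VOICE = "zh-CN-XiaoxiaoNeural"  # default voice named in this document


def build_commands(text: str, output_ogg: str,
                   voice: str = DEFAULT_VOICE) -> Tuple[List[str], List[str]]:
    """Return (edge-tts command, ffmpeg command) as argument lists."""
    # Assumed intermediate file: same path as the output, with an .mp3 suffix.
    mp3 = str(Path(output_ogg).with_suffix(".mp3"))
    tts = ["edge-tts", "--voice", voice, "--text", text, "--write-media", mp3]
    # Encode with libopus into Ogg; the bitrate is an assumption.
    convert = ["ffmpeg", "-y", "-i", mp3,
               "-c:a", "libopus", "-b:a", "32k", output_ogg]
    return tts, convert


def generate(text: str, output_ogg: str, voice: str = DEFAULT_VOICE) -> None:
    """Run both commands in order; raises CalledProcessError on failure."""
    for cmd in build_commands(text, output_ogg, voice):
        subprocess.run(cmd, check=True)
```

Passing argument lists instead of a shell string avoids quoting problems when the text contains spaces or quotes.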

Step 2: Send by Channel

Generic (Telegram, Signal, WhatsApp, etc.)

Use the message tool directly:

action=send, asVoice=true, filePath=/tmp/voice.ogg

This works for most channels. Telegram is confirmed working.

Feishu/Lark

⚠️ Feishu does NOT support asVoice=true via the message tool. You must use the dedicated script.

Use scripts/send_feishu_voice.sh:

scripts/send_feishu_voice.sh /tmp/voice.ogg <receive_id> <tenant_access_token> [receive_id_type]

receive_id_type: open_id (default), chat_id, user_id, union_id, email
The script uploads the file (as opus, with its duration) and sends it with the audio message type, which produces a voice bubble.
To get tenant_access_token, use the Feishu tenant token API with your app credentials.
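
A hedged sketch of the token request mentioned above, assuming the internal-app endpoint on open.feishu.cn (Lark tenants use a different host) and the app_id / app_secret JSON body. The function names are illustrative; check the current Feishu documentation before relying on this.

```python
import json
import urllib.request

# Assumed endpoint for self-built (internal) apps on Feishu.
TOKEN_URL = "https://open.feishu.cn/open-apis/auth/v3/tenant_access_token/internal"


def build_token_request(app_id: str, app_secret: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"app_id": app_id, "app_secret": app_secret}).encode("utf-8")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def fetch_tenant_access_token(app_id: str, app_secret: str) -> str:
    """Send the request and return the token; raises on a non-zero code."""
    request = build_token_request(app_id, app_secret)
    with urllib.request.urlopen(request, timeout=10) as resp:
        payload = json.load(resp)
    if payload.get("code") != 0:
        raise RuntimeError(f"Feishu token request failed: {payload.get('msg')}")
    return payload["tenant_access_token"]
```

These tokens are short-lived, which fits the scan's advice to prefer ephemeral credentials over long-lived ones.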

Discord

Discord voice messages require a waveform and special flags.

1. Generate the ogg file with scripts/gen_voice.sh.
2. Generate the waveform: python3 scripts/gen_waveform.py /tmp/voice.ogg
   - Outputs JSON: {"duration_secs": 4.2, "waveform": "base64..."}
3. Send via the Discord API with flags: 8192 (IS_VOICE_MESSAGE) and the waveform and duration in the attachment metadata.
   - A missing waveform or duration causes error 50161.
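
The message body for the Discord steps above could be assembled roughly as follows. This sketch assumes the audio is uploaded in the same multipart request as part files[0], so the attachment id is "0"; the function name and the default file name are illustrative.

```python
import json
from typing import Any, Dict

IS_VOICE_MESSAGE = 1 << 13  # 8192, the flag value given in this document


def build_voice_message_payload(waveform_json: str,
                                filename: str = "voice-message.ogg") -> Dict[str, Any]:
    """Turn gen_waveform.py's JSON output into a message payload dict."""
    meta = json.loads(waveform_json)
    return {
        "flags": IS_VOICE_MESSAGE,
        "attachments": [{
            "id": "0",  # assumed to match the multipart part files[0]
            "filename": filename,
            # Both fields are required; omitting either yields error 50161.
            "duration_secs": meta["duration_secs"],
            "waveform": meta["waveform"],
        }],
    }
```

The resulting dict would be serialized as the payload_json part of the multipart request, alongside the ogg file itself.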

Fallback

If asVoice=true does not produce a voice bubble on a channel:

1. Try sending via the platform's native API.
2. If the native API is unavailable, send the file as an audio attachment.
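
The per-channel routing in Step 2, together with the fallback order, can be summarized in a small lookup. This is only an illustration of the decision logic; the channel keys and the returned labels are hypothetical, not identifiers used by the skill.

```python
from typing import List

# Fallback order from this section, most preferred first.
FALLBACK_ORDER: List[str] = ["message-tool", "native-api", "audio-attachment"]


def choose_send_method(channel: str) -> str:
    """Pick the first method to try for a channel, per Step 2."""
    channel = channel.strip().lower()
    if channel in {"feishu", "lark"}:
        # asVoice=true is not supported there; use the dedicated script.
        return "scripts/send_feishu_voice.sh"
    if channel == "discord":
        # Needs waveform, duration, and the voice-message flag.
        return "native-api"
    # Telegram, Signal, WhatsApp, and others: message tool with asVoice=true.
    return "message-tool"
```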

Files

5 total
