Feishu Voice Loop

Accept text or voice input, transcribe it if needed, generate natural-sounding speech with OpenAI TTS, and send the audio to a Feishu chat or a web player.

MIT-0 · Free to use, modify, and redistribute. No attribution required.

by ZoePeng (@pengzhuowen)

Security Scan

VirusTotal: Benign

OpenClaw: Suspicious (medium confidence)

Purpose & Capability

The code and SKILL.md align with the described purpose (transcribe local audio, call OpenAI TTS, and post audio to Feishu). However, the skill package metadata claims no required env vars or config paths, while the instructions and code actually require OPENAI_API_KEY and Feishu credentials stored in ~/.openclaw/openclaw.json (or passed in). That mismatch is unexpected and reduces trust.

Instruction Scope

The runtime instructions and both scripts read the user's ~/.openclaw/openclaw.json and will execute a CLI command taken from that config for transcription. Executing commands sourced from a user-controlled config is functionally reasonable for a pluggable transcription CLI, but it grants the skill the ability to run arbitrary local commands (whatever is configured). The scripts also transmit data to external endpoints (api.openai.com and open.feishu.cn). This is intended, but users must understand that audio, transcripts, and API keys will be sent externally. Additionally, presets.md contains instructions to produce flirtatious/sexualized "teenage boy" voices, which raises safety and policy concerns and is outside ordinary benign assistant use.

Install Mechanism

There is no install spec (instruction-only), so nothing is written during installation and install risk is low. One code issue: openai_tts_feishu.py invokes ffmpeg using a hardcoded path (/opt/homebrew/bin/ffmpeg) while calling ffprobe as 'ffprobe'. This is brittle and may cause failures on systems other than macOS with Homebrew; it may also indicate that the author tested only a specific environment.
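One way to avoid the hardcoded path is to resolve both tools when the script starts. The sketch below is an illustration only, not the script's actual code: `resolve_tool` is a hypothetical helper name, and the Homebrew location is kept purely as an optional fallback.

```python
import shutil
from typing import Optional


def resolve_tool(name: str, fallback: Optional[str] = None) -> Optional[str]:
    """Return a usable path for `name`.

    Looks on PATH first, then tries `fallback` (an explicit path) if given.
    Returns None when neither is an executable file.
    """
    found = shutil.which(name)
    if found:
        return found
    # shutil.which also accepts a full path and checks that it is executable.
    if fallback and shutil.which(fallback):
        return fallback
    return None


# Hypothetical usage: prefer PATH, fall back to the Homebrew location.
# ffmpeg = resolve_tool("ffmpeg", fallback="/opt/homebrew/bin/ffmpeg")
# ffprobe = resolve_tool("ffprobe", fallback="/opt/homebrew/bin/ffprobe")
```

Resolving both binaries the same way also removes the inconsistency between how ffmpeg and ffprobe are located.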

Credentials

The skill needs an OPENAI_API_KEY and Feishu appId/appSecret (via ~/.openclaw/openclaw.json or CLI args) and requires ffmpeg/ffprobe; all are proportionate to the stated functionality. However, the registry metadata declared no required env vars or config paths, which is inconsistent and misleading. Also note that transcription runs whatever command is configured under tools.media.audio.models[0]. That config entry may itself contain shell commands or point to other tools, so validate it before use.
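To validate that entry before running the skill, you can print it and read it yourself. This is a minimal sketch that assumes openclaw.json is plain JSON; it makes no assumption about the fields inside the entry, and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Optional


def first_audio_model(config: dict) -> Optional[Any]:
    """Return config["tools"]["media"]["audio"]["models"][0], or None if absent."""
    node: Any = config
    for key in ("tools", "media", "audio", "models"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    if isinstance(node, list) and node:
        return node[0]
    return None


def load_config(path: Optional[Path] = None) -> dict:
    """Load the OpenClaw config (assumed here to be plain JSON)."""
    if path is None:
        path = Path.home() / ".openclaw" / "openclaw.json"
    return json.loads(path.read_text(encoding="utf-8"))


# Hypothetical usage: show exactly what the transcription script would run.
# print(json.dumps(first_audio_model(load_config()), indent=2))
```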

Persistence & Privilege

The skill does not request persistent/always-on presence and does not modify other skills' configuration. It runs only when invoked; normal autonomous invocation is allowed by default but not set to always:true.

What to consider before installing

Before installing or running this skill:

- Expect to provide OPENAI_API_KEY and Feishu app credentials; the code reads ~/.openclaw/openclaw.json if you don't pass --app-id/--app-secret. The package metadata did not declare these requirements, so double-check before trusting the skill.
- Inspect your ~/.openclaw/openclaw.json: the transcription script will execute the command listed under tools.media.audio.models[0]. If that entry points to an unexpected binary or shell command, it could run arbitrary local code. Only use the skill if that config is safe.
- The TTS script hardcodes /opt/homebrew/bin/ffmpeg (may fail on other OSes); consider editing the script to use a generic 'ffmpeg' on PATH or your correct ffmpeg location.
- Running the skill will send audio/text data to api.openai.com and open.feishu.cn (OpenAI and Feishu). Don't use it with sensitive data unless you accept that transmission.
- The included voice presets explicitly instruct the model to simulate flirtatious/teenage voices (e.g., "teenage boy" and private-sibling scenarios). This is potentially abusive/unsafe. Remove or edit presets that are inappropriate before use.
- If you decide to proceed: run the scripts in a controlled environment first, verify network destinations, and confirm the config entries and commands they will run. If you need a cleaner setup, add the required env/config declarations to the skill metadata and replace the hardcoded ffmpeg path.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0

Tags: audio, feishu, latest, openai, stt, tts, voice

License: MIT-0 (terms: https://spdx.org/licenses/MIT-0.html)

SKILL.md

Feishu Voice Loop

Provide a reusable three-step voice loop for OpenClaw:

1. accept text or voice input
2. generate speech with OpenAI TTS
3. return the audio to Feishu or a web player

When the input is voice, transcribe it to text first, then continue through the same output pipeline.

Quick start

Prerequisites:

- OPENAI_API_KEY is set for TTS
- Feishu app credentials exist in ~/.openclaw/openclaw.json under channels.feishu.appId/appSecret, or are passed explicitly
- ffmpeg and ffprobe are installed and available
- local audio transcription is configured in ~/.openclaw/openclaw.json under tools.media.audio.models
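The prerequisites above can be checked in one pass before any API call is made. The following is a rough sketch, not part of the skill: `preflight` is a hypothetical name, the config is assumed to be an already-parsed dict, and the credential check will report a problem even if you intend to pass the Feishu credentials explicitly.

```python
import shutil
from typing import Callable, List, Mapping, Optional


def preflight(
    env: Mapping[str, str],
    config: dict,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems: List[str] = []

    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")

    feishu = config.get("channels", {}).get("feishu", {})
    for key in ("appId", "appSecret"):
        if not feishu.get(key):
            problems.append(f"channels.feishu.{key} is missing")

    for tool in ("ffmpeg", "ffprobe"):
        if which(tool) is None:
            problems.append(f"{tool} was not found on PATH")

    audio = config.get("tools", {}).get("media", {}).get("audio", {})
    if not audio.get("models"):
        problems.append("tools.media.audio.models is empty or missing")

    return problems


# Hypothetical usage:
# import json, os, pathlib
# cfg = json.loads((pathlib.Path.home() / ".openclaw" / "openclaw.json").read_text())
# for problem in preflight(os.environ, cfg):
#     print("missing:", problem)
```

Passing `which` as a parameter keeps the check easy to exercise without touching the real PATH.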

Main scripts:

- scripts/openai_tts_feishu.py
- scripts/transcribe_audio.py

Tasks

1. Transcribe voice input

Use this when you have a local .ogg, .opus, .wav, or similar file and want text.

python3 scripts/transcribe_audio.py /path/to/input.ogg

This script reuses the existing Whisper CLI configuration from ~/.openclaw/openclaw.json.

2. Generate and send voice output

Use this when you already have text and want to send a Feishu voice message.

python3 scripts/openai_tts_feishu.py \
  --to <feishu_open_id> \
  --text "这条是语音测试。" \
  --voice alloy \
  --model gpt-4o-mini-tts

The script will:

1. call OpenAI audio/speech
2. save WAV audio temporarily
3. convert to Feishu-friendly Opus via ffmpeg
4. upload the file to Feishu
5. send an audio message to the target open_id
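To make those steps concrete, the sketch below builds the main requests without sending anything. It is not taken from openai_tts_feishu.py: the endpoint URLs, field names, and ffmpeg options reflect the public OpenAI and Feishu APIs as I understand them and should be treated as assumptions to check against the official documentation. The function names are hypothetical, and the multipart file upload is only described in a comment.

```python
import json
import urllib.request
from typing import List, Optional

# Assumed endpoints; verify against the current OpenAI and Feishu docs.
OPENAI_SPEECH_URL = "https://api.openai.com/v1/audio/speech"
FEISHU_MESSAGES_URL = (
    "https://open.feishu.cn/open-apis/im/v1/messages?receive_id_type=open_id"
)


def build_speech_request(
    api_key: str,
    text: str,
    voice: str = "alloy",
    model: str = "gpt-4o-mini-tts",
    instructions: Optional[str] = None,
) -> urllib.request.Request:
    """Step 1: prepare (but do not send) the text-to-speech request."""
    payload = {"model": model, "voice": voice, "input": text, "response_format": "wav"}
    if instructions:
        payload["instructions"] = instructions
    return urllib.request.Request(
        OPENAI_SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_opus_command(ffmpeg: str, wav_path: str, opus_path: str) -> List[str]:
    """Step 3: ffmpeg arguments for mono 16 kHz Opus output."""
    return [
        ffmpeg, "-y", "-i", wav_path,
        "-c:a", "libopus", "-ar", "16000", "-ac", "1",
        opus_path,
    ]


# Step 4 (not shown): upload the .opus file to Feishu as multipart form data
# and read the returned file key from the response.


def build_audio_message(open_id: str, file_key: str) -> dict:
    """Step 5: JSON body for an audio message; `content` is itself a JSON string."""
    return {
        "receive_id": open_id,
        "msg_type": "audio",
        "content": json.dumps({"file_key": file_key}),
    }
```

Keeping request construction separate from sending makes it possible to inspect exactly what would leave the machine before any network call happens.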

3. Run the full voice loop

Use this skill when the goal is a reusable voice interaction pipeline:

1. transcribe input audio to text
2. decide or generate the reply text
3. synthesize reply audio with OpenAI TTS
4. send the reply back to Feishu
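The loop can be sketched as a small function that takes each stage as a callable. This is an illustrative outline with hypothetical function names. The two helper wrappers assume that transcribe_audio.py prints the transcript to stdout, which the documentation here does not confirm, and they use only the `--to` and `--text` flags shown earlier.

```python
import subprocess
import sys
from typing import Callable


def voice_loop(
    audio_path: str,
    to: str,
    transcribe: Callable[[str], str],
    reply: Callable[[str], str],
    speak_and_send: Callable[[str, str], None],
) -> str:
    """Run transcribe -> reply -> TTS + send, and return the reply text."""
    transcript = transcribe(audio_path)
    answer = reply(transcript)
    speak_and_send(to, answer)
    return answer


def transcribe_with_script(audio_path: str) -> str:
    """Assumption: the script writes the transcript to stdout."""
    result = subprocess.run(
        [sys.executable, "scripts/transcribe_audio.py", audio_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def send_with_script(to: str, text: str) -> None:
    """Delegate synthesis and delivery to the TTS script."""
    subprocess.run(
        [sys.executable, "scripts/openai_tts_feishu.py", "--to", to, "--text", text],
        check=True,
    )


# Hypothetical usage, with a trivial echo reply:
# voice_loop("/path/to/input.ogg", "<feishu_open_id>",
#            transcribe_with_script, lambda t: t, send_with_script)
```

Injecting the stages keeps the reply logic swappable and lets the loop be exercised with stand-ins instead of real API calls.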

Read references/input-output-workflow.md when building or explaining the end-to-end loop.

Default output style

Default preset is stored in references/presets.md.

Unless the user asks otherwise, use:

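- model: gpt-4o-mini-tts
- voice: alloy
- default style: young Japanese-style male voice, gentle with a slightly flirtatious edge, a close-to-the-ear private-chat feel, natural, not announcer-like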

When the user asks for a different flavor, either:

- pass a custom --instructions
- or adapt one of the presets in references/presets.md

Handle failures

Common failure cases:

- missing OPENAI_API_KEY → ask for API key / env setup
- HTTP 429 from OpenAI → billing or quota issue
- missing Feishu app credentials → configure channels.feishu.appId/appSecret
- missing ffmpeg or ffprobe → install locally before retrying
- missing transcription model config → configure tools.media.audio.models

When OpenAI billing is not enabled, say so directly instead of pretending the voice was generated.
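One way to make sure a failed synthesis is never reported as success is to raise a dedicated error as soon as the HTTP status is known. This is a minimal sketch with hypothetical names; only the 429 case comes from the list above, and every other non-200 status gets a generic message.

```python
class VoiceGenerationError(RuntimeError):
    """Raised when no audio was produced, so callers cannot silently continue."""


def check_speech_status(status: int) -> None:
    """Raise VoiceGenerationError unless the TTS call returned HTTP 200."""
    if status == 200:
        return
    if status == 429:
        raise VoiceGenerationError(
            "OpenAI returned HTTP 429: likely a billing or quota issue. "
            "No audio was generated."
        )
    raise VoiceGenerationError(
        f"OpenAI returned HTTP {status}. No audio was generated."
    )
```

The caller can catch `VoiceGenerationError` and relay its message to the user instead of sending a placeholder reply.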

Packaging and sharing

Package with:

python3 /Users/zoepeng/.openclaw/lib/node_modules/openclaw/skills/skill-creator/scripts/package_skill.py \
  /Users/zoepeng/.openclaw/workspace/skills/openai-feishu-voice

The resulting .skill file can be shared or uploaded wherever the user distributes skills.

Resources

scripts/openai_tts_feishu.py

Use for deterministic TTS generation and Feishu delivery.

scripts/transcribe_audio.py

Use for deterministic local audio transcription via the configured Whisper CLI.

references/presets.md

Read when the user asks for a different voice direction or wants named presets.

references/input-output-workflow.md

Read when packaging or explaining the complete voice-in / voice-out solution.
