Install

```
openclaw skills install local-voice-reply
```

Local Opus/Ogg voice-reply pipeline for Feishu/Discord with structured voice customization. Default voice is Juno (`voice/juno_ref.wav`), with support for re...

Use this skill to turn text into a cloned/custom-voice audio reply and deliver it reliably to Feishu or Discord.
Overview:

- Default voice: `juno` (reference file: `voice/juno_ref.wav`).
- Custom voices: register via `POST /voice/register`, then call by `voice_name`.
- Output: `.opus` (Ogg container) under `.openclaw/media/outbound/voice-server-v3/` (or `TARVIS_VOICE_OUTPUT_DIR`).
- Scripts:
  - `scripts/send_voice_reply.ps1` (server API path)
  - `scripts/generate_cuda_voice.ps1` (stable local CUDA generation path)

Server implementation is kept with the skill (not workspace root):

- `server/voice_server_v3.py` (FastAPI routes)
- `server/voice_engine.py` (generation and cache engine)

Voice assets are also colocated with the skill:
- `voice/` (voice reference samples)

Requirements:

- `ffmpeg` must be installed and available on PATH (required for Opus encoding).
- Python packages: `fastapi`, `uvicorn`, `python-multipart`, `chatterbox-tts`, `torch`, `torchaudio`, `numpy`.
- `ChatterboxTTS.from_pretrained()` may download model assets, so the initial run can require network access and additional disk space.

Environment variables:

- `TARVIS_VOICE_OUTPUT_DIR` to override where generated Opus files are written.
- `TARVIS_VOICE_DEVICE` to force device selection (`cuda`/`gpu`, `mps`, or `cpu`).

Storage:

- Voices registered via `POST /voice/register` are persisted under `server/voices/`.
- Cache: `server/voice_cache/`.
- Output: `.openclaw/media/outbound/voice-server-v3/` by default (or `TARVIS_VOICE_OUTPUT_DIR` when set).
- `POST /output/cleanup` only deletes staged `.opus` files inside the configured output directory and their `.json` sidecar files.

Run:

```
python -m uvicorn --app-dir server voice_server_v3:app --host 127.0.0.1 --port 8000
```

Then call `/speak` with `text` (and optional `speed`, `exaggeration`, `cfg`).
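As a minimal client sketch, a `/speak` call can be built with the standard library. This assumes the route accepts a JSON body; the exact field names beyond the documented `text`, `speed`, and `voice_name` parameters are assumptions, so check `server/voice_server_v3.py` for the real route signature.

```python
import json
import urllib.request

SERVER = "http://127.0.0.1:8000"  # local voice server started via uvicorn above

def build_speak_request(text: str, voice_name: str = "juno",
                        speed: float = 1.2) -> urllib.request.Request:
    """Build a POST /speak request. The JSON field names mirror the
    documented parameters (text, speed, voice_name) and are assumptions."""
    payload = json.dumps(
        {"text": text, "voice_name": voice_name, "speed": speed}
    ).encode("utf-8")
    return urllib.request.Request(
        SERVER + "/speak",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With the server running, the response body is the generated Opus audio:
# with urllib.request.urlopen(build_speak_request("Hello from Juno")) as resp:
#     open("reply.opus", "wb").write(resp.read())
```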
- `voice_name` defaults to `juno`.
- Replies are generated as Ogg/Opus (`audio/ogg`) in the Juno voice.
- Stage output under the allowed outbound directory (e.g. `C:\Users\hanli\.openclaw\media\outbound\`), then deliver it with the message tool:
  - `action=send`
  - `filePath=<allowed-path>`
  - `asVoice=true`
  - `channel=feishu` or `channel=discord`

Customize the voice:

- Replace `voice/juno_ref.wav` with your target reference voice sample (keeps `voice_name=juno`).
- Or `POST /voice/register` with a reference sample and a target `voice_name`; registered voices persist under `server/voices/`. Then pass that `voice_name` in `/speak` or `/speak_stream`.

Defaults:

- `voice_name: juno`
- `speed: 1.2`
- `/speak` (no post-conversion)
- `asVoice=true`

Performance: synthesis runs under `torch.inference_mode()` to reduce overhead, for both `/speak` and `/speak_stream`.

Troubleshooting: `LocalMediaAccessError ... path-not-allowed` means the file must be staged under `.openclaw/media/outbound` before sending.

Use `scripts/send_voice_reply.ps1` to generate Opus directly with defaults (`voice_name=juno`, `speed=1.2`).
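The registration step above takes a multipart upload. A stdlib-only sketch of building that request body follows; the part names (`voice_name`, `file`) are assumptions, so verify them against the `POST /voice/register` route in `server/voice_server_v3.py`.

```python
import io
import uuid

def build_register_body(voice_name: str, wav_bytes: bytes,
                        filename: str = "ref.wav"):
    """Build a multipart/form-data body for POST /voice/register.
    Returns (body_bytes, content_type). Part names are assumptions."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()

    def part(headers: str, payload: bytes):
        # Each part: boundary line, headers, blank line, payload.
        buf.write(f"--{boundary}\r\n{headers}\r\n\r\n".encode())
        buf.write(payload)
        buf.write(b"\r\n")

    part('Content-Disposition: form-data; name="voice_name"',
         voice_name.encode())
    part('Content-Disposition: form-data; name="file"; '
         f'filename="{filename}"\r\nContent-Type: audio/wav',
         wav_bytes)
    buf.write(f"--{boundary}--\r\n".encode())  # closing boundary
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"

# Usage sketch (server running, reference sample on disk):
# body, ctype = build_register_body("alice", open("voice/alice_ref.wav", "rb").read())
# urllib.request.urlopen(urllib.request.Request(
#     "http://127.0.0.1:8000/voice/register", data=body,
#     headers={"Content-Type": ctype}, method="POST"))
```

In practice a client library such as `requests` handles the multipart encoding for you; the sketch just shows what the server's `python-multipart` dependency parses on the other side.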
It auto-selects /speak_stream for longer text (or when -Stream is passed) for better throughput.
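That auto-selection can be sketched in Python; the character threshold below is an illustrative assumption, not the script's actual cutoff.

```python
def choose_endpoint(text: str, force_stream: bool = False,
                    threshold: int = 300) -> str:
    """Pick /speak_stream for long text (or when explicitly forced),
    mirroring the auto-selection in scripts/send_voice_reply.ps1.
    The 300-character threshold is an assumption for illustration."""
    if force_stream or len(text) > threshold:
        return "/speak_stream"
    return "/speak"
```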
For stable CUDA generation command patterns under stricter exec approval policies, use:
```
scripts/generate_cuda_voice.ps1 -Text "..."
```
This keeps the outer command shape fixed, so an allow-always exec approval is more reusable.
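The same fixed-shape idea can be applied when driving the script from Python: keep the argument list constant and vary only the `-Text` value. The `pwsh` invocation is a sketch and assumes PowerShell 7 is on PATH.

```python
import subprocess

SCRIPT = "scripts/generate_cuda_voice.ps1"

def cuda_voice_command(text: str) -> list:
    """Return the fixed-shape command: only the -Text value changes,
    so an allow-always rule matched on the outer shape keeps applying."""
    return ["pwsh", "-File", SCRIPT, "-Text", text]

def generate(text: str) -> None:
    # Sketch: run the generation script; requires pwsh and the skill files.
    subprocess.run(cuda_voice_command(text), check=True)
```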