Install
openclaw skills install speech-translation

Build, adapt, or run an audio-processing workflow that takes spoken audio, transcribes it with Whisper or faster-whisper, translates the transcript using the current agent model by default, and synthesizes translated speech with Piper, the OpenClaw tts tool, or a mock backend. Use when the user wants speech transcription, translation, and translated-speech synthesis; wants an existing voice translation prototype operationalized; or wants a chat-native flow where sending a voice message automatically yields transcript text, translation text, and translated audio.

Use this skill for two closely related modes:
Default to an LLM-assisted translation workflow: let the current agent produce the translation, save it to a file when using the local pipeline, or use the surrounding agent turn directly when responding in chat.
Use this when an inbound message already contains an audio transcript from OpenClaw media understanding, or when the user asks you to process a voice message conversationally.
Use the tts tool when you need an immediate chat reply with audio.

Choose the backends as follows:
Transcription: faster-whisper for real transcription, mock for pipeline testing.
Translation: llm as the default translation path when an agent/model is available; service only when unattended HTTP translation is preferable; manual only as a fallback.
TTS: piper for real TTS, mock for dry-run testing.

For the llm path, read the transcript and translate it with the current model. Save the translated text to a file and pass it with --translation-file.

The pipeline outputs are:
01_transcript.txt
02_translation.txt
03_translation.wav
result.json

Optional forwarding hooks: --transcript-command, --translation-command, and --audio-command.

Use this when the agent handling the task can translate the transcript itself.
Save the translation to translation.txt, then run:

bash scripts/run_voice_translate_llm.sh \
/path/to/input.m4a \
./outputs/llm-run \
zh \
en \
/path/to/en_US-lessac-medium.onnx \
./translation.txt \
--whisper-model small \
--transcribe-backend faster-whisper \
--tts-backend piper
Read references/llm-translation-pattern.md when you need the exact orchestration pattern or a reusable translation prompt.
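
When the agent drives the LLM-assisted path programmatically, the two steps (save the translation, then call the wrapper) can be sketched in Python. This is a minimal sketch, not part of the skill: the helper name `build_llm_run_command` and its parameters are hypothetical, and the positional argument order is only inferred from the example invocation above.

```python
from __future__ import annotations

from pathlib import Path
from typing import Sequence


def build_llm_run_command(
    input_audio: str,
    output_dir: str,
    source_lang: str,
    target_lang: str,
    piper_model: str,
    translation_text: str,
    translation_file: str = "./translation.txt",
    extra_args: Sequence[str] = (),
) -> list[str]:
    """Write the translation to disk and return the wrapper's argv.

    Hypothetical helper: the positional order mirrors the example
    invocation of run_voice_translate_llm.sh and is an assumption.
    """
    path = Path(translation_file)
    # Create the parent directory so the write below cannot fail on it.
    path.parent.mkdir(parents=True, exist_ok=True)
    # Store the model's translation as UTF-8 with one trailing newline.
    path.write_text(translation_text.strip() + "\n", encoding="utf-8")
    return [
        "bash",
        "scripts/run_voice_translate_llm.sh",
        input_audio,
        output_dir,
        source_lang,
        target_lang,
        piper_model,
        str(path),
        *extra_args,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the skill's root directory. Building the argv as a list rather than a shell string avoids quoting problems when paths contain spaces.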
Use this first when you need to validate the pipeline structure without model/runtime dependencies.
python3 scripts/run_voice_translate.py \
--input references/examples/mock-input.txt \
--output-dir ./outputs/mock-run \
--source-lang zh \
--target-lang en \
--transcribe-backend mock \
--translation-file ./translated.txt \
--translation-backend llm \
--no-interactive-translate \
--tts-backend mock \
--piper-model ./dummy.onnx
Notes:
mock transcription reads plain text from the input file.
mock TTS writes a silent wav file.
--piper-model is still required by the current CLI shape even when using mock TTS; use any placeholder path.
llm mode currently means the translation must already exist in --translation-file.

python3 scripts/run_voice_translate.py \
--input /path/to/input.m4a \
--output-dir ./outputs/service-run \
--source-lang zh \
--target-lang en \
--whisper-model small \
--transcribe-backend faster-whisper \
--translation-backend service \
--translation-service-url http://127.0.0.1:8000/translate \
--tts-backend piper \
--piper-model /path/to/en_US-lessac-medium.onnx
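
After any of the runs above, a quick check of the output directory can confirm that the pipeline finished. The sketch below is hypothetical: it assumes the four artifact names listed earlier are written directly into the directory passed as --output-dir, which the source does not state explicitly.

```python
from __future__ import annotations

from pathlib import Path

# Artifact names taken from the skill description; their location
# directly under the output directory is an assumption.
EXPECTED_OUTPUTS = (
    "01_transcript.txt",
    "02_translation.txt",
    "03_translation.wav",
    "result.json",
)


def missing_outputs(output_dir: str) -> list[str]:
    """Return the expected artifact names that are not present as files."""
    root = Path(output_dir)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).is_file()]
```

An empty list means all four files exist; for example, `missing_outputs("./outputs/mock-run")` could be called after the mock run to see which stage, if any, did not produce its file.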
run_voice_translate.py: primary entrypoint.
run_voice_translate_llm.sh: thin wrapper for the default LLM-assisted path.
voice_translate_app/: pipeline modules.
send_text.py: wrap stage text and forward it via a shell command.
send_audio.py: forward generated audio via a shell command.
mock_text_sender.py, mock_audio_sender.py: local smoke-test helpers.

Read references/runtime-notes.md for dependency/setup details, backend behavior, and integration constraints.
Read references/llm-translation-pattern.md when the surrounding agent should perform translation with its own model.
Read references/openclaw-chat-mode.md when implementing or following the conversational flow: receive voice, output transcript text, output translation text, then output translated audio.

Keep SKILL.md procedural and short.
Keep llm as the preferred translation path for agent-driven workflows.
Prefer tts for immediate conversational audio replies; prefer Piper for local wav artifacts and offline pipelines.