Install
openclaw skills install qwen-asr-local
Local, CPU-only speech-to-text powered by Qwen3-ASR. No API key or cloud needed. Use when: (1) a voice message or audio file needs transcription, (2) a user asks to transcribe audio, (3) speech-to-text is needed. Supports offline, segmented, and streaming modes. macOS and Linux only.
Run the install script to download the pre-built binary and model:
bash {baseDir}/scripts/install.sh
This will:
- Download the qwen-asr binary for your platform from GitHub Releases
- Download the qwen3-asr-0.6b model (~1.5 GB) from HuggingFace

To transcribe an audio file, run:
bash {baseDir}/scripts/transcribe.sh <audio-file>
Supports any audio format: wav, mp3, m4a, ogg, flac, opus, webm, aac, etc.
Non-WAV files are automatically converted via ffmpeg (must be installed).
Or call qwen-asr directly (WAV only):
qwen-asr -d ~/.openclaw/tools/qwen-asr/qwen3-asr-0.6b -i <audio-file> --silent
cat audio.wav | qwen-asr -d ~/.openclaw/tools/qwen-asr/qwen3-asr-0.6b --stdin --silent
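Since calling qwen-asr directly accepts WAV only, a non-WAV file has to be converted first. The sketch below shows one way to do that by hand. The helper name `transcribe_any` and the `MODEL_DIR` variable are hypothetical, and the target format (16 kHz, mono, 16-bit PCM) is an assumption rather than a documented requirement; the bundled transcribe.sh may convert differently.

```shell
# Sketch: convert any audio file to WAV with ffmpeg, then call qwen-asr directly.
# MODEL_DIR defaults to the model directory documented for this skill.
MODEL_DIR="${MODEL_DIR:-$HOME/.openclaw/tools/qwen-asr/qwen3-asr-0.6b}"

# transcribe_any is a hypothetical helper name, not part of the skill.
transcribe_any() {
  local input="$1" tmp_dir status
  case "$input" in
    *.wav|*.WAV)
      # Already WAV: pass it straight through.
      qwen-asr -d "$MODEL_DIR" -i "$input" --silent
      return $?
      ;;
  esac

  if ! command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg is required to convert $input" >&2
    return 1
  fi

  tmp_dir="$(mktemp -d)" || return 1
  # Assumption: 16 kHz mono 16-bit PCM is an acceptable input format.
  if ffmpeg -nostdin -loglevel error -y -i "$input" \
       -ar 16000 -ac 1 -c:a pcm_s16le "$tmp_dir/audio.wav"; then
    qwen-asr -d "$MODEL_DIR" -i "$tmp_dir/audio.wav" --silent
    status=$?
  else
    status=1
  fi
  rm -rf "$tmp_dir"   # clean up the temporary WAV either way
  return "$status"
}
```

After sourcing the function, `transcribe_any voice-note.m4a` would print the transcription. Writing to a temporary file instead of piping ffmpeg into `--stdin` is a deliberate choice here: ffmpeg cannot fill in the WAV length header on a pipe, and whether qwen-asr tolerates that is not stated.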
| Flag | Description |
|---|---|
| `--silent` | Print only transcription text (no progress) |
| `--language <lang>` | Force language (e.g., zh, en) |
| `-S <seconds>` | Segmented mode: split audio into chunks |
| `--stream` | Streaming mode: process audio in real time |
| `--stdin` | Read audio from stdin |
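The flags above can be combined. As a rough sketch, a long recording might be transcribed in segmented mode with a forced language. The helper name `transcribe_long` and the 30-second default chunk length are illustrative assumptions, not values recommended by the tool, and it is assumed that `-S` and `--language` can be used together.

```shell
# Sketch: segmented transcription of a long WAV file with an optional forced language.
MODEL_DIR="${MODEL_DIR:-$HOME/.openclaw/tools/qwen-asr/qwen3-asr-0.6b}"

# transcribe_long is a hypothetical helper name.
# Usage: transcribe_long <wav-file> [chunk-seconds] [language]
transcribe_long() {
  local wav="$1" chunk_seconds="${2:-30}" lang="${3:-}"

  # Reuse the positional parameters as the argument list for qwen-asr.
  set -- -d "$MODEL_DIR" -i "$wav" -S "$chunk_seconds" --silent
  if [ -n "$lang" ]; then
    set -- "$@" --language "$lang"
  fi
  qwen-asr "$@"
}
```

For example, `transcribe_long meeting.wav 20 en` would run qwen-asr with `-S 20 --language en`, while `transcribe_long meeting.wav` falls back to 30-second chunks and no forced language.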
Default model directory: ~/.openclaw/tools/qwen-asr/qwen3-asr-0.6b
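Because every direct invocation depends on that directory, a wrapper script may want to confirm the model is present before calling qwen-asr. The following is a minimal sketch; `check_model_dir` is a hypothetical helper, and treating an empty directory as "not installed" is an assumption about how a failed download would look.

```shell
# Sketch: verify the model directory exists and is not empty before transcribing.
MODEL_DIR="${MODEL_DIR:-$HOME/.openclaw/tools/qwen-asr/qwen3-asr-0.6b}"

# check_model_dir is a hypothetical helper name.
# Usage: check_model_dir [directory]   (defaults to $MODEL_DIR)
check_model_dir() {
  local dir="${1:-$MODEL_DIR}"
  # A missing or empty directory is treated as "model not installed".
  if [ ! -d "$dir" ] || [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "Model not found in $dir; run the install script first." >&2
    return 1
  fi
}
```

A script could then guard its transcription step with `check_model_dir || exit 1`.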