Install

openclaw skills install faster-whisper-local-service

OpenClaw local speech-to-text backend using faster-whisper over HTTP on 127.0.0.1:18790. Use when you want voice transcription without external APIs, without...

Provision a local STT backend used by voice skills.
What it provides:

- transcribe-server.py: HTTP endpoint at http://127.0.0.1:18790/transcribe
- openclaw-transcribe.service: systemd user service

On first startup, faster-whisper downloads model weights from Hugging Face (~1.5 GB for medium). This requires internet access and disk space. After the initial download, models are cached locally and the service runs fully offline.
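The endpoint can be exercised with a minimal client. This sketch assumes the server accepts a POST of raw audio bytes and returns JSON with a `text` field; neither detail is confirmed by this document, so adjust to the actual API of transcribe-server.py:

```python
# Minimal client sketch for the local transcription endpoint.
# ASSUMPTIONS (not confirmed here): POST of raw audio bytes,
# JSON response containing a "text" field.
import json
import urllib.request

TRANSCRIBE_URL = "http://127.0.0.1:18790/transcribe"

def build_request(audio_bytes: bytes) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        TRANSCRIBE_URL,
        data=audio_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )

def transcribe(audio_path: str) -> str:
    """Send a clip to the local service (requires the service to be running)."""
    with open(audio_path, "rb") as f:
        req = build_request(f.read())
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text"]
```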
| Model | Download size | RAM usage |
|---|---|---|
| tiny | ~75 MB | ~400 MB |
| base | ~150 MB | ~500 MB |
| small | ~500 MB | ~800 MB |
| medium | ~1.5 GB | ~1.4 GB |
| large-v3 | ~3.0 GB | ~3.5 GB |
To pre-download models in an air-gapped environment, see faster-whisper docs.
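A service like this would normally validate the configured size against the table above before loading weights. A hypothetical resolver for the WHISPER_MODEL_SIZE setting (the helper name is ours, not the skill's actual code):

```python
import os

# Model sizes from the table above; anything else is rejected early,
# before a multi-GB download can start.
MODEL_SIZES = {"tiny", "base", "small", "medium", "large-v3"}

def resolve_model_size(default: str = "medium") -> str:
    """Read WHISPER_MODEL_SIZE and fail fast on unknown values."""
    size = os.environ.get("WHISPER_MODEL_SIZE", default)
    if size not in MODEL_SIZES:
        raise ValueError(
            f"unknown model size {size!r}; expected one of {sorted(MODEL_SIZES)}"
        )
    return size
```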
Security:

- Binds to 127.0.0.1 only: not reachable from the network.
- CORS is restricted to the configured allowed origin (https://127.0.0.1:8443 by default).
- Upload size is capped via MAX_UPLOAD_MB.
- Arguments to gst-launch-1.0 are passed as a list: no shell expansion or injection is possible.
- The service uses GStreamer's decodebin for audio format conversion. Like any media library, GStreamer's parsers process binary data and should be kept up to date. Mitigation: install gst-launch-1.0 from your OS vendor's trusted packages and apply security updates regularly. The magic-byte pre-filter reduces the attack surface by rejecting non-audio payloads before they reach GStreamer.
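The magic-byte pre-filter mentioned in the security notes could look like the following sketch; the exact container list and function name are assumptions, not the skill's actual code:

```python
# Hypothetical magic-byte pre-filter: reject payloads whose first bytes do not
# match a known audio container, before they ever reach GStreamer.
AUDIO_MAGIC = [
    b"RIFF",              # WAV
    b"OggS",              # Ogg (Vorbis/Opus)
    b"fLaC",              # FLAC
    b"ID3",               # MP3 with an ID3 tag
    b"\x1a\x45\xdf\xa3",  # Matroska / WebM
]

def looks_like_audio(payload: bytes) -> bool:
    head = payload[:16]
    if any(head.startswith(magic) for magic in AUDIO_MAGIC):
        return True
    # Raw MP3 frames start with an 11-bit sync word (0xFFE).
    return len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0
```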
Uploaded audio is written to a TemporaryDirectory and cleaned up immediately.

Dependencies:

- faster-whisper==1.1.1 (override via env)
- gst-launch-1.0

Deploy with defaults:

bash scripts/deploy.sh
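The TemporaryDirectory pattern and the list-form subprocess arguments combine as in this sketch; the gst-launch-1.0 pipeline shown is illustrative, not necessarily the one transcribe-server.py builds:

```python
import subprocess
import tempfile
from pathlib import Path

def build_gst_cmd(src: str, dst: str) -> list:
    """Arguments as a list: no shell is involved, so no expansion or injection.
    The pipeline (decodebin -> WAV) is illustrative, not the skill's exact one."""
    return [
        "gst-launch-1.0", "-q",
        "filesrc", f"location={src}", "!",
        "decodebin", "!", "audioconvert", "!", "audioresample", "!",
        "wavenc", "!", "filesink", f"location={dst}",
    ]

def decode_to_wav(payload: bytes) -> bytes:
    """Decode an upload inside a TemporaryDirectory; everything under it,
    including both files, is deleted when the `with` block exits."""
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "upload.bin", Path(tmp) / "out.wav"
        src.write_bytes(payload)
        subprocess.run(build_gst_cmd(str(src), str(dst)), check=True)
        return dst.read_bytes()
```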
With custom settings:
WORKSPACE=~/.openclaw/workspace \
TRANSCRIBE_PORT=18790 \
WHISPER_MODEL_SIZE=medium \
WHISPER_LANGUAGE=auto \
TRANSCRIBE_ALLOWED_ORIGIN=https://10.0.0.42:8443 \
bash scripts/deploy.sh
Default: auto (auto-detect language). Set WHISPER_LANGUAGE=de for German-only, en for English-only, etc. Fixed language is faster and more accurate if you only use one language.
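In faster-whisper, auto-detection is requested by passing `language=None` to `transcribe()`, so a resolver for WHISPER_LANGUAGE might look like this hypothetical helper:

```python
import os
from typing import Optional

def resolve_language() -> Optional[str]:
    """Map WHISPER_LANGUAGE to faster-whisper's `language` argument:
    'auto' (the default) becomes None, which requests auto-detection;
    any other value (e.g. 'de', 'en') is passed through as a fixed language."""
    value = os.environ.get("WHISPER_LANGUAGE", "auto").strip().lower()
    return None if value == "auto" else value
```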
Idempotent: safe to run repeatedly.
| What | Path | Action |
|---|---|---|
| Python venv | $WORKSPACE/.venv-faster-whisper/ | Creates venv, installs faster-whisper via pip |
| Transcribe server | $WORKSPACE/voice-input/transcribe-server.py | Writes server script |
| Systemd service | ~/.config/systemd/user/openclaw-transcribe.service | Creates + enables persistent service |
| Model cache | ~/.cache/huggingface/ | Downloads model weights on first run |
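Idempotency typically comes from "create only if missing, rewrite only if changed" checks at each of the steps above. A hypothetical Python sketch of that pattern (the deploy script itself is bash, and this helper is not its actual code):

```python
from pathlib import Path

def write_if_changed(path: Path, content: str) -> bool:
    """Write `content` to `path` only when it differs from what is on disk.
    Returns True if a write happened; repeated runs are no-ops, which is
    what makes a deploy step safe to run again and again."""
    if path.exists() and path.read_text() == content:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)
    return True
```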
Uninstall:

systemctl --user stop openclaw-transcribe.service
systemctl --user disable openclaw-transcribe.service
rm -f ~/.config/systemd/user/openclaw-transcribe.service
systemctl --user daemon-reload
Optional full cleanup:
rm -rf ~/.openclaw/workspace/.venv-faster-whisper
rm -f ~/.openclaw/workspace/voice-input/transcribe-server.py
Check status:

bash scripts/status.sh
Expected:
active

Related skills:

- webchat-voice-proxy for browser mic + HTTPS/WSS integration.
- webchat-voice-full-stack (deploys backend + proxy in order).