Local Whisper

v1.0.2

Install and use whisper.cpp (local, free/offline speech-to-text) with OpenClaw. Supports downloading different ggml model sizes (tiny/base/small/medium/large...

0 stars · 483 downloads · 3 versions
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The name/description (local whisper.cpp STT for OpenClaw) aligns with the scripts: they build whisper.cpp from the upstream GitHub repo, download ggml model binaries from Hugging Face, install a wrapper into ~/.local/bin, and patch OpenClaw to call the wrapper for inbound audio. One inconsistency: SKILL.md and download_models.sh advertise many model sizes (tiny/base/small/medium/large-*) but the runtime wrapper (bin/openclaw-whisper-stt.sh) enforces MODEL_NAME to be only 'base' or 'small'. This is a capability mismatch (documentation vs runtime).
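A hypothetical sketch of the model gate described above; the function name and error text are illustrative, so check bin/openclaw-whisper-stt.sh for the actual check:

```shell
# Illustrative model gate: only 'base' and 'small' pass, matching the
# restriction observed in bin/openclaw-whisper-stt.sh (exact code assumed).
validate_model() {
  case "$1" in
    base|small) return 0 ;;
    *) echo "unsupported model: $1 (only base/small)" >&2; return 1 ;;
  esac
}
```

If you want medium/large support, extending a case arm like this is typically the only change needed, provided the model file exists in ~/.cache/whisper.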
Instruction Scope
The SKILL.md installation steps are explicit and limited to building whisper.cpp, downloading models into ~/.cache/whisper, installing the wrapper into ~/.local/bin, patching OpenClaw's tools.media.audio config, and restarting the gateway. The scripts operate on user-home directories (~/.local, ~/.cache) and do not attempt to read unrelated system files or export secrets. The patch script will restart the gateway (impactful), which is within the skill's stated goal but is a behavior the user should expect.
Install Mechanism
There is no packaged install spec; the provided scripts clone the known upstream repo (https://github.com/ggerganov/whisper.cpp) and download model binaries from Hugging Face (huggingface.co/ggerganov/whisper.cpp). Those are well-known sources. The build process compiles locally with cmake and installs artifacts under the user's home. This is expected for a local build; the main risk is the usual build-time exposure and disk usage for large models.
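A plausible shape for that build flow, as a sketch under assumed source and install paths, not the script's exact contents:

```shell
# Sketch: clone upstream whisper.cpp, build with cmake, install under ~/.local.
# The source/build directory here is an assumption; install_whisper_cpp.sh
# may use different paths and flags.
build_whisper() {
  src="$HOME/.cache/whisper.cpp-src"
  git clone --depth 1 https://github.com/ggerganov/whisper.cpp "$src"
  cmake -S "$src" -B "$src/build" \
        -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_INSTALL_PREFIX="$HOME/.local"
  cmake --build "$src/build" -j"$(nproc)"
  cmake --install "$src/build"
}
```

Installing with a `CMAKE_INSTALL_PREFIX` under the user's home is what keeps the whole setup root-free.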
Credentials
The skill requests no credentials or secret environment variables. Runtime uses ordinary env items (HOME, optional OPENCLAW_WHISPER_MODEL and OPENCLAW_WHISPER_LANG) and checks for required tools (git, cmake, ffmpeg, curl). No unrelated service tokens or privileged system credentials are requested.
Persistence & Privilege
The skill installs a wrapper symlink into ~/.local/bin, places libraries in ~/.local/lib, stores models in ~/.cache/whisper, and patches OpenClaw's configuration, restarting the gateway to enable local STT. It does not set always:true, but it does modify OpenClaw's config persistently; users should be aware this changes their gateway behavior until reverted.
Assessment
This skill appears to do what it says: build whisper.cpp locally, download ggml models from Hugging Face, install a wrapper into ~/.local, and configure OpenClaw to call that wrapper for inbound audio. Before installing:

  1. Ensure you have build tools, ffmpeg, and enough disk space (models can be large).
  2. Review and back up your OpenClaw config, because the scripts will patch it and restart the gateway.
  3. Note the wrapper currently accepts only the models 'base' or 'small', despite documentation mentioning larger models; if you plan to use medium/large, inspect and adjust bin/openclaw-whisper-stt.sh.
  4. Verify you trust the upstream GitHub and Hugging Face sources.
  5. Run the install commands interactively (not as root) and inspect what they do if you have security concerns.

Overall the skill is coherent and proportional to its purpose.
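The config backup mentioned above can be a one-liner; the config path used here is an assumption, so locate your actual OpenClaw config first:

```shell
# Hedged helper: back up a config file before patch_openclaw_audio.sh
# rewrites it. Pass the real config path for your OpenClaw install.
backup_cfg() {
  [ -f "$1" ] || return 0                    # nothing to back up
  cp "$1" "$1.bak.$(date +%Y%m%d%H%M%S)"
}

# Assumed location; adjust as needed.
backup_cfg "$HOME/.openclaw/config.json"
```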


Updated 1 month ago · License: MIT-0

whisper-cpp (Local Whisper STT for OpenClaw)

This skill sets up local whisper.cpp STT for inbound Telegram voice notes.

Dependencies

You need build tools (git, cmake, a C/C++ compiler toolchain) plus curl and ffmpeg (ffmpeg decodes Telegram OGG/Opus voice notes to WAV).
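A quick preflight check for those tools; this is a convenience sketch, not part of the skill's scripts:

```shell
# Print any of the required commands that are missing from PATH,
# one per line; prints nothing when everything is installed.
check_deps() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# The tools this skill needs ('cc' stands in for the compiler toolchain).
check_deps git cmake cc curl ffmpeg
```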

Install

From this skill directory:

bash scripts/install_whisper_cpp.sh
bash scripts/download_models.sh
bash scripts/install_wrapper.sh
bash scripts/patch_openclaw_audio.sh

Send a Telegram voice note to test.

Tuning

Model choice

This setup uses ggml Whisper models stored in ~/.cache/whisper.

Common model names you can download:

  • tiny, base, small, medium
  • large-v1, large-v2, large-v3 (bigger/slower, usually more accurate)

By default we download: base + small.

To download specific models:

bash scripts/download_models.sh tiny base small
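Under the hood, download_models.sh presumably fetches ggml binaries from the well-known Hugging Face layout for ggerganov/whisper.cpp; a sketch with assumed function names:

```shell
# Map a model name (tiny, base, small, medium, large-v3, ...) to its
# Hugging Face download URL (standard ggerganov/whisper.cpp repo layout).
model_url() {
  echo "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-$1.bin"
}

# Fetch one model into the cache directory this skill uses.
download_model() {
  mkdir -p "$HOME/.cache/whisper"
  curl -fL -o "$HOME/.cache/whisper/ggml-$1.bin" "$(model_url "$1")"
}
```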

For the OpenClaw wrapper, you can select:

OPENCLAW_WHISPER_MODEL=small openclaw-whisper-stt /path/to/audio
  • Default language: auto-detect (OPENCLAW_WHISPER_LANG=auto)
  • Force a language (example):
    OPENCLAW_WHISPER_LANG=en openclaw-whisper-stt /path/to/audio
    

Models are stored in: ~/.cache/whisper.
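The wrapper's core pipeline is presumably an ffmpeg decode followed by whisper-cli. A hypothetical sketch follows; the flags used are standard whisper.cpp options, but the real bin/openclaw-whisper-stt.sh may differ:

```shell
# Hypothetical transcription pipeline: decode any input (e.g. a Telegram
# OGG/Opus voice note) to 16 kHz mono WAV, then run whisper-cli on it.
transcribe() {
  model="${OPENCLAW_WHISPER_MODEL:-base}"
  lang="${OPENCLAW_WHISPER_LANG:-auto}"
  wav="$(mktemp).wav"
  ffmpeg -y -loglevel error -i "$1" -ar 16000 -ac 1 "$wav" || return 1
  "$HOME/.local/bin/whisper-cli" \
    -m "$HOME/.cache/whisper/ggml-$model.bin" \
    -l "$lang" -nt -f "$wav"          # -nt: no timestamps, plain text out
  rc=$?
  rm -f "$wav"
  return $rc
}
```

whisper.cpp expects 16 kHz mono PCM, which is why the ffmpeg resample step (`-ar 16000 -ac 1`) is not optional.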

Cleanup (optional)

After install (whisper-cli + libs are in ~/.local/):

bash scripts/cleanup_build.sh

Troubleshooting

Confirm OpenClaw is using the wrapper:

which openclaw-whisper-stt
openclaw config get tools.media.audio.models

Test the wrapper directly:

openclaw-whisper-stt /path/to/audio.ogg
OPENCLAW_WHISPER_MODEL=small openclaw-whisper-stt /path/to/audio.ogg

Follow gateway logs while sending a Telegram voice note:

openclaw logs --follow

Files

  • Wrapper source: bin/openclaw-whisper-stt.sh (linked to ~/.local/bin/openclaw-whisper-stt)
  • OpenClaw config patcher: scripts/patch_openclaw_audio.sh
