Local Whisper

Free local speech-to-text for Telegram and WhatsApp using MLX Whisper on Apple Silicon. Private, no API costs.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal
Benign
OpenClaw
Benign (high confidence)
Purpose & Capability
The name/description (local Whisper on Apple Silicon) match the code: the project provides an MLX-backed daemon and CLI to transcribe audio locally. However, requirements.txt and transcriber.py also include optional cloud backends (OpenAI, Groq) and their client libraries; those are optional (used only if the packages are installed and API keys are present) but are not declared in SKILL.md as optional cloud-capable behavior. This is plausible design (local-first with optional fallbacks) but worth noting because the README emphasizes 'private, no API costs'.
Instruction Scope
SKILL.md's runtime instructions are focused on installing deps, running a local daemon, and wiring OpenClaw to call the provided CLI. The daemon accepts either raw audio uploads or JSON containing a local file path and will read that path off disk and transcribe it. Accepting file-system paths is required for the intended integration, but it also means the daemon can be directed (by local callers) to open arbitrary files on disk — a potential surprise if untrusted local processes can talk to the daemon. The daemon binds to 127.0.0.1, reducing remote exposure.
Install Mechanism
Install is manual via 'pip3 install -r requirements.txt' (PyPI). No downloads from untrusted URLs or archives. requirements.txt contains a platform-conditional MLX package and optional cloud client libraries; installing with pip will pull those packages from PyPI.
Credentials
The skill declares no required environment variables, which is accurate for basic local operation. The code uses dotenv and will consume environment variables if present (e.g., OPENAI_API_KEY, GROQ_API_KEY, CLAWD_WHISPER_* for port/backend/model, CLAWD_WHISPER_URL in the CLI). If API keys are present in the environment, the transcriber may use cloud backends, which conflicts with the 'private, no API costs' messaging unless you ensure no cloud keys are set. This is optional behavior but should be explicitly considered by users who expect strict local-only operation.
Persistence & Privilege
The skill is not always-enabled and does not autonomously modify other skills. It provides optional instructions to install a LaunchAgent plist for auto-start, which is normal for a user-installed local daemon. The daemon runs as a user process bound to 127.0.0.1; it does not request elevated privileges or modify other skill configurations automatically.
Assessment
This skill appears to be what it claims: a local, Apple Silicon-optimized Whisper daemon and CLI. Before installing, consider the following:

  1. The package runs a local HTTP daemon on 127.0.0.1:8787. Only allow trusted local callers (OpenClaw is the intended one), and be cautious about other local apps talking to that port.
  2. The daemon accepts a JSON field containing a filesystem path and will open that path for transcription. Make sure only trusted processes can instruct the daemon, so it does not process unintended files.
  3. requirements.txt includes the OpenAI and Groq client libraries. If OPENAI_API_KEY or GROQ_API_KEY is set in your environment (or in a loaded .env), the code can use cloud backends. Remove those variables, or skip installing those packages, if you require strict local-only operation.
  4. pip installs third-party packages from PyPI. Review requirements.txt and consider using a virtualenv or an isolated account.
  5. If you want automatic startup, inspect the LaunchAgent plist before copying it into ~/Library/LaunchAgents.

Overall, this skill is coherent with its stated purpose, but take the above precautions if your threat model requires strict local-only privacy.
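To check for the cloud-fallback condition described in the third precaution above, a minimal Python sketch can look for the two variable names the report mentions, both in the current environment and in a .env file. It assumes a simple KEY=value .env layout (optionally prefixed with export) and does not know which .env file, if any, the daemon actually loads; the helper names are hypothetical and not part of the skill.

```python
import os

# Variable names taken from the scan report; other cloud keys are not checked.
CLOUD_KEYS = ("OPENAI_API_KEY", "GROQ_API_KEY")


def cloud_keys_in_env(env=None):
    """Return the cloud API keys that are set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name in CLOUD_KEYS if env.get(name)]


def cloud_keys_in_dotenv(path=".env"):
    """Return cloud API keys assigned a non-empty value in a simple .env file."""
    found = []
    try:
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line.startswith("export "):
                    line = line[len("export "):]
                name, sep, value = line.partition("=")
                name = name.strip()
                # Comment lines never match because "#" stays part of the name.
                if sep and name in CLOUD_KEYS and value.strip().strip("'\""):
                    found.append(name)
    except FileNotFoundError:
        pass  # no .env file means nothing to report
    return found


if __name__ == "__main__":
    found = sorted(set(cloud_keys_in_env()) | set(cloud_keys_in_dotenv()))
    if found:
        print("Cloud API keys found:", ", ".join(found))
    else:
        print("No OpenAI or Groq keys found in the environment or ./.env")
```

Run it in the same shell (and working directory) you start the daemon from, since that is the environment the daemon inherits.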


Current version: v1.5.0
latest · vk97cnba99bt5rwkeqs6mtvv8dd80a4e5

Runtime requirements

🎤 Clawdis
Bins: python3

SKILL.md

Local Whisper

Transcribe voice messages for free on Telegram and WhatsApp. No API keys. No costs. Runs on your Mac.

The Problem

Voice transcription APIs cost money:

  • OpenAI Whisper: $0.006/minute
  • Groq: $0.001/minute
  • AssemblyAI: $0.01/minute

If you transcribe a lot of Telegram voice messages, it adds up.
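To put a number on "it adds up", multiply your total audio minutes by the per-minute rates listed above. The sketch below uses a hypothetical workload of 50 voice messages a day averaging 30 seconds each over a 30-day month: 750 minutes, or about $4.50 at the OpenAI rate. Provider prices change, so treat the listed rates as a snapshot.

```python
# Per-minute prices (USD) as listed above; check current provider pricing.
PRICE_PER_MINUTE = {
    "OpenAI Whisper": 0.006,
    "Groq": 0.001,
    "AssemblyAI": 0.01,
}


def monthly_cost(messages_per_day, avg_seconds, price_per_minute, days=30):
    """Cost of transcribing a steady daily volume of voice messages for a month."""
    minutes = messages_per_day * avg_seconds / 60 * days
    return minutes * price_per_minute


if __name__ == "__main__":
    # Hypothetical workload: 50 messages/day, 30 seconds each.
    for name, price in PRICE_PER_MINUTE.items():
        print(f"{name}: ${monthly_cost(50, 30, price):.2f}/month")
```

With these inputs it prints $4.50, $0.75 and $7.50 per month for OpenAI Whisper, Groq and AssemblyAI respectively.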

The Solution

This skill runs Whisper locally on your Mac. Same quality, zero cost.

  • ✅ Free forever
  • ✅ Private (audio never leaves your Mac)
  • ✅ Fast (~1 second per message)
  • ✅ Works offline

⚠️ Important Notes

  • First run downloads ~1.5GB model — be patient, this only happens once
  • First transcription is slow: the model loads into memory (~10-30 seconds); after that, transcriptions are fast
  • Already using the OpenAI API for transcription? Replace your existing tools.media.audio config with the one in step 3 below

Quick Start

1. Install dependencies

pip3 install -r requirements.txt

2. Start the daemon

python3 scripts/daemon.py

First run will download the Whisper model (~1.5GB). Wait for the "Ready" message.
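If you start the daemon from a script instead of watching the terminal, one option is to poll the port until it accepts connections. This is a minimal sketch that assumes the default port 8787 from the API section; `wait_for_port` is a hypothetical helper. An open port only shows that the HTTP server is listening. The skill does not say whether the daemon binds before or after the model has loaded, so the first request may still be slow.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=8787, timeout=60.0, interval=0.5):
    """Poll until something accepts TCP connections on host:port, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a server is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait and try again.
            time.sleep(interval)
    return False
```

For example, `wait_for_port(timeout=120)` returns True once the daemon is listening, or False after two minutes.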

3. Add to OpenClaw config

Add this to your ~/.openclaw/openclaw.json:

{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          {
            "type": "cli",
            "command": "~/.openclaw/workspace/skills/local-whisper/scripts/transcribe.sh",
            "args": ["{{MediaPath}}"],
            "timeoutSeconds": 60
          }
        ]
      }
    }
  }
}
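To apply this config without hand-editing, a small Python sketch can set tools.media.audio while leaving the rest of openclaw.json alone. It assumes the file is plain JSON (comments or trailing commas would make `json.loads` fail), writes a .bak copy first, and replaces any existing audio section, in line with the earlier advice to swap out an OpenAI-based config. `install_audio_config` is a hypothetical helper, not part of OpenClaw.

```python
import copy
import json
import shutil
from pathlib import Path

# Copied from the config above; edit "command" if the skill lives elsewhere.
AUDIO_CONFIG = {
    "enabled": True,
    "models": [
        {
            "type": "cli",
            "command": "~/.openclaw/workspace/skills/local-whisper/scripts/transcribe.sh",
            "args": ["{{MediaPath}}"],
            "timeoutSeconds": 60,
        }
    ],
}


def install_audio_config(path="~/.openclaw/openclaw.json"):
    """Set tools.media.audio in a plain-JSON config, keeping all other keys."""
    path = Path(path).expanduser()
    config = {}
    if path.exists():
        shutil.copy2(path, path.with_name(path.name + ".bak"))  # backup first
        config = json.loads(path.read_text(encoding="utf-8"))
    media = config.setdefault("tools", {}).setdefault("media", {})
    media["audio"] = copy.deepcopy(AUDIO_CONFIG)  # replaces any existing audio section
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

After running `install_audio_config()`, continue with the gateway restart in step 4.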

4. Restart gateway

openclaw gateway restart

Now voice messages from Telegram, WhatsApp, etc. will be transcribed locally for free!

Manual test

./scripts/transcribe.sh voice_message.ogg

Use Case: Telegram Voice Messages

Instead of paying for the OpenAI API to transcribe incoming voice messages, point OpenClaw at this local daemon. Transcription is then free.

Auto-Start on Login

cp com.local-whisper.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/com.local-whisper.plist
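Before copying the plist, it is worth printing the launchd keys that control what it runs. This sketch uses the standard library's `plistlib` and a list of common launchd keys (Label, ProgramArguments and so on). The actual com.local-whisper.plist may set only some of them, and keys outside the list are not shown. The helper names are hypothetical; on macOS, `plutil -p com.local-whisper.plist` gives a similar full dump.

```python
import plistlib

# Common launchd keys; the real plist may use only some of these.
KEYS = ("Label", "Program", "ProgramArguments", "WorkingDirectory",
        "EnvironmentVariables", "RunAtLoad", "KeepAlive",
        "StandardOutPath", "StandardErrorPath")


def summarize_plist(path):
    """Return the launchd keys from KEYS that the plist actually sets."""
    with open(path, "rb") as f:
        plist = plistlib.load(f)
    return {key: plist[key] for key in KEYS if key in plist}


def print_summary(path):
    """Print one line per key found, for a quick manual review."""
    for key, value in summarize_plist(path).items():
        print(f"{key}: {value}")
```

Running `print_summary("com.local-whisper.plist")` from the skill directory lets you confirm that ProgramArguments points at the daemon you installed and that EnvironmentVariables does not inject cloud API keys.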

API

The daemon listens on localhost:8787:

curl -X POST http://localhost:8787/transcribe -F "file=@audio.ogg"
# {"text": "Hello world", "language": "en"}
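The same request can be made from Python with only the standard library, as sketched below. It mirrors the curl call: a multipart form with the audio in a field named file, and a JSON response with text and language keys. The helper names are hypothetical, the 60-second timeout is borrowed from the OpenClaw config, and non-200 responses raise `urllib.error.HTTPError` for the caller to handle.

```python
import json
import mimetypes
import uuid
from pathlib import Path
from urllib import request

DAEMON_URL = "http://localhost:8787/transcribe"  # default from the API section


def build_multipart(field, filename, data, content_type):
    """Encode one file as a multipart/form-data body; return (body, header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(path, url=DAEMON_URL, timeout=60):
    """POST an audio file like `curl -F "file=@..."` and return the parsed JSON."""
    path = Path(path)
    ctype = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    body, header = build_multipart("file", path.name, path.read_bytes(), ctype)
    req = request.Request(url, data=body, method="POST",
                          headers={"Content-Type": header})
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `print(transcribe("audio.ogg")["text"])`.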

Translation

Any language → English:

./scripts/transcribe.sh spanish_audio.ogg --translate

Requirements

  • macOS with Apple Silicon (M1/M2/M3/M4)
  • Python 3.9+
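Both requirements can be checked from the Python interpreter you plan to use, as sketched below. The sketch assumes MLX needs a native arm64 interpreter: a Python running under Rosetta reports x86_64 even on an M-series Mac, so the check treats it as unsupported. `is_supported` is a hypothetical helper.

```python
import platform
import sys


def is_supported(system=None, machine=None, version=None):
    """True for macOS on Apple Silicon (native arm64) with Python 3.9+."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    version = sys.version_info[:2] if version is None else version
    return system == "Darwin" and machine == "arm64" and tuple(version) >= (3, 9)


if __name__ == "__main__":
    if is_supported():
        print("supported")
    else:
        print(f"unsupported: {platform.system()} {platform.machine()} "
              f"Python {platform.python_version()}")
```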

License

MIT

Files

8 total