{"skill":{"slug":"jimmy-claw-mlx-whisper","displayName":"mlx-whisper","summary":"Set up mlx-whisper as the local audio transcription engine for OpenClaw on Apple Silicon Macs (M1/M2/M3/M4). Automatically transcribes voice notes sent via T...","description":"---\nname: mlx-whisper\ndescription: >\n  Set up mlx-whisper as the local audio transcription engine for OpenClaw on Apple Silicon Macs\n  (M1/M2/M3/M4). Automatically transcribes voice notes sent via Telegram or WhatsApp before the\n  agent processes them. Use when the user wants to enable voice message transcription locally\n  without any API key, or asks to set up Whisper, mlx-whisper, or local speech-to-text in OpenClaw.\n  Apple Silicon only (macOS darwin).\n  Note: Requires internet for initial model download (~465MB), but runs inference locally.\nmetadata:\n  {\n    \"openclaw\":\n      {\n        \"emoji\": \"🎙️\",\n        \"os\": [\"darwin\"],\n        \"requires\": { \"bins\": [\"python3\", \"pip3\"] },\n        \"install\":\n          [\n            {\n              \"id\": \"pip-mlx-whisper\",\n              \"kind\": \"exec\",\n              \"command\": \"pip3\",\n              \"args\": [\"install\", \"mlx-whisper\"],\n              \"label\": \"Install mlx-whisper (Apple Silicon)\",\n            },\n          ],\n      },\n  }\n---\n\n# mlx-whisper — Local Voice Transcription for Apple Silicon\n\nEnables automatic transcription of voice notes in OpenClaw using Apple's MLX framework.\nNo API key required. After the one-time model download, transcription runs fully offline. ~60× faster than standard Whisper on M1/M2/M3/M4.\n\n## How it works\n\n1. User sends a voice note (Telegram `.ogg` / WhatsApp `.opus`)\n2. OpenClaw downloads the audio file\n3. OpenClaw runs `mlx-whisper-transcribe.sh`, passing the file path through the `{{MediaPath}}` placeholder\n4. The transcript is injected as the message body\n5. 
The agent replies to the transcribed text\n\n## Setup\n\n### Step 1 — Install mlx-whisper\n\n```bash\npip3 install mlx-whisper\n```\n\nVerify:\n```bash\npython3 -c \"import mlx_whisper; print('OK')\"\n```\n\n### Step 2 — Install the wrapper script\n\nFind the Python user base directory (the wrapper goes in its `bin` subdirectory):\n```bash\npython3 -m site --user-base\n# e.g. /Users/<you>/Library/Python/3.9\n```\n\nCopy `bin/mlx-whisper-transcribe.sh` from this skill to `<user-base>/bin/mlx-whisper-transcribe.sh`, then make it executable:\n\n```bash\nPYBIN=$(python3 -m site --user-base)/bin\ncp {baseDir}/bin/mlx-whisper-transcribe.sh \"$PYBIN/mlx-whisper-transcribe.sh\"\nchmod +x \"$PYBIN/mlx-whisper-transcribe.sh\"\n```\n\nTest it:\n```bash\n\"$PYBIN/mlx-whisper-transcribe.sh\" /path/to/audio.ogg\n# The first run downloads the model (~465MB); later runs load it from the local cache.\n```\n\n### Step 3 — Configure OpenClaw\n\nAdd the following to `~/.openclaw/openclaw.json` under `tools.media.audio`:\n\n```json\n{\n  \"tools\": {\n    \"media\": {\n      \"audio\": {\n        \"enabled\": true,\n        \"models\": [\n          {\n            \"type\": \"cli\",\n            \"command\": \"<user-base>/bin/mlx-whisper-transcribe.sh\",\n            \"args\": [\"{{MediaPath}}\"],\n            \"timeoutSeconds\": 60\n          }\n        ]\n      }\n    }\n  }\n}\n```\n\nReplace `<user-base>` with the output of `python3 -m site --user-base`.\n\n### Step 4 — Restart OpenClaw\n\n```bash\nopenclaw gateway restart\n```\n\nOr restart the OpenClaw app from the menu bar.\n\n## Models\n\nThe wrapper uses `mlx-community/whisper-small-mlx` by default (465MB, a good balance of speed and accuracy).\nTo change it, edit the installed copy at `<user-base>/bin/mlx-whisper-transcribe.sh` and update the `path_or_hf_repo` value:\n\n| Model | Size | Use case |\n|-------|------|----------|\n| `mlx-community/whisper-tiny-mlx` | 75MB | Fastest, basic accuracy |\n| `mlx-community/whisper-small-mlx` | 465MB | **Recommended** |\n| `mlx-community/whisper-medium-mlx` | 1.5GB | Higher accuracy |\n| `mlx-community/whisper-large-v3-mlx` | 3GB | Best 
accuracy |\n\n## Language hint (optional)\n\nPass a language code as the second argument to skip auto-detection (faster):\n\n```bash\nmlx-whisper-transcribe.sh audio.ogg zh   # Chinese\nmlx-whisper-transcribe.sh audio.ogg en   # English\n```\n\nIn `openclaw.json`, append the language code to `args`:\n```json\n\"args\": [\"{{MediaPath}}\", \"zh\"]\n```\n\n## Performance (M3 MacBook Pro, 8GB)\n\n| Audio length | Transcription time |\n|-------------|-------------------|\n| 10 sec | ~1 sec |\n| 1 min | ~7 sec |\n| 30 min | ~3.5 min |\n\n## Troubleshooting\n\n- **`mlx_whisper not found`**: Run `pip3 install mlx-whisper` again\n- **Empty transcript**: Audio may be silent or music-only (Whisper transcribes speech only)\n- **Timeout**: Increase `timeoutSeconds` for long audio files (at the M3 rates above, 60 seconds covers roughly 8 minutes of audio). Run the Step 2 test once before first use so the initial model download does not count against the timeout\n- **Wrong language**: Add the target language code (e.g. `\"zh\"`) as the second element of `args`\n- **Model download fails**: Check your internet connection; after the first successful download, models are cached in `~/.cache/huggingface`\n","tags":{"latest":"1.0.7"},"stats":{"comments":0,"downloads":678,"installsAllTime":1,"installsCurrent":1,"stars":0,"versions":8},"createdAt":1773167646759,"updatedAt":1778491811973},"latestVersion":{"version":"1.0.7","createdAt":1773169250748,"changelog":"Force new version to fix scanner cache","license":"MIT-0"},"metadata":{"setup":[],"os":["darwin"],"systems":null},"owner":{"handle":"yinghaojia","userId":"s178ctw1yw1mpyqe81gqd7y1cn84b9ms","displayName":"YinghaoJia","image":"https://avatars.githubusercontent.com/u/23326844?v=4"},"moderation":null}