Local STT (Nvidia Parakeet + Whisper Support)
v1.0.0
Local STT with selectable backends - Parakeet (best accuracy) or Whisper (fastest, multilingual).
Security Scan
OpenClaw
Suspicious
high confidence
Purpose & Capability
The code and SKILL.md align with a local STT tool (ffmpeg conversion, ONNX-based Parakeet/Whisper backends). The ability to post transcriptions to a Matrix room matches the documented --room-id option. However, the registry metadata listed no required environment variables while the script clearly expects MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN when the Matrix feature is used; that mismatch is noteworthy.
Instruction Scope
SKILL.md documents the --room-id option but does not mention that the runtime will: (1) attempt to load environment files from ~/.openclaw/.env and ~/.env, (2) read MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN from the environment, (3) write logs to /tmp/stt_matrix.log, and (4) load models via onnx_asr which typically pulls model files from network sources (e.g., huggingface). Reading a user's ~/.env is scope-creep because it can surface unrelated secrets; automatic model downloads are network activity not called out in metadata.
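To see which variables those env files would expose before running the skill, a small audit helper can list the variable names without printing any values. This is a minimal sketch, not part of the skill: `list_env_keys` is a hypothetical helper, the two paths are the ones named in the scan above, and the parsing assumes simple `NAME=value` lines (optionally prefixed with `export`).

```python
from pathlib import Path

# Files the scan says the script tries to load; adjust if the script differs.
ENV_FILES = [Path.home() / ".openclaw" / ".env", Path.home() / ".env"]


def list_env_keys(path):
    """Return variable names defined in a dotenv-style file, never the values."""
    keys = []
    if not path.is_file():
        return keys
    for raw in path.read_text(encoding="utf-8", errors="replace").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not NAME=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        if name.startswith("export "):
            name = name[len("export "):].strip()
        if name:
            keys.append(name)
    return keys


if __name__ == "__main__":
    for env_file in ENV_FILES:
        print(env_file, "->", list_env_keys(env_file) or "(missing or empty)")
```

Any name in the output that is unrelated to Matrix or STT is a secret the script could read once it loads that file.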
Install Mechanism
There is no install spec (instruction-only), which minimizes installer risk. The script includes a commented dependency list and a nonstandard shebang ('uv run --script') indicating runtime packages will be required; this implies runtime package installation/network activity but no explicit installer URL or archive is used.
Credentials
The skill requests no environment variables in registry metadata, yet the script loads ~/.openclaw/.env and ~/.env and reads MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN if present. Automatically loading a user's .env and using tokens is disproportionate unless clearly documented; it increases the chance of accidental use of unrelated secrets. The Matrix access token, if present, will be used to transmit transcriptions to the specified homeserver.
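To make concrete what the token is used for, the sketch below builds (but does not send) the kind of request a client would issue to post a text message to a room through the Matrix client-server API. It is an illustration under assumptions, not the skill's actual code: `build_matrix_message_request` is a hypothetical name, and the script may construct its request differently.

```python
import json
import urllib.parse
import urllib.request
import uuid


def build_matrix_message_request(homeserver, access_token, room_id, text, txn_id=None):
    """Build a PUT request that would post `text` to `room_id` (not sent here)."""
    txn_id = txn_id or uuid.uuid4().hex  # unique per message, used for de-duplication
    url = "{}/_matrix/client/v3/rooms/{}/send/m.room.message/{}".format(
        homeserver.rstrip("/"),
        urllib.parse.quote(room_id, safe=""),  # room IDs contain '!' and ':'
        urllib.parse.quote(txn_id, safe=""),
    )
    body = json.dumps({"msgtype": "m.text", "body": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/json",
        },
    )
```

Whoever operates the configured homeserver receives both the token and the transcript text in such a request, which is why a minimally-privileged token matters.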
Persistence & Privilege
The skill is not always-enabled and does not request elevated platform privileges. It writes a local log file (/tmp/stt_matrix.log) and temporarily writes a converted WAV file before deleting it, which is reasonable for this CLI. It does not modify other skills or agent-wide configuration.
What to consider before installing
This skill appears to be a legitimate local STT tool, but you should be cautious before installing or using it as-is:
- The script will automatically load ~/.openclaw/.env and ~/.env and may pick up sensitive environment variables. Review the contents of those files first or move secrets elsewhere.
- If you use --room-id (Matrix integration), the script will look for MATRIX_HOMESERVER and MATRIX_ACCESS_TOKEN and will send the transcript to the specified homeserver; provide a minimally-privileged token or avoid the feature if you don't trust the destination.
- The tool uses onnx_asr/huggingface components to load models at runtime; expect network downloads of model weights (possibly large) from external hosts. If you require offline-only operation, ensure required models are pre-provisioned and verify the code's model-loading behavior.
- The script writes a local log (/tmp/stt_matrix.log) containing attempt metadata (URLs and HTTP status codes). Inspect this file for unexpected behavior.
Recommended actions: ask the skill author to update registry metadata to declare required env vars (MATRIX_HOMESERVER, MATRIX_ACCESS_TOKEN) and to explicitly document network/model downloads; or run the skill in an isolated environment (container or VM) with only the minimal credentials you are willing to expose.
Like a lobster shell, security has layers: review code before you run it.
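Where a container or VM is not practical, a weaker approximation of the isolation advice is to launch the script with a scrubbed environment and a throwaway home directory. The sketch below only prepares the argument list and environment. It assumes the script resolves `~` through the `HOME` variable (as Python does on POSIX systems), which has not been verified against the script itself; `scrubbed_env` and `isolated_invocation` are hypothetical helpers.

```python
import os


def scrubbed_env(home, keep=("PATH", "LANG")):
    """Copy only allow-listed variables and point HOME at a throwaway directory."""
    env = {name: os.environ[name] for name in keep if name in os.environ}
    env["HOME"] = str(home)
    return env


def isolated_invocation(script, audio, home):
    """Return (argv, env) suitable for subprocess.run(argv, env=env, ...)."""
    return [str(script), str(audio), "--quiet"], scrubbed_env(home)
```

The pair can be passed to `subprocess.run(argv, env=env, timeout=...)`. With a fresh home directory, package and model caches will likely be empty, so expect the first run to download them again. This does not restrict network or filesystem access the way a container does.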
Runtime requirements
🎙️ Clawdis
Bins: ffmpeg
latest
Local STT (Parakeet / Whisper)
Unified local speech-to-text using ONNX Runtime with int8 quantization. Choose your backend:
- Parakeet (default): Best accuracy for English, correctly captures names and filler words
- Whisper: Fastest inference, supports 99 languages
Usage
# Default: Parakeet v2 (best English accuracy)
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg
# Explicit backend selection
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg -b whisper
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg -b parakeet -m v3
# Quiet mode (suppress progress)
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg --quiet
Options
- -b/--backend: parakeet (default), whisper
- -m/--model: Model variant (see below)
- --no-int8: Disable int8 quantization
- -q/--quiet: Suppress progress
- --room-id: Matrix room ID for direct message
Models
Parakeet (default backend)
| Model | Description |
|---|---|
| v2 (default) | English only, best accuracy |
| v3 | Multilingual |
Whisper
| Model | Description |
|---|---|
| tiny | Fastest, lower accuracy |
| base (default) | Good balance |
| small | Better accuracy |
| large-v3-turbo | Best quality, slower |
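The backend and model tables above can be captured as a small lookup that applies the documented defaults and rejects unknown combinations. This is a sketch for callers that wrap the CLI; `MODELS` and `resolve_model` are hypothetical names, and the script's own validation may behave differently.

```python
# Backends, model variants, and defaults as listed in the tables above.
MODELS = {
    "parakeet": {"default": "v2", "choices": ("v2", "v3")},
    "whisper": {"default": "base", "choices": ("tiny", "base", "small", "large-v3-turbo")},
}


def resolve_model(backend="parakeet", model=None):
    """Return (backend, model), filling in the backend's default model if omitted."""
    if backend not in MODELS:
        raise ValueError("unknown backend: {!r}".format(backend))
    entry = MODELS[backend]
    chosen = model or entry["default"]
    if chosen not in entry["choices"]:
        raise ValueError("model {!r} is not available for backend {!r}".format(chosen, backend))
    return backend, chosen
```

For example, `resolve_model()` yields the Parakeet v2 default, while `resolve_model("whisper", "v3")` raises because v3 is a Parakeet variant.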
Benchmark (24s audio)
| Backend/Model | Time | RTF | Notes |
|---|---|---|---|
| Whisper Base int8 | 0.43s | 0.018x | Fastest |
| Parakeet v2 int8 | 0.60s | 0.025x | Best accuracy |
| Parakeet v3 int8 | 0.63s | 0.026x | Multilingual |
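The RTF column is the real-time factor: processing time divided by audio duration, so values below 1 mean faster than real time. The short sketch below recomputes it from the times in the table, assuming the clip is exactly 24 seconds long as the heading states; `real_time_factor` is an illustrative helper.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


AUDIO_SECONDS = 24.0  # clip length from the benchmark heading
TIMES = {
    "Whisper Base int8": 0.43,
    "Parakeet v2 int8": 0.60,
    "Parakeet v3 int8": 0.63,
}

for name, seconds in TIMES.items():
    print("{}: {:.3f}x".format(name, real_time_factor(seconds, AUDIO_SECONDS)))
```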
openclaw.json
{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          {
            "type": "cli",
            "command": "~/.openclaw/skills/local-stt/scripts/local-stt.py",
            "args": ["--quiet", "{{MediaPath}}"],
            "timeoutSeconds": 30
          }
        ]
      }
    }
  }
}
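To check such an entry by hand, the sketch below expands one `cli` model entry into an argument list and timeout. It assumes that `{{MediaPath}}` is replaced with the path of the audio file and that `~` is expanded to the home directory; how OpenClaw actually performs the substitution is not documented here, and `build_command` is a hypothetical helper.

```python
import os

# One entry from tools.media.audio.models, copied from the config above.
MODEL_ENTRY = {
    "type": "cli",
    "command": "~/.openclaw/skills/local-stt/scripts/local-stt.py",
    "args": ["--quiet", "{{MediaPath}}"],
    "timeoutSeconds": 30,
}


def build_command(model_cfg, media_path):
    """Return (argv, timeout_seconds) with the media placeholder filled in."""
    command = os.path.expanduser(model_cfg["command"])
    args = [arg.replace("{{MediaPath}}", media_path) for arg in model_cfg.get("args", [])]
    return [command, *args], model_cfg.get("timeoutSeconds")
```

The result can be run with `subprocess.run(argv, capture_output=True, text=True, timeout=timeout)` to reproduce the call manually. A first run that has to download packages or model weights may exceed the 30-second timeout.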
