Local STT (Nvidia Parakeet + Whisper Support)

Pass. Audited by VirusTotal on May 12, 2026.

Overview

Type: OpenClaw Skill
Name: local-stt
Version: 1.0.0

The skill is designed for local speech-to-text and includes an optional feature to send transcriptions to a Matrix room. This involves reading `MATRIX_HOMESERVER` and `MATRIX_ACCESS_TOKEN` from environment variables (potentially from `~/.openclaw/.env` or `~/.env`) and making an outbound network request to a Matrix homeserver. This behavior, including the use of `ffmpeg` for audio conversion, is explicitly documented in `SKILL.md` and the `scripts/local-stt.py` docstring, and is aligned with the skill's stated purpose. There is no evidence of intentional harmful behavior, such as exfiltrating unrelated sensitive data, establishing persistence, or malicious prompt injection.

Findings (0)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

First use may fetch and run external packages needed for transcription, so the user relies on package registry provenance and current package versions.

Why it was flagged

The script is designed to run through uv and resolve Python dependencies at runtime, but the packages are not version-pinned in the artifact.

Skill content

#!/usr/bin/env -S uv run --script
# dependencies = [
#     "onnx-asr",
#     "onnxruntime",
#     "huggingface_hub",
#     "click",
#     "requests",
# ]

Recommendation

Declare uv as a requirement, pin dependency versions or include a lockfile, and document that models/packages may be downloaded on first use.
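
A quick way to act on the pinning recommendation is to scan the script header for dependencies that carry no exact version. The sketch below is illustrative only: `unpinned_dependencies` and `HEADER` are hypothetical names, not part of the skill, and the scan assumes each dependency sits on its own commented line, as in the excerpt above. It is a rough line-based check, not a full parser for uv's inline script metadata.

```python
import re

# Hypothetical helper, not part of the skill: matches commented lines of the
# form  #     "package-spec",  and captures the quoted specifier.
_DEP_LINE = re.compile(r'^#\s+"([^"]+)",?\s*$', re.MULTILINE)


def unpinned_dependencies(script_text: str) -> list[str]:
    """Return the quoted dependency specifiers that lack an exact '==' pin."""
    return [dep for dep in _DEP_LINE.findall(script_text) if "==" not in dep]


# The header as quoted in the "Skill content" excerpt of this report.
HEADER = '''#!/usr/bin/env -S uv run --script
# dependencies = [
#     "onnx-asr",
#     "onnxruntime",
#     "huggingface_hub",
#     "click",
#     "requests",
# ]
'''

if __name__ == "__main__":
    for dep in unpinned_dependencies(HEADER):
        print(f"unpinned: {dep}")
```

Run against the quoted header, the scan reports all five packages, which matches the finding. A specifier written with `==` would no longer be reported.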

What this means

If configured, the skill can post transcriptions to Matrix rooms accessible to that token.

Why it was flagged

When Matrix delivery is used, the script uses a Matrix access token to act on the user's Matrix account, although registry metadata declares no primary credential or required environment variables.

Skill content

homeserver = os.environ.get("MATRIX_HOMESERVER")
access_token = os.environ.get("MATRIX_ACCESS_TOKEN")
headers = {"Authorization": f"Bearer {access_token}"}

Recommendation

Document the optional Matrix credential contract clearly, use the least-privileged token available, and only provide --room-id for rooms where posting transcripts is intended.
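
One possible way to make the optional credential contract explicit in code is to read the two variables only when Matrix delivery is requested, and to fail with a clear message if either is missing. This is a minimal sketch under stated assumptions: `resolve_matrix_credentials` and `REQUIRED_VARS` are hypothetical names, and the `env` parameter exists only so the function can be exercised without touching the real environment. Only the two variable names come from the reviewed script.

```python
import os
from typing import Mapping, Optional, Tuple

# Variable names as read by the reviewed script.
REQUIRED_VARS = ("MATRIX_HOMESERVER", "MATRIX_ACCESS_TOKEN")


def resolve_matrix_credentials(
    room_id: Optional[str],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Tuple[str, str]]:
    """Return (homeserver, access_token), or None when Matrix delivery is off.

    Hypothetical helper: credentials are looked up only if a room ID was given.
    """
    env = os.environ if env is None else env
    if not room_id:
        # Local-only transcription: the token is never read.
        return None
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "--room-id was given but these variables are unset: "
            + ", ".join(missing)
        )
    # Strip a trailing slash so later URL joins do not produce '//'.
    return env["MATRIX_HOMESERVER"].rstrip("/"), env["MATRIX_ACCESS_TOKEN"]
```

Keeping the lookup behind the room ID check means a user who never enables Matrix delivery has no reason to export a token at all.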

What this means

Audio content that may have been expected to stay local can leave the device and become visible in the selected Matrix room.

Why it was flagged

With --room-id, the transcribed audio text is sent to an external Matrix room via the Matrix REST API.

Skill content

payload = {
    'msgtype': 'm.text',
    'body': f'🎙️ {text}',
...
}
resp = requests.put(url, headers=headers, json=payload, timeout=10)

Recommendation

Use Matrix sending only intentionally, avoid sending sensitive audio transcripts to shared rooms, and consider adding an explicit confirmation or clearer documentation for this mode.
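
The explicit confirmation suggested above could look roughly like the following. This is a sketch, not the skill's behavior: `confirm_matrix_send` is a hypothetical function, the prompt wording and the 80-character preview limit are arbitrary choices, and the `ask` parameter defaults to the built-in `input` so that the gate can be tested without a terminal.

```python
from typing import Callable


def confirm_matrix_send(
    text: str,
    room_id: str,
    ask: Callable[[str], str] = input,
) -> bool:
    """Ask before a transcript leaves the device; default answer is 'no'.

    Hypothetical helper, not part of the reviewed script.
    """
    # Show a short preview so the user can see what would be posted.
    preview = text if len(text) <= 80 else text[:77] + "..."
    answer = ask(f"Send transcript {preview!r} to room {room_id}? [y/N] ")
    # Anything other than an explicit yes is treated as a refusal.
    return answer.strip().lower() in {"y", "yes"}
```

A caller would run this check before issuing the `requests.put` call shown in the excerpt and skip the request when it returns `False`. Defaulting to "no" keeps an accidental Enter keypress from posting a transcript to a shared room.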