Skill v1.0.8

ClawScan security

STT Recognizer · ClawHub's context-aware review of the artifact, metadata, and declared behavior.

Scanner verdict

Benign · Apr 29, 2026, 2:09 PM
Verdict
benign
Confidence
high
Model
gpt-5-mini
Summary
The skill's code, instructions, and requirements match its stated purpose (local and API-based speech-to-text); nothing requests unrelated credentials or installs unexpected third‑party tools, but be aware it downloads large models and can send audio to an API if you enable that mode.
Guidance
This skill appears to do what it says: record from your microphone and transcribe locally or via an OpenAI‑compatible API. Before installing and running it:
- If you plan to use API mode, only set STT_API_URL/STT_API_KEY for a trusted provider; audio will be uploaded to that endpoint. Keep keys secret.
- The Python requirements include torch and Whisper implementations; install them in a virtualenv/conda environment rather than system Python to avoid altering system packages (the quickstart suggests --break-system-packages, which can be disruptive).
- Model downloads are large (hundreds of MB to multiple GB) and will be stored under ~/.cache/huggingface/modules/stt-recognizer; ensure you have the disk space and bandwidth.
- The scripts access your microphone and save recordings under ~/.openclaw/workspace/projects/stt-recognizer/recordings (a privacy consideration). If you want to avoid saving raw audio, inspect and modify the scripts to change that behavior.
- Run the code in an isolated environment (virtualenv, container) if you do not fully trust the source, and review the included scripts (they are small and readable) before supplying credentials or running downloads.
If you want, I can extract the exact places where audio is saved and where network calls occur, or help craft a safer installation command (virtualenv + pip) and show how to run API mode without persisting raw files.
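The virtualenv recommendation above can be sketched as follows. This is an illustrative sequence, not taken from the skill itself: the venv path and the requirements filename are assumptions.

```shell
# Safer install sketch: an isolated virtualenv instead of the quickstart's
# `pip install --break-system-packages` against the system Python.
python3 -m venv "$HOME/.venvs/stt-recognizer"   # create an isolated environment
. "$HOME/.venvs/stt-recognizer/bin/activate"    # activate it for this shell session
python -c 'import sys; print(sys.prefix)'       # prints the venv path, not the system prefix
# pip install -r requirements.txt               # deps then land inside the venv only
```

Deleting `~/.venvs/stt-recognizer` later removes everything the install touched, which is the point of avoiding the system-wide flag.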

Review Dimensions

Purpose & Capability
ok · Name/description describe an STT tool. Included scripts (record_audio, transcribe, download_models, record_and_transcribe) and requirements (faster-whisper/whisper/openai, audio libraries, torch) are consistent with local transcription and optional API-based transcription.
Instruction Scope
note · SKILL.md and the scripts instruct recording from the microphone, saving recordings under the workspace, downloading Whisper models into ~/.cache/huggingface/modules/stt-recognizer, and optionally sending audio to an OpenAI-compatible API when the user provides STT_API_URL/STT_API_KEY. These behaviors are expected for an STT skill, but note that enabling API mode transmits audio externally, and the quick-start uses a system-wide pip install flag (--break-system-packages) which may modify system packages.
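Opting into API mode amounts to exporting the two documented variables. A minimal sketch, assuming the variable names from the skill's docs; the endpoint URL and key below are placeholders, not real values:

```shell
# Hypothetical API-mode configuration. Anything the skill records will be
# uploaded to whatever STT_API_URL points at, so only use a trusted provider.
export STT_API_URL="https://api.example.com/v1/audio/transcriptions"
export STT_API_KEY="replace-with-a-real-key"   # keep out of shell history and committed dotfiles
echo "audio will be uploaded to: $STT_API_URL"
# Unset both to fall back to local (offline) transcription:
# unset STT_API_URL STT_API_KEY
```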
Install Mechanism
note · There is no packaged installer; the skill is instruction- and script-based. The provided download_models.sh calls faster_whisper.download_model to fetch model weights (expected behavior). This will download large model files (hundreds of MB to >1 GB) into the user's cache directory and write them to disk — expected but resource-intensive. No suspicious external shorteners or unknown install URLs are used.
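Given the size of the weights, it is worth checking free space under the cache directory before running the download script. A sketch, using the cache path named in this report; the 5 GB threshold is an assumption sized for the larger Whisper models:

```shell
# Pre-download sanity check before fetching model weights.
CACHE_DIR="$HOME/.cache/huggingface/modules/stt-recognizer"
mkdir -p "$CACHE_DIR"
avail_kb=$(df -Pk "$CACHE_DIR" | awk 'NR==2 {print $4}')   # free space in kilobytes
if [ "$avail_kb" -lt $((5 * 1024 * 1024)) ]; then
    echo "warning: less than 5 GB free under $CACHE_DIR" >&2
fi
df -h "$CACHE_DIR"   # human-readable summary of the target filesystem
```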
Credentials
ok · No required credentials are declared in registry metadata. The skill documents optional environment variables (OPENCLAW_WORKSPACE, STT_MODEL_PATH, STT_API_URL, STT_API_KEY) that are reasonable for an STT tool. Requesting an API key only makes sense when the user opts into API mode; there are no unrelated secret requests.
Persistence & Privilege
ok · The always flag is false, and the skill does not request elevated or global agent privileges. It writes models and outputs to user-local cache and workspace directories (normal for ML workloads) and does not modify other skills or system-wide agent config.