Skill v2.0.1

ClawScan security

Voice TTS/ASR · ClawHub's context-aware review of the artifact, metadata, and declared behavior.

Scanner verdict

Suspicious · Mar 30, 2026, 5:50 AM

Verdict: suspicious
Confidence: medium
Model: gpt-5-mini
Summary: The skill mostly matches a TTS/ASR toolset, but there are inconsistencies (missing Python wrapper scripts referenced at runtime) and a few implementation choices that warrant caution before installation.
Guidance: Before installing:
1) Verify the package includes the Python wrapper scripts at scripts/whisper and scripts/edge_tts. They are referenced but not present in the provided files; without them, ASR/TTS calls will fail.
2) Be aware that install.sh will pip install edge-tts/whisper and download a large Whisper model (hundreds of MB) from the network. Plan disk space and network usage.
3) The skill reads ~/.openclaw/openclaw.json to obtain Telegram bot tokens. Ensure that file is trustworthy and that you are comfortable with the skill accessing your bot tokens (or pass --token to send_voice_reply.mjs instead).
4) Config parsing uses vm.runInNewContext rather than JSON.parse, which executes the file contents as JS in a VM. Only use the skill if you trust your openclaw.json.
5) If you proceed, test in a sandboxed environment first (no sensitive tokens) and confirm that TTS/ASR work and that the Python wrappers are present and functional. If the wrappers are missing, request the complete package from the author or decline installation.
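The first guidance item can be checked mechanically before running install.sh. The following is a minimal sketch, not part of the skill: the two relative paths come from this report, while the function name `findMissing` and the idea of pointing it at an unpacked skill directory are assumptions made for illustration.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Paths named in the scanner report as referenced but absent from the manifest.
const REQUIRED_PATHS = ["scripts/whisper", "scripts/edge_tts"];

// Hypothetical helper: returns the subset of `required` that does not exist
// under `skillRoot`. Works whether each entry is a file or a directory.
function findMissing(skillRoot, required = REQUIRED_PATHS) {
  return required.filter((rel) => !fs.existsSync(path.join(skillRoot, rel)));
}
```

Calling `findMissing("/path/to/unpacked/skill")` returns an empty array when both wrappers are present; any returned entries are the paths to request from the author before installing.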

Review Dimensions

Purpose & Capability: concern. The skill's name and description (Whisper ASR + Edge TTS, Telegram send) align with the binaries and Python packages it installs. However, bin/voice-asr.mjs and bin/voice-tts.mjs call Python wrapper scripts at scripts/whisper and scripts/edge_tts that are not present in the provided file manifest. This will break ASR/TTS at runtime and is an inconsistency between the claimed capability and the available files.
Instruction Scope: note. Runtime instructions and scripts perform the expected actions: transcribe audio, synthesize MP3, copy/archive inbound files (~/.openclaw/media/inbound) into the agent workspace, and use curl to POST to Telegram. The skill reads ~/.openclaw/openclaw.json (to get skill config and Telegram tokens) and environment variables (OPENCLAW_WORKSPACE, OPENCLAW_AGENT_ID, TELEGRAM_BOT_TOKEN). These are relevant to sending messages, but they mean the skill will access local agent configuration and any Telegram tokens stored there.
Install Mechanism: note. There is no registry install spec; the provided install.sh installs Python packages (edge-tts, whisper, click) via pip and downloads Whisper models (potentially large, e.g., ~800MB) using whisper.load_model. This is expected for a local Whisper-based ASR but involves network downloads and heavy disk usage. The script uses apt/brew and pip (standard sources), with no arbitrary binary downloads, but the model download and pip installs are significant and should be anticipated and approved before running.
Credentials: note. The skill does not request unrelated credentials, but it reads openclaw.json to locate Telegram bot tokens and falls back to the TELEGRAM_BOT_TOKEN environment variable. That is appropriate for a Telegram sender, but it gives the skill access to any bot tokens present in your config. In addition, config parsing uses vm.runInNewContext instead of JSON.parse, which executes the file content as JS expressions in a VM context. Parsing the local config is needed for functionality, but evaluating a user-supplied file with vm increases risk if the config file is untrusted or has been modified.
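The difference between the two parsing approaches can be illustrated with a small sketch. It does not reproduce the skill's code: the `hostile` string stands in for a tampered config file, and `parseConfigStrict` is a hypothetical helper name. Note that the Node.js documentation itself states that the vm module is not a security mechanism.

```javascript
const vm = require("node:vm");

// Hypothetical config text: a valid JavaScript expression, but not valid JSON.
// It performs a side effect (assigning a global) before yielding an object.
const hostile = '(touched = true, { token: "x" })';

// Evaluating the text executes it. The assignment lands on the sandbox object,
// showing that code in the file ran rather than merely being read as data.
const sandbox = {};
const evaluated = vm.runInNewContext(hostile, sandbox);

// Hypothetical strict alternative: JSON.parse only builds data structures and
// throws a SyntaxError on anything that is not JSON.
function parseConfigStrict(text) {
  return JSON.parse(text);
}
```

Here `sandbox.touched` ends up `true` after evaluation, whereas `parseConfigStrict(hostile)` throws. If a config file depends on non-JSON syntax such as comments, a dedicated parser for that format would be a safer fit than evaluation; whether openclaw.json uses such syntax is not stated in this report.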
Persistence & Privilege: ok. The skill does not request always:true and does not modify other skills or global system settings. It archives inbound audio into the agent workspace and creates/deletes temporary MP3 files; these behaviors are consistent with its purpose and scoped to its own workspace.