Voice Chat Bridge
Review
Audited by ClawScan on May 10, 2026.
Overview
This skill matches its voice auto-reply purpose, but it deserves review because it persistently changes agent behavior, automatically sends chat replies, and uses shell/script execution with dynamic message content.
Install only if you want an always-on voice auto-responder. Before enabling the SOUL.md trigger, restrict it to trusted channels or users, avoid shell-style interpolation for reply text, consider local TTS for sensitive chats, and review the external dependencies and model downloads.
Findings (5)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
A crafted voice message or generated reply containing shell-special characters could cause unsafe command execution if the agent copies these commands directly, and the skill may send replies automatically in connected chats.
The persistent workflow directs the agent to run command-line scripts with dynamic values and then send the generated media. If implemented through a shell without safe argument passing, untrusted paths or generated text could alter command behavior; the send action also mutates a chat account automatically.
Run the script `/root/.openclaw/workspace/skills/voice-chat/scripts/transcribe_audio.py <audio_path>` ... Run the script `/root/.openclaw/workspace/skills/voice-chat/scripts/reply_with_tts.py "<reply_text>"` ... Use the `message.send` tool
Use a reviewed wrapper or subprocess argument arrays instead of shell interpolation, pass reply text via stdin or a temporary file, validate audio paths, and add channel/user allowlists or approval gates before sending.
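A minimal sketch of that recommendation, assuming Python helpers: commands are built as argument arrays (no shell), audio paths are validated before use, and reply text travels over stdin rather than argv. The function names and the `--stdin` flag are illustrative assumptions, not part of the skill's actual scripts.

```python
import subprocess
from pathlib import Path

SCRIPTS = Path("/root/.openclaw/workspace/skills/voice-chat/scripts")

def build_transcribe_cmd(audio_path: str) -> list[str]:
    # Validate the path before handing it to the script: reject
    # unexpected media types instead of passing them through.
    p = Path(audio_path).resolve()
    if p.suffix.lower() not in {".mp3", ".ogg", ".wav", ".m4a"}:
        raise ValueError(f"unexpected audio type: {p.suffix}")
    # Argument array, no shell: metacharacters in the path stay inert.
    return ["python3", str(SCRIPTS / "transcribe_audio.py"), str(p)]

def run_tts(reply_text: str, timeout: int = 60) -> None:
    # Pass the generated reply via stdin instead of argv, so shell-special
    # characters and very long replies cannot alter the command line.
    # (Assumes the helper is adapted to accept a --stdin mode.)
    subprocess.run(
        ["python3", str(SCRIPTS / "reply_with_tts.py"), "--stdin"],
        input=reply_text, text=True, check=True, timeout=timeout,
    )
```

Because `subprocess.run` is called without `shell=True`, a reply such as `"; rm -rf ~"` is just data in a single argument, never an executable command.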
After setup, the agent may keep responding to future voice messages automatically until the persistent instruction is removed or changed.
The skill asks the user to persistently modify the agent's global instructions so future voice/audio messages trigger autonomous processing.
Edit `~/.openclaw/workspace/SOUL.md` and append the following at the end: ... when a voice or audio message is received ... **immediately** run the following automated workflow
Add clear enable/disable instructions, scope the trigger to trusted channels or users, and document how to remove the SOUL.md automation.
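One way to scope the trigger, sketched in Python: gate the automation on an explicit allowlist and fall through to normal (manual) handling for everything else. The channel names, sender IDs, and the `should_auto_reply` helper are illustrative assumptions.

```python
# Trusted sources for the auto-reply workflow; these example values
# are placeholders, not anything defined by the skill itself.
ALLOWED_CHANNELS = {"family-voice", "team-standup"}
ALLOWED_SENDERS = {"user:alice", "user:bob"}

def should_auto_reply(channel: str, sender: str) -> bool:
    # Only process voice messages from explicitly trusted sources;
    # anything else is left for the user to handle manually.
    return channel in ALLOWED_CHANNELS and sender in ALLOWED_SENDERS
```

Keeping the allowlist in one place also gives users a single, documented knob for disabling the automation: empty sets turn it off entirely.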
Voice handling may run under the same session context and permissions as the active agent.
A helper can reuse the current OpenClaw session ID to invoke the agent for reply generation. This is aligned with the stated context-aware purpose, but it may inherit the current session's authority unless the agent call is model-only or tool-restricted.
session_id = os.environ.get('OPENCLAW_SESSION_ID') ... "openclaw", "agent", "--session-id", session_id, "--message"
Prefer a scoped model-completion API for reply generation, or explicitly disable tools and high-impact actions during this helper call.
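Even without a scoped API, the helper can at least fail closed. The sketch below, assuming the same `openclaw agent` invocation the finding quotes, validates the ambient session ID before reusing it; the ID format regex is an assumption about what a well-formed ID looks like.

```python
import os
import re
import subprocess

def reply_via_agent(prompt: str) -> str:
    # Reuse the ambient session only if it looks like a well-formed ID;
    # refuse to call the agent with missing or attacker-shaped input.
    session_id = os.environ.get("OPENCLAW_SESSION_ID", "")
    if not re.fullmatch(r"[A-Za-z0-9_-]{8,64}", session_id):
        raise RuntimeError("no trusted session id; refusing agent call")
    # Argument array, no shell; the prompt rides as a single argv element.
    result = subprocess.run(
        ["openclaw", "agent", "--session-id", session_id, "--message", prompt],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

This does not remove the inherited authority of the session; it only prevents the helper from running against a corrupted or injected session identifier.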
Installation depends on external package/model sources and whatever versions satisfy the requirements at install time.
The documented setup pulls Python packages and external model archives manually; the artifacts do not show lockfiles, hashes, or signature verification.
pip install -r requirements.txt ... wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2 ... tar -xjf
Use pinned dependency versions, provide checksums for model downloads, and prefer a reproducible install spec.
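A checksum gate for the model archive can be sketched in a few lines of Python. The digest constant below is a placeholder, not the real published checksum for the sherpa-onnx archive; the function name is also an assumption.

```python
import hashlib

# Placeholder digest: replace with the checksum published for the
# sense-voice model archive. The value below is NOT real.
EXPECTED_SHA256 = "0" * 64

def verify_model_archive(path: str) -> None:
    # Hash the downloaded tarball incrementally so large archives
    # do not need to fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    if h.hexdigest() != EXPECTED_SHA256:
        raise RuntimeError(f"checksum mismatch for {path}: {h.hexdigest()}")
```

Running this between the `wget` and `tar -xjf` steps turns a silent supply-chain substitution into a hard install failure.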
Generated replies, and possibly transcribed user content if echoed in the reply, may be sent to the TTS provider.
The TTS helper sends the generated reply text to the Edge TTS tool; SKILL.md discloses Edge TTS as Microsoft Azure-based. This is expected for cloud TTS, but it is an external provider boundary.
cmd = [edge_tts_bin, "--text", text, "--write-media", mp3_path, "--voice", voice]
Avoid using cloud TTS for sensitive conversations, or configure a local-only TTS fallback where privacy is required.
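A local-only fallback can wrap the existing Edge TTS invocation that the finding quotes. In this sketch the `edge-tts` command shape comes from the quoted helper; the `pyttsx3` offline engine and the `prefer_local` switch are assumptions about one possible fallback, not part of the skill.

```python
import subprocess

def build_edge_tts_cmd(text, mp3_path, voice="zh-CN-XiaoxiaoNeural",
                       edge_tts_bin="edge-tts"):
    # Same argument-array shape the skill's helper already uses.
    return [edge_tts_bin, "--text", text, "--write-media", mp3_path,
            "--voice", voice]

def synthesize(text, mp3_path, prefer_local=False):
    # For sensitive chats, prefer a local engine so reply text never
    # leaves the machine; pyttsx3 is one offline option (an assumption).
    if prefer_local:
        import pyttsx3  # offline TTS, imported lazily
        engine = pyttsx3.init()
        engine.save_to_file(text, mp3_path)
        engine.runAndWait()
        return
    subprocess.run(build_edge_tts_cmd(text, mp3_path), check=True)
```

The `prefer_local` flag could be wired to a per-channel privacy setting, so trusted casual channels keep cloud voice quality while sensitive ones stay local.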
