Voice Chat Bridge

Review

Audited by ClawScan on May 10, 2026.

Overview

This skill matches its voice auto-reply purpose, but it deserves review because it persistently changes agent behavior, automatically sends chat replies, and uses shell/script execution with dynamic message content.

Install only if you want an always-on voice auto-responder. Before enabling the SOUL.md trigger, restrict it to trusted channels or users, avoid shell-style interpolation for reply text, consider local TTS for sensitive chats, and review the external dependencies and model downloads.

Findings (5)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

A crafted voice message or generated reply containing shell-special characters could cause unsafe command execution if the agent copies these commands directly, and the skill may send replies automatically in connected chats.

Why it was flagged

The persistent workflow directs the agent to run command-line scripts with dynamic values and then send the generated media. If implemented through a shell without safe argument passing, untrusted paths or generated text could alter command behavior; the send action also mutates a chat account automatically.

Skill content
Run the script `/root/.openclaw/workspace/skills/voice-chat/scripts/transcribe_audio.py <audio_path>` ... Run the script `/root/.openclaw/workspace/skills/voice-chat/scripts/reply_with_tts.py "<reply_text>"` ... Use the `message.send` tool
Recommendation

Use a reviewed wrapper or subprocess argument arrays instead of shell interpolation, pass reply text via stdin or a temporary file, validate audio paths, and add channel/user allowlists or approval gates before sending.
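
The recommendation above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: `ALLOWED_AUDIO_DIR`, `ALLOWED_SUFFIXES`, `validate_audio_path`, and `run_without_shell` are hypothetical names, and the sketch assumes the called program reads its text from stdin, whereas the skill's `reply_with_tts.py` currently takes the text as an argument.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: incoming audio is stored under one known directory.
ALLOWED_AUDIO_DIR = Path("/tmp/voice-inbox")  # hypothetical location
ALLOWED_SUFFIXES = {".ogg", ".mp3", ".wav", ".m4a"}  # hypothetical set


def validate_audio_path(raw_path: str, base_dir: Path = ALLOWED_AUDIO_DIR) -> Path:
    """Resolve the path and reject anything outside base_dir or with an unexpected suffix."""
    resolved = Path(raw_path).resolve()
    base = base_dir.resolve()
    if base not in resolved.parents:
        raise ValueError(f"audio path outside allowed directory: {resolved}")
    if resolved.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unexpected audio suffix: {resolved.suffix}")
    return resolved


def run_without_shell(argv: list, stdin_text: str, timeout: float = 60.0) -> str:
    """Run a command from an argument list (no shell) and feed it text on stdin."""
    result = subprocess.run(
        argv,              # a list, so no shell parses the arguments
        input=stdin_text,  # dynamic text never appears on a command line
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,        # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Because no shell is involved, characters such as `;`, `$(...)`, or backticks in the reply text reach the child process as plain data. A channel or user allowlist check would still need to run before any send step.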

What this means

After setup, the agent may keep responding to future voice messages automatically until the persistent instruction is removed or changed.

Why it was flagged

The skill asks the user to persistently modify the agent's global instructions so future voice/audio messages trigger autonomous processing.

Skill content
Edit `~/.openclaw/workspace/SOUL.md` and append the following at the end: ... When a voice or audio message is received ... **immediately** run the following automated workflow
Recommendation

Add clear enable/disable instructions, scope the trigger to trusted channels or users, and document how to remove the SOUL.md automation.
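
One way to express the scoping and enable/disable advice above is a small policy gate checked before any automatic reply. This is a sketch under assumptions: `AutoReplyPolicy` and its fields are hypothetical and are not part of the skill or of OpenClaw.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoReplyPolicy:
    """Hypothetical gate: auto-reply only when enabled and both IDs are trusted."""

    enabled: bool = False  # off by default, so the automation is opt-in
    trusted_channels: frozenset = frozenset()
    trusted_users: frozenset = frozenset()

    def allows(self, channel_id: str, user_id: str) -> bool:
        """Return True only for an enabled policy and an allowlisted channel and user."""
        if not self.enabled:
            return False
        return channel_id in self.trusted_channels and user_id in self.trusted_users
```

Flipping `enabled` to `False` gives a single switch for turning the behavior off without editing the allowlists.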

What this means

Voice handling may run under the same session context and permissions as the active agent.

Why it was flagged

A helper can reuse the current OpenClaw session ID to invoke the agent for reply generation. This is aligned with the stated context-aware purpose, but it may inherit the current session's authority unless the agent call is model-only or tool-restricted.

Skill content
session_id = os.environ.get('OPENCLAW_SESSION_ID') ... "openclaw", "agent", "--session-id", session_id, "--message"
Recommendation

Prefer a scoped model-completion API for reply generation, or explicitly disable tools and high-impact actions during this helper call.

What this means

Installation depends on external package/model sources and whatever versions satisfy the requirements at install time.

Why it was flagged

The documented setup pulls Python packages and external model archives manually; the artifacts do not show lockfiles, hashes, or signature verification.

Skill content
pip install -r requirements.txt ... wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2 ... tar -xjf
Recommendation

Use pinned dependency versions, provide checksums for model downloads, and prefer a reproducible install spec.
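
Checksum verification for a downloaded model archive could look like the sketch below. The function names are hypothetical, and the expected SHA-256 value must come from a trusted source; the reviewed artifacts do not publish one, so none is given here.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> None:
    """Raise ValueError if the file's SHA-256 does not match the expected hex digest."""
    actual = sha256_of_file(path)
    if not hmac.compare_digest(actual, expected_sha256.strip().lower()):
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
```

The check belongs between the download and the `tar -xjf` step, so a mismatched archive is never extracted.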

What this means

Generated replies, and possibly transcribed user content if echoed in the reply, may be sent to the TTS provider.

Why it was flagged

The TTS helper sends the generated reply text to the Edge TTS tool; SKILL.md discloses Edge TTS as Microsoft Azure-based. This is expected for cloud TTS, but it is an external provider boundary.

Skill content
cmd = [edge_tts_bin, "--text", text, "--write-media", mp3_path, "--voice", voice]
Recommendation

Avoid using cloud TTS for sensitive conversations, or configure a local-only TTS fallback where privacy is required.
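
The privacy rule above can be reduced to a small routing decision made before any text leaves the machine. This is only a sketch: `TtsBackend` and `choose_tts_backend` are hypothetical names, and how a chat is marked sensitive or a local engine is detected is left as an assumption.

```python
from enum import Enum
from typing import Optional


class TtsBackend(Enum):
    CLOUD = "cloud"  # e.g. the Edge TTS path the skill uses today
    LOCAL = "local"  # an on-device engine, if one is configured


def choose_tts_backend(sensitive: bool, local_available: bool) -> Optional[TtsBackend]:
    """Pick a backend; return None when no acceptable backend exists.

    Sensitive chats never use the cloud backend. If no local engine is
    available for them, the caller should fall back to a text-only reply.
    """
    if sensitive:
        return TtsBackend.LOCAL if local_available else None
    return TtsBackend.CLOUD
```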