Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Voice TTS/ASR

v2.0.1

Voice input (Whisper ASR) plus voice output (Edge TTS) skill. Supports per-agent voices and can call send_voice_reply.mjs to send Telegram voice messages.

0 stars · 511 downloads · 4 current · 4 all-time
Security Scan
VirusTotal: Suspicious
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The skill's name and description (Whisper ASR + Edge TTS, Telegram send) align with the binaries and Python packages it installs. However, bin/voice-asr.mjs and bin/voice-tts.mjs call Python wrapper scripts at scripts/whisper and scripts/edge_tts that are not present in the provided file manifest. The missing wrappers will break the skill at runtime, and they make the claimed capability inconsistent with the files actually shipped.
Instruction Scope
Runtime instructions and scripts perform expected actions: transcribe audio, synthesize MP3, copy/archive inbound files (~/.openclaw/media/inbound) into the agent workspace, and use curl to POST to Telegram. The skill reads ~/.openclaw/openclaw.json (to get skill config and Telegram tokens) and environment variables (OPENCLAW_WORKSPACE, OPENCLAW_AGENT_ID, TELEGRAM_BOT_TOKEN). These are relevant to sending messages, but they mean the skill will access local agent configuration and any Telegram tokens stored there.
Install Mechanism
There is no registry install spec; the provided install.sh installs Python packages (edge-tts, whisper, click) via pip and downloads Whisper models (potentially large, e.g., ~800MB) using whisper.load_model. This is expected for a local Whisper-based ASR but involves network downloads and heavy disk usage. The script uses apt/brew and pip (standard sources) — no arbitrary binary downloads, but the heavy model download and pip installs are significant and should be expected/approved.
Credentials
The skill does not request unrelated credentials, but it reads openclaw.json to locate Telegram bot tokens and will fall back to TELEGRAM_BOT_TOKEN environment variable. That is appropriate for a Telegram sender, but gives the skill access to any bot tokens present in your config. Also config parsing uses vm.runInNewContext instead of JSON.parse, which executes the file content as JS expressions in a VM context — parsing the local config is needed for functionality, but using vm to evaluate user-supplied files increases risk if the config file is untrusted or modified.
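The difference between evaluating and parsing a config file can be shown with a minimal sketch. It assumes openclaw.json is plain JSON; if the real file allows comments or trailing commas, JSON.parse would reject it. The helper name parseConfigStrict is hypothetical and not part of the skill.

```javascript
// Hypothetical helper, not part of the skill: treat config text as data only.
// JSON.parse never evaluates expressions, so a tampered file cannot run code.
function parseConfigStrict(text) {
  let parsed;
  try {
    parsed = JSON.parse(text);
  } catch (err) {
    throw new Error(`config is not valid JSON: ${err.message}`);
  }
  // Reject arrays, null, and primitives: a config root must be an object.
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    throw new Error("config root must be a JSON object");
  }
  return parsed;
}

const parsedOk = parseConfigStrict('{"skills": {"entries": {}}}');
console.log(Object.keys(parsedOk));

// Input that an evaluator would execute as a JS expression is rejected here.
try {
  parseConfigStrict("({ token: process.env.TELEGRAM_BOT_TOKEN })");
} catch (err) {
  console.log("rejected:", err.message);
}
```

The trade-off is strictness: anything that is not valid JSON fails loudly instead of being executed.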
Persistence & Privilege
The skill does not request always:true nor modify other skills or global system settings. It archives inbound audio into the agent workspace and creates/deletes temporary MP3 files; these behaviors are consistent with its purpose and scoped to its own workspace.
What to consider before installing
Before installing:

  1. Verify that the package includes the Python wrapper scripts at scripts/whisper and scripts/edge_tts. They are referenced but not present in the provided files; without them ASR/TTS calls will fail.
  2. Be aware that install.sh will pip install edge-tts/whisper and download a large Whisper model (hundreds of MB) from the network. Plan disk space and network usage.
  3. The skill reads ~/.openclaw/openclaw.json to obtain Telegram bot tokens. Ensure that file is trustworthy and that you are comfortable with the skill accessing your bot tokens (or pass --token to send_voice_reply.mjs instead).
  4. Config parsing uses vm.runInNewContext rather than JSON.parse, which executes the file contents as JS in a VM. Only use the skill if you trust your openclaw.json.
  5. If you proceed, test in a sandboxed environment first (no sensitive tokens), and confirm that TTS/ASR work and that the missing Python wrappers are present and functional. If the wrappers are missing, request the complete package from the author or decline installation.
bin/voice-asr.mjs:60
Shell command execution detected (child_process).
bin/voice-tts.mjs:43
Shell command execution detected (child_process).
scripts/send_voice_reply.mjs:44
Shell command execution detected (child_process).
Patterns worth reviewing
These patterns may indicate risky behavior. Check the VirusTotal and OpenClaw results above for context-aware analysis before installing.


Runtime requirements

🎙️ Clawdis
Bins: node>=18, python3, ffmpeg
latestvk9784kt8bvk547ghmb90s8jkjs83wqxr
511 downloads
0 stars
3 versions
Updated 20h ago
v2.0.1
MIT-0

voice-tts

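Voice input (ASR) plus voice output (TTS) skill; a complete replacement for OpenClaw's built-in tts tool when handling Chinese content.

Technical overview

| Direction | Technology | Description |
| --- | --- | --- |
| Speech → text | Whisper (local) | Receives voice messages and transcribes them automatically |
| Text → speech | Edge TTS (cloud) | Generates MP3 and sends it as a Telegram voice message |

How it works

ASR (speech → text)

The user sends a voice message → voice-asr.mjs transcribes it to text → the agent is triggered to handle it.

Speech recognition happens automatically in the OpenClaw tool layer; the agent receives text.

TTS (text → speech)

After the agent replies in text, call send_voice_reply.mjs manually whenever the reply should go out as a Telegram voice message: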

node /path/to/voice-tts/scripts/send_voice_reply.mjs \
  --text "你的回复内容" \
  --chat-id 8317347201 \
  --agent main

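Quick install

Option 1: one-step install (recommended)

# Installs the turbo model by default
bash /path/to/voice-tts/install.sh

# Behind a proxy (e.g. in mainland China)
bash /path/to/voice-tts/install.sh --proxy http://127.0.0.1:7897

Option 2: manual install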

pip install edge-tts openai-whisper click   # OpenAI Whisper is published on PyPI as openai-whisper
brew install ffmpeg   # macOS
sudo apt install -y ffmpeg  # Ubuntu

After installing, run the smoke test:

bash tests/smoke.sh

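Configuration (optional)

config.default.json already contains every default value, so the skill works without any configuration.

To customize the agent-to-voice mapping or the ASR parameters, override them in openclaw.json under skills.entries.voice-tts.config: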

{
  "skills": {
    "entries": {
      "voice-tts": {
        "enabled": true,
        "config": {
          "tts": {
            "defaultVoice": "zh-CN-XiaoxiaoNeural",
            "agentVoices": {
              "main":       "zh-CN-XiaoxiaoNeural",
              "researcher": "zh-CN-YunxiNeural",
              "product":    "zh-CN-XiaoyiNeural",
              "coder":      "zh-CN-YunyangNeural",
              "devops":     "zh-CN-YunjianNeural"
            }
          },
          "asr": {
            "defaultInitialPrompt": "以下是中文语音转文字。常见词包括:管家、研究员、邮差、码农、产品、运维、OpenClaw、小爱、Telegram。",
            "defaultTemperature": 0,
            "conditionOnPreviousText": true
          }
        }
      }
    }
  }
}
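How such an override might be layered over the defaults can be sketched as follows. This assumes a recursive merge and a voice precedence of explicit voice, then agent mapping, then default; the skill's lib/config.mjs may behave differently. mergeConfig and resolveVoice are hypothetical names.

```javascript
// Sketch only: deep-merge user overrides onto defaults, then pick a voice.
// mergeConfig and resolveVoice are hypothetical, not the skill's API.
function mergeConfig(defaults, overrides) {
  const out = { ...defaults };
  for (const [key, value] of Object.entries(overrides ?? {})) {
    const base = out[key];
    const bothObjects =
      value && typeof value === "object" && !Array.isArray(value) &&
      base && typeof base === "object" && !Array.isArray(base);
    // Nested objects are merged key by key; everything else is replaced.
    out[key] = bothObjects ? mergeConfig(base, value) : value;
  }
  return out;
}

// Assumed precedence: explicit voice, then the agent's mapping, then default.
function resolveVoice(config, agentId, explicitVoice) {
  const tts = config.tts ?? {};
  return explicitVoice ?? tts.agentVoices?.[agentId] ?? tts.defaultVoice;
}

const defaults = {
  tts: {
    defaultVoice: "zh-CN-XiaoxiaoNeural",
    agentVoices: { main: "zh-CN-XiaoxiaoNeural" },
  },
  asr: { defaultTemperature: 0 },
};
const overrides = { tts: { agentVoices: { coder: "zh-CN-YunyangNeural" } } };
const config = mergeConfig(defaults, overrides);

console.log(resolveVoice(config, "coder"));   // zh-CN-YunyangNeural
console.log(resolveVoice(config, "unknown")); // zh-CN-XiaoxiaoNeural
```

With a recursive merge, an override that sets a single agent voice leaves the other default voices and the ASR settings in place.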

After changing the configuration, run openclaw gateway restart.


Core scripts

Speech synthesis: bin/voice-tts.mjs

Convert text to an audio file:

# Basic usage
node bin/voice-tts.mjs "你好" -f /tmp/demo.mp3

# Use an agent's voice
node bin/voice-tts.mjs "你好" -f /tmp/demo.mp3 --agent main

# Set the voice and speech rate
node bin/voice-tts.mjs "你好" -f /tmp/demo.mp3 -v zh-CN-YunxiNeural -r +10%

Available Chinese voices: zh-CN-XiaoxiaoNeural (female, recommended), zh-CN-YunxiNeural, zh-CN-XiaoyiNeural, zh-CN-YunyangNeural, zh-CN-YunjianNeural, zh-CN-XiaomoNeural
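When calling the TTS entry point from another script, the flags above can be assembled as an argument array rather than a shell string. The sketch below is one way to do that; buildTtsArgs is a hypothetical helper, and the rate pattern is only inferred from the examples "+10%" and "-5%".

```javascript
// Sketch: assemble an argv array for bin/voice-tts.mjs from the documented
// flags (-f, --agent, -v, -r). buildTtsArgs is a hypothetical helper.
const RATE_PATTERN = /^[+-]\d+%$/; // assumed format, inferred from the examples

function buildTtsArgs(text, outFile, { agent, voice, rate } = {}) {
  if (!text) throw new Error("text is required");
  if (rate !== undefined && !RATE_PATTERN.test(rate)) {
    throw new Error(`rate must look like +10% or -5%, got: ${rate}`);
  }
  const ttsArgs = ["bin/voice-tts.mjs", text, "-f", outFile];
  if (agent) ttsArgs.push("--agent", agent);
  if (voice) ttsArgs.push("-v", voice);
  if (rate) ttsArgs.push("-r", rate);
  return ttsArgs;
}

console.log(
  buildTtsArgs("你好", "/tmp/demo.mp3", { voice: "zh-CN-YunxiNeural", rate: "+10%" })
);
```

Passing the array to child_process.execFile("node", ttsArgs) keeps the text out of any shell, so quotes or backticks in a reply cannot be interpreted as commands.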

Speech recognition: bin/voice-asr.mjs

Transcribe an audio file to text:

# Basic usage
node bin/voice-asr.mjs audio.ogg

# Set the model and language
node bin/voice-asr.mjs audio.ogg --model turbo --language zh

# JSON output (includes language detection)
node bin/voice-asr.mjs audio.ogg --json

Available models: tiny, base, small, turbo, large-v3
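A caller can reject an unsupported model name before starting a transcription. The sketch below checks against the model list above and builds the argument array from the documented flags; buildAsrArgs is a hypothetical helper, not part of the skill.

```javascript
// Sketch: validate the model name and build argv for bin/voice-asr.mjs.
// buildAsrArgs is hypothetical; the flags are the ones documented above.
const WHISPER_MODELS = ["tiny", "base", "small", "turbo", "large-v3"];

function buildAsrArgs(audioPath, { model, language, json = false } = {}) {
  if (!audioPath) throw new Error("audio path is required");
  if (model !== undefined && !WHISPER_MODELS.includes(model)) {
    throw new Error(`unknown model "${model}", expected one of: ${WHISPER_MODELS.join(", ")}`);
  }
  const asrArgs = ["bin/voice-asr.mjs", audioPath];
  if (model) asrArgs.push("--model", model);
  if (language) asrArgs.push("--language", language);
  if (json) asrArgs.push("--json");
  return asrArgs;
}

console.log(buildAsrArgs("audio.ogg", { model: "turbo", language: "zh", json: true }));
```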

Send a Telegram voice message: scripts/send_voice_reply.mjs

Runs the whole "text → TTS synthesis → Telegram voice message" pipeline in one step:

node scripts/send_voice_reply.mjs \
  --text "已收到!" \
  --chat-id 8317347201 \
  --agent main
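The final step of that pipeline is an upload to the Telegram Bot API sendVoice method, which takes chat_id and voice as multipart form fields. The sketch below builds curl arguments for such a request; buildSendVoiceCurlArgs is a hypothetical helper, and the skill's actual curl invocation may differ.

```javascript
// Sketch: curl arguments for a Telegram Bot API sendVoice upload.
// buildSendVoiceCurlArgs is hypothetical, not the skill's implementation.
function buildSendVoiceCurlArgs(token, chatId, audioPath) {
  if (!token) throw new Error("bot token is required");
  return [
    "--silent", "--show-error", "--fail",
    `https://api.telegram.org/bot${token}/sendVoice`,
    "-F", `chat_id=${chatId}`,
    "-F", `voice=@${audioPath}`, // "@" makes curl upload the file contents
  ];
}

const curlArgs = buildSendVoiceCurlArgs("<BOT_TOKEN>", 8317347201, "/tmp/reply.mp3");
console.log(curlArgs.join(" "));
```

Because the token is part of the URL, it is visible in the process list while curl runs, which is worth keeping in mind on shared machines.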

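Parameters:

| Parameter | Description |
| --- | --- |
| --text | Text to be spoken |
| --chat-id | Telegram target user ID |
| --agent | Agent id; the matching voice is selected automatically |
| --voice | Overrides the default voice, e.g. zh-CN-YunxiNeural |
| --rate | Speech rate, e.g. +10% or -5% |
| --token | Telegram Bot Token, given directly |

Token lookup order:

  1. The --token parameter
  2. openclaw.json → channels.telegram.accounts.<current agent>.botToken
  3. openclaw.json → channels.telegram.accounts.default.botToken
  4. The TELEGRAM_BOT_TOKEN environment variable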
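The lookup order above amounts to a first-match search over four sources. A minimal sketch, assuming the config object mirrors the documented openclaw.json paths; resolveBotToken is a hypothetical name, and the source label it returns is an addition for debugging.

```javascript
// Sketch of the documented lookup order; resolveBotToken is hypothetical.
// The first source that yields a non-empty string wins.
function resolveBotToken({ cliToken, config, agentId, env = {} }) {
  const accounts = config?.channels?.telegram?.accounts ?? {};
  const candidates = [
    ["--token", cliToken],
    [`accounts.${agentId}`, accounts[agentId]?.botToken],
    ["accounts.default", accounts.default?.botToken],
    ["TELEGRAM_BOT_TOKEN", env.TELEGRAM_BOT_TOKEN],
  ];
  for (const [source, token] of candidates) {
    if (typeof token === "string" && token.length > 0) {
      return { token, source };
    }
  }
  return null; // no token found anywhere
}

const sampleConfig = {
  channels: { telegram: { accounts: { default: { botToken: "token-from-default" } } } },
};
console.log(resolveBotToken({ config: sampleConfig, agentId: "main", env: {} }));
```

In a real script, env would be process.env. Returning the source alongside the token makes it easier to see which account a message was sent from.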

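File structure

voice-tts/
├── SKILL.md                      # This document
├── config.default.json           # Default configuration (usable as-is, no changes needed)
├── install.sh                    # One-step install script
│
├── bin/
│   ├── voice-tts.mjs             # TTS entry point
│   └── voice-asr.mjs             # ASR entry point
│
├── lib/
│   ├── config.mjs                # Config loading (supports openclaw.json overrides)
│   ├── errors.mjs                # Unified error codes and user-facing fallback messages
│   └── audio.mjs                 # Audio validation
│
├── scripts/
│   ├── send_voice_reply.mjs      # Telegram voice sending (core)
│   └── auto_voice_check          # Batch-process unprocessed voice files
│
└── tests/
    └── smoke.sh                  # Smoke test

Note: scripts/edge_tts and scripts/whisper are internal Python wrappers, not direct entry points; use the bin/ entry points listed above.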


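Error codes

The fallback messages are sent to users in Chinese; they are translated here.

| Error code | Meaning | User-facing fallback message |
| --- | --- | --- |
| no_file_path | No audio file was provided | Sorry, no audio file was received. Please try again. |
| file_not_found | File does not exist | Sorry, the audio file could not be found. Please try again. |
| file_empty | File is empty | Sorry, the audio file is empty. Please try again. |
| file_too_small | File is too small | Sorry, the audio file is incomplete. Please try again. |
| file_stale | File has expired | Sorry, the audio file has expired. Please try again. |
| transcription_failed | Whisper transcription failed | Sorry, speech recognition failed. Please try again. |
| synthesis_failed | Edge TTS generation failed | Sorry, speech generation failed. Please try again. |
| timeout | Execution timed out | Sorry, processing timed out. Please try again later. |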
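The five file-related codes suggest an ordered series of checks before transcription. The sketch below shows one plausible ordering; the size and age thresholds are invented for illustration, and classifyAudio is a hypothetical name, so lib/audio.mjs may well differ.

```javascript
// Sketch: map file checks to the documented error codes.
// The limits are assumed values for illustration, not taken from the skill.
const LIMITS = { minBytes: 1024, maxAgeMs: 10 * 60 * 1000 };

// stat is null when the file is missing, else { size, mtimeMs }.
function classifyAudio(filePath, stat, now = Date.now(), limits = LIMITS) {
  if (!filePath) return "no_file_path";
  if (!stat) return "file_not_found";
  if (stat.size === 0) return "file_empty";
  if (stat.size < limits.minBytes) return "file_too_small";
  if (now - stat.mtimeMs > limits.maxAgeMs) return "file_stale";
  return null; // no validation error
}

const nowMs = Date.now();
console.log(classifyAudio("", null, nowMs));
console.log(classifyAudio("a.ogg", { size: 200, mtimeMs: nowMs }, nowMs));
console.log(classifyAudio("a.ogg", { size: 50000, mtimeMs: nowMs }, nowMs));
```

In practice stat could come from fs.statSync, with a missing file mapped to null.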

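Automatic archiving of voice files

After a successful transcription, voice-asr.mjs copies the original file from ~/.openclaw/media/inbound/ to media/inbound/ in the agent workspace, then deletes the original.

  • ✅ On success: the file is copied to the archive and the original is deleted
  • ❌ On failure: the original is kept so the transcription can be retried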
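The copy-then-delete ordering is what keeps a file retryable: the original is removed only after the copy has succeeded. A minimal sketch, assuming POSIX-style paths and an injected fs-like object so the ordering can be checked without touching disk; archiveInbound is a hypothetical name.

```javascript
// Sketch of copy-then-delete archiving. archiveInbound is hypothetical;
// fsImpl is any object with mkdirSync, copyFileSync, and unlinkSync.
function archiveInbound(srcPath, workspaceDir, transcribedOk, fsImpl) {
  if (!transcribedOk) return { archived: false, kept: srcPath }; // keep for retry
  const fileName = srcPath.split("/").pop(); // assumes "/" separators
  const destDir = `${workspaceDir}/media/inbound`;
  const destPath = `${destDir}/${fileName}`;
  fsImpl.mkdirSync(destDir, { recursive: true });
  fsImpl.copyFileSync(srcPath, destPath);
  // Reached only if the copy did not throw, so the original is safe to remove.
  fsImpl.unlinkSync(srcPath);
  return { archived: true, dest: destPath };
}

// A recording stand-in for node:fs, used here to show the call order.
const calls = [];
const fakeFs = {
  mkdirSync: (dir) => calls.push(["mkdir", dir]),
  copyFileSync: (src, dest) => calls.push(["copy", src, dest]),
  unlinkSync: (p) => calls.push(["unlink", p]),
};
console.log(archiveInbound("/home/u/.openclaw/media/inbound/a.ogg", "/home/u/ws", true, fakeFs));
console.log(calls.map((c) => c[0])); // [ 'mkdir', 'copy', 'unlink' ]
```

Passing the real node:fs module as fsImpl would perform the actual file operations.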

Troubleshooting

# Check dependencies
ffmpeg -version
python3 -c "import edge_tts; print('edge-tts ok')"
python3 -c "import whisper; print('whisper ok')"

# List unprocessed voice files
ls -la ~/.openclaw/media/inbound/

# Test ASR directly
node bin/voice-asr.mjs ~/.openclaw/media/inbound/your-file.ogg

# Test TTS directly
node bin/voice-tts.mjs "测试" -f /tmp/test.mp3

# Run the smoke test
bash tests/smoke.sh

Common problems:

  • TTS generation fails: check python3 -c "import edge_tts; print('ok')"
  • Telegram sending fails: confirm that botToken is correct, that chat-id is a numeric ID, and that the voice file is under 20MB
  • Voice message sent to the wrong recipient: check that conversationId matches the expected chat-id
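The three Telegram checks in the list above can be run before any network call. A minimal sketch, assuming "20MB" means 20 × 1024 × 1024 bytes; checkSendPreconditions is a hypothetical helper, and it cannot tell whether a token is actually valid, only whether one is present.

```javascript
// Sketch: pre-flight checks mirroring the troubleshooting list.
// checkSendPreconditions is hypothetical; the byte limit is an assumption.
const MAX_VOICE_BYTES = 20 * 1024 * 1024;

function checkSendPreconditions({ botToken, chatId, fileSizeBytes }) {
  const problems = [];
  if (!botToken) problems.push("botToken is missing");
  if (!/^-?\d+$/.test(String(chatId))) problems.push("chat-id is not a numeric ID");
  if (!(fileSizeBytes < MAX_VOICE_BYTES)) problems.push("voice file is not under 20MB");
  return problems; // empty array means all checks passed
}

console.log(checkSendPreconditions({ botToken: "t", chatId: "8317347201", fileSizeBytes: 48000 }));
console.log(checkSendPreconditions({ botToken: "", chatId: "@someone", fileSizeBytes: 30 * 1024 * 1024 }));
```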

Optional: batch-process unprocessed voice files

node scripts/auto_voice_check

Scans ~/.openclaw/media/inbound/ for unprocessed .ogg files, then transcribes and archives them automatically.
