xeon_tts

Local TTS skill using OpenVINO Qwen3-TTS for voice cloning and emotion style synthesis, supporting QQBOT workflows with strict audio length and file retentio...

MIT-0 · Free to use, modify, and redistribute. No attribution required.

by @aurora2035

duplicate of @aurora2035/xeonasr

Security Scan

VirusTotal: Benign

OpenClaw: Suspicious (medium confidence)

✓

Purpose & Capability

Name/description (local OpenVINO Qwen3-TTS for QQBOT workflows) align with the included scripts and server: node workflow gateway, Flask TTS service, config injection into OpenClaw. The scripts only modify channels.qqbot.xeonTts and back up the OpenClaw config before writing, which matches the stated intent.

ℹ

Instruction Scope

SKILL.md and scripts instruct the agent to install Python envs, system packages, register user-level systemd services, modify ~/.openclaw/openclaw.json, download models, and save uploaded audio to local references/ and outputs/. Those actions are within the described TTS workflow, but they include writing to user config, enabling autostart services, and making network calls — all of which are beyond a pure 'instruction-only' skill and should be permitted explicitly by the operator.

Install Mechanism

Installation downloads and executes software: Miniconda installer from repo.anaconda.com (expected), pip install of xdp-tts-service (PyPI by default), and model downloads via the Hugging Face CLI. Notably, setup_env.sh exports HF_ENDPOINT=https://hf-mirror.com and HF_HUB_ENABLE_HF_TRANSFER=0, which redirects HF operations to a third‑party mirror by default — that is unexpected and raises supply-chain risk. Model downloads and pip installs run code that will be written to disk and executed; review sources before proceeding.

✓

Credentials

The skill does not declare or require any secret environment variables or cloud credentials. It does accept optional environment overrides (BASE_MODEL_REPO, CUSTOM_MODEL_REPO, XDP_TTS_PIP_SPEC, etc.) used to control model and package locations; these are appropriate for the task but are not enforced as required secrets.

ℹ

Persistence & Privilege

The installer registers and enables two user-level systemd services (TTS Flask and Node gateway) and modifies the user's OpenClaw config (with a timestamped backup). Persistent autostart is justified for a local service but increases blast radius; the skill does not set always:true, but enabling services and writing config are persistent privileges the operator should approve.

What to consider before installing

This package looks functionally consistent with a local TTS/voice-clone skill, but you should not install it blindly. Before installing:

1) Inspect the source of the xdp-tts-service pip package to verify it is trustworthy.
2) Check the model download sources: setup_env.sh sets HF_ENDPOINT to https://hf-mirror.com by default (an unexpected third-party mirror). If you don't trust that mirror, override HF_ENDPOINT or ensure the hf CLI points to the official huggingface.co repos.
3) Review the full server.js (it spawns child processes and handles uploads) to see which external commands (ffprobe, etc.) it executes.
4) Be aware that the installer creates user systemd services, enables and starts them, and writes to ~/.openclaw/openclaw.json (a backup is created).

To reduce risk, run the install steps in an isolated VM/container, or run setup_env.sh with --skip-deps and inspect the generated config files before enabling services. If anything looks unfamiliar (unknown HF mirror, unexpected pip package, or surprising exec calls), do not enable the services and seek a trusted upstream source.

✗

server.js:442

Shell command execution detected (child_process).

✗

self_check.sh:41

Dynamic code execution detected.

.clawhub.json:13

Install source points to URL shortener or raw IP.

config.example.json:3

Install source points to URL shortener or raw IP.

server.js:37

File read combined with network send (possible exfiltration).

Patterns worth reviewing

These patterns may indicate risky behavior. Check the VirusTotal and OpenClaw results above for context-aware analysis before installing.

Current version: v0.1.4

latest vk97deqb4jxtrwff7x4r5wnwy69836qvz

License

MIT-0. Terms: https://spdx.org/licenses/MIT-0.html

SKILL.md

Xeon TTS

A local speech synthesis skill built on the OpenVINO Qwen3-TTS Base and Custom models, intended for OpenClaw QQBOT workflows.

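Goals

Install two local services: Flask TTS on port 5002 and the Node TTS Workflow on port 9002
Automatically configure the target machine's own OpenClaw config, writing only to channels.qqbot.xeonTts
Coexist with xeonasr without overwriting tools.media.audio or channels.qqbot.stt
Support two workflows: voice cloning and TTS in a specified tone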

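When to call xeontts

Use xeontts only in the following scenarios:

The user explicitly asks to "clone a voice timbre" (克隆音色), "clone a voice" (克隆声音), or "copy my voice" (复制我的声音)
The user asks to read aloud, announce, or generate speech in a particular tone
The user wants audio generated to a local file rather than a transcription

Do not route the following scenarios to xeontts:

Speech recognition
Speech-to-text
Dictation
STT / ASR

These requests must go to xeonasr to avoid task conflicts.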

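OpenClaw / QQBOT usage rules

Rule 1: Voice cloning must happen in two steps

When the user says "I want to clone a voice" (我要克隆音色):

Immediately switch the current session to the clone flow
Ask the user to upload a 3 to 5 second reference audio clip
Do not start synthesis before the reference audio arrives
If xeonasr is installed on the machine, voice messages from QQBOT hit ASR first; in that case ASR should hand the audio over to xeontts rather than treat it as an ordinary transcription
Validate the duration as soon as the audio is received
If the duration is under 3 seconds or over 5 seconds, reject it outright and ask the user to upload again
Once validation passes, ask the user to send the text to be read aloud
Generate the audio with the Base model and write it to disk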
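
The clone flow above behaves like a small state machine. The following is a minimal sketch of that idea, not the skill's actual server.js logic: the names CloneState, CloneSession, start_clone, receive_reference, and receive_text are hypothetical, and the reply strings and job dictionary are illustrative only.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

MIN_REF_SECONDS = 3.0  # lower bound for reference audio, from Rule 1
MAX_REF_SECONDS = 5.0  # upper bound for reference audio, from Rule 1


class CloneState(Enum):
    IDLE = auto()
    AWAITING_REFERENCE = auto()
    AWAITING_TEXT = auto()


@dataclass
class CloneSession:
    """Hypothetical per-conversation state for the clone flow."""
    state: CloneState = CloneState.IDLE
    reference_path: Optional[str] = None


def start_clone(session: CloneSession) -> str:
    """Step 1: switch to the clone flow and ask for reference audio."""
    session.state = CloneState.AWAITING_REFERENCE
    session.reference_path = None
    return "Please upload a 3 to 5 second reference clip (kept for 7 days)."


def receive_reference(session: CloneSession, path: str, duration_seconds: float) -> str:
    """Validate the clip length before accepting it; reject out-of-range audio."""
    if session.state is not CloneState.AWAITING_REFERENCE:
        return "No clone request is in progress."
    if not (MIN_REF_SECONDS <= duration_seconds <= MAX_REF_SECONDS):
        return "Reference audio must be 3 to 5 seconds long. Please upload again."
    session.reference_path = path
    session.state = CloneState.AWAITING_TEXT
    return "Reference accepted. Send the text to read aloud."


def receive_text(session: CloneSession, text: str) -> Optional[dict]:
    """Step 2: only after a valid reference, build a Base-model job."""
    if session.state is not CloneState.AWAITING_TEXT:
        return None
    job = {"model": "base", "reference": session.reference_path, "text": text}
    session.state = CloneState.IDLE
    return job
```

Keeping the state explicit makes it hard to start synthesis before a valid reference clip has arrived, which is the point of the two-step rule.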

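Rule 2: Tone-specified generation uses the Custom model by default

When the user says "read ... in a happy tone" (用开心的语气朗读……) or "generate speech ..." (生成语音……):

Parse whether the user specified a tone
If none is specified, default to 普通 (neutral)
Generate the audio with the Custom model
Save the result to the local outputs/ directory
Reply to the user with the file path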
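
The tone-parsing step with its default could look roughly like the sketch below. The tone list in KNOWN_TONES and the function name parse_tone are hypothetical; the tones the Custom model really accepts are not listed here, and only the default 普通 comes from the rule above.

```python
DEFAULT_TONE = "普通"  # default tone named in Rule 2

# Hypothetical tone vocabulary, for illustration only.
KNOWN_TONES = ("开心", "悲伤", "生气")


def parse_tone(message: str, known_tones=KNOWN_TONES) -> str:
    """Return the first known tone mentioned as '<tone>的语气' or '<tone>语气'.

    Falls back to DEFAULT_TONE when the message names no known tone.
    """
    for tone in known_tones:
        if f"{tone}的语气" in message or f"{tone}语气" in message:
            return tone
    return DEFAULT_TONE
```

For example, parse_tone("用开心的语气朗读今天的天气") picks 开心, while a plain "生成语音：你好" falls back to 普通.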

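Rule 3: Duration limits must be enforced

Reference audio: 3 to 5 seconds
Base clone output: at most about 20 seconds
Custom output: at most about 30 seconds

If the user explicitly asks for a longer duration, or the estimate from the text length clearly exceeds the limit, tell the user to split the content instead of blindly submitting it for inference.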
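
One way to apply the output limits before inference is to estimate speech length from the character count. This is a sketch under stated assumptions: the speaking rate ASSUMED_CHARS_PER_SECOND is a guess and not a value taken from the skill, and estimate_seconds and check_length are hypothetical names.

```python
from typing import Optional, Tuple

# Output caps from Rule 3 (approximate, in seconds).
MAX_SECONDS = {"base": 20.0, "custom": 30.0}

# Assumption: roughly 4 spoken characters per second. Tune for real speech.
ASSUMED_CHARS_PER_SECOND = 4.0


def estimate_seconds(text: str, chars_per_second: float = ASSUMED_CHARS_PER_SECOND) -> float:
    """Rough duration estimate: non-whitespace characters divided by the rate."""
    spoken = sum(1 for ch in text if not ch.isspace())
    return spoken / chars_per_second


def check_length(text: str, model: str,
                 requested_seconds: Optional[float] = None) -> Tuple[bool, str]:
    """Return (ok, message); refuse before inference when over the cap."""
    limit = MAX_SECONDS[model]
    if requested_seconds is not None and requested_seconds > limit:
        return False, f"Requested {requested_seconds:.0f}s exceeds the {limit:.0f}s limit; please split the content."
    estimate = estimate_seconds(text)
    if estimate > limit:
        return False, f"Estimated {estimate:.0f}s exceeds the {limit:.0f}s limit; please split the content."
    return True, "ok"
```

With the assumed rate, 100 characters estimate to 25 seconds, which is refused for the Base model but accepted for the Custom model.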

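Rule 4: The file retention period must be stated explicitly

Reference audio and generated results are kept for only 7 days by default
After the retention period, the system automatically removes old files from references/ and outputs/
State this clearly both when asking the user to upload reference audio and when reporting that generation is complete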
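
A 7-day cleanup pass over references/ and outputs/ can be sketched as follows. This is not the skill's own cleanup code: purge_expired is a hypothetical helper, and using the file modification time as the age is an assumption.

```python
import time
from pathlib import Path

RETENTION_SECONDS = 7 * 24 * 60 * 60  # 7-day retention from Rule 4


def purge_expired(directory, now=None, retention_seconds=RETENTION_SECONDS):
    """Delete regular files older than the retention window.

    Age is measured from the file's modification time (an assumption).
    Returns the sorted names of the files that were removed.
    """
    root = Path(directory)
    if not root.is_dir():
        return []
    now = time.time() if now is None else now
    removed = []
    for path in root.iterdir():
        if path.is_file() and now - path.stat().st_mtime > retention_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Calling purge_expired("references") and purge_expired("outputs") from a periodic job would cover both directories named in the rule.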

Installation

When OpenClaw, QQBOT, or another agent is asked to install this skill, follow the steps below:

Install the skill

clawhub install xeontts
cd "$HOME/.openclaw/workspace/skills/xeontts"

Run the install script
```
bash install.sh
```
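The install script does the following

Creates a Python 3.10 environment
Installs xdp-tts-service from PyPI
Generates config.json and tts_config.json
Attempts to download the Base / Custom OV models
Downloads the Base checkpoint as well only when the legacy compatibility parameter is explicitly provided
Configures channels.qqbot.xeonTts in OpenClaw
Starts the services on 5002 and 9002
Registers user-level systemd services
Runs self_check.sh

Current default model repositories: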

aurora2035/Qwen3-TTS-12Hz-0.6B-Base-OpenVINO-INT8
aurora2035/Qwen3-TTS-12Hz-0.6B-CustomVoice-OpenVINO-INT8

Runtime ports

Service	Port	Role
Flask TTS	5002	Runs the actual TTS inference
Node Workflow	9002	Parses QQBOT tasks, maintains session state, and validates audio/text duration

OpenClaw configuration convention

xeontts writes only the following config block:

{
  "channels": {
    "qqbot": {
      "xeonTts": {
        "enabled": true,
        "baseUrl": "http://127.0.0.1:9002",
        "cloneModel": "qwen3_tts_0.6b_base_openvino",
        "customModel": "qwen3_tts_0.6b_custom_openvino"
      }
    }
  }
}

This means it:

Does not overwrite an existing channels.qqbot.stt
Does not touch tools.media.audio
Does not compete with xeonasr for the same STT pipeline
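
The "write only one block" guarantee can be illustrated with a small merge helper. This is a sketch of the idea rather than the installer's actual script: write_xeon_tts_block is a hypothetical name, and the backup file naming is an assumption (the scan summary only says a timestamped backup is made).

```python
import json
import shutil
import time
from pathlib import Path


def write_xeon_tts_block(config_path, block):
    """Set channels.qqbot.xeonTts and leave every other key untouched.

    An existing config is copied to a timestamped backup first
    (the '.bak.<timestamp>' suffix is an illustrative choice).
    """
    path = Path(config_path)
    config = {}
    if path.exists():
        stamp = time.strftime("%Y%m%d%H%M%S")
        shutil.copy2(path, path.with_name(f"{path.name}.bak.{stamp}"))
        config = json.loads(path.read_text(encoding="utf-8"))
    # Create the nested objects only if they are missing.
    qqbot = config.setdefault("channels", {}).setdefault("qqbot", {})
    qqbot["xeonTts"] = block
    path.write_text(json.dumps(config, ensure_ascii=False, indent=2) + "\n",
                    encoding="utf-8")
    return config
```

Because only the xeonTts key is assigned, sibling entries such as channels.qqbot.stt and tools.media.audio pass through unchanged.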

Common commands

cd "$HOME/.openclaw/workspace/skills/xeontts"

bash start_all.sh
bash stop_tts.sh
bash self_check.sh
curl http://127.0.0.1:5002/api/health
curl http://127.0.0.1:9002/health
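
The two curl health checks can also be scripted. The sketch below uses only the Python standard library and the two URLs shown above; is_healthy and HEALTH_URLS are hypothetical names, and treating any 2xx response as healthy is an assumption about what the endpoints return.

```python
import urllib.error
import urllib.request

# Health endpoints from the commands above.
HEALTH_URLS = {
    "flask_tts": "http://127.0.0.1:5002/api/health",
    "node_workflow": "http://127.0.0.1:9002/health",
}


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True when the URL answers with a 2xx status, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    for name, url in HEALTH_URLS.items():
        print(name, "up" if is_healthy(url) else "down")
```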

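Key endpoints

POST /api/workflow/message
- Purpose: decide from the user's message whether this is a clone or custom TTS request, or prompt for the missing reference audio
POST /api/workflow/reference-audio
- Purpose: upload reference audio and store it after validating the 3 to 5 second duration
POST /api/tts/custom-speak
- Purpose: call the Custom model directly to generate speech
POST /api/tts/clone-speak
- Purpose: call the Base model directly for voice cloning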

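Troubleshooting

If 5002 is unreachable, check tts.log first
If 9002 is unreachable, check skill.log first
If reference audio is always rejected, first confirm that a working ffprobe is available on the machine; the current version also supports a fallback check without ffprobe for WAV reference audio
If the user's intent is transcription, do not use xeontts by mistake
If the Base model reports an error, first have the user switch to a cleaner 3 to 5 second reference clip
The current default release only requires Qwen3-TTS-12Hz-0.6B-Base-OpenVINO-INT8 and no longer requires the original Base checkpoint by default
BASE_CHECKPOINT_PATH is needed only when an older exported model is missing the processor or speech tokenizer weights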
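
To see why a WAV clip can be checked without ffprobe, note that a WAV header already carries the frame count and sample rate. The following is an illustrative sketch using Python's standard wave module, not the skill's own fallback code; wav_duration_seconds and reference_is_valid are hypothetical names.

```python
import wave


def wav_duration_seconds(path) -> float:
    """Duration of a PCM WAV file: frame count divided by sample rate."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def reference_is_valid(path, min_seconds: float = 3.0, max_seconds: float = 5.0) -> bool:
    """True when the file is a readable WAV lasting 3 to 5 seconds."""
    try:
        duration = wav_duration_seconds(path)
    except (wave.Error, EOFError, OSError):
        # Not a WAV file, truncated, or unreadable.
        return False
    return min_seconds <= duration <= max_seconds
```

Other container formats (for example MP3 or AMR) would still need ffprobe or a similar tool, which matches the troubleshooting advice above.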

Files

17 total
