mlx-tts

v0.1.0

Local text-to-speech based on mlx-audio. Supports multiple languages and models, restricts audio output to designated paths, and requires no API key.

Security Scan

  • VirusTotal: Benign
  • OpenClaw: Benign (medium confidence)
Purpose & Capability
Name/description (local mlx-audio TTS) align with the instructions: SKILL.md describes generating local audio files, limiting outputs to specific paths, and listing models. It does not request unrelated credentials or system-wide access.
Instruction Scope
Instructions are focused on TTS operations (generate/status/reload). They reference installing the Python library via `uv tool install mlx-audio` and model downloads to the HuggingFace cache. The doc asks you to check server logs and openclaw.json for config; these are reasonable for a local service but imply the agent will read local config/logs and perform network downloads for models — this is expected but worth noting.
Install Mechanism
There is no formal install spec in the registry (instruction-only). The SKILL.md instructs running `uv tool install mlx-audio --prerelease=allow`. That command may install Python code (and run arbitrary package install hooks) depending on the environment. This is proportionate for a Python-based TTS plugin but carries the usual risks of installing third-party packages (supply-chain / arbitrary code execution) and assumes a `uv` tool manager is available.
Credentials
The skill requests no environment variables or credentials. It will use local disk paths (~/.cache/huggingface/hub/, ~/.openclaw/voice/outputs/, /tmp) and network access to download models — those are expected for model-backed local TTS. No unexpected credentials are required, but model downloads and large storage/memory needs are implied.
Persistence & Privilege
The skill does not request always:true and is user-invocable only. It does not declare any installation that modifies other skills or system-wide configs beyond the plugin's own openclaw.json entry; this is proportionate to its purpose.
Assessment
This skill appears to be what it says: a local mlx-audio-based TTS helper. Before installing or using it, consider:

  1. The SKILL.md expects you to run `uv tool install mlx-audio`; verify what that command will do in your environment, and prefer installing packages inside an isolated virtualenv or container.
  2. Models can be large (GBs) and may require substantial RAM and disk; check that your Apple Silicon machine has enough resources for the selected model.
  3. Model downloads will use the network and store files in ~/.cache/huggingface/hub/; make sure you are comfortable with that and with the licensing of particular models.
  4. The skill enforces output-path restrictions and disallows symlinks, which is good; still confirm the paths are acceptable for your workflow.
  5. If you need stronger assurance, inspect the mlx-audio package source (and its dependencies) before installing, or run it in an isolated environment.

If any part (the `uv` tool, model choices, or claimed Apple Silicon compatibility) seems unclear, ask the maintainer for clarifications or a runnable example before enabling autonomous use.


Latest version: v0.1.0 (vk970hy16tf3tjgj2ndhesb42c9833qsv)
135 downloads · 0 stars · 1 version · Updated 1mo ago
License: MIT-0

mlx-tts - a text-to-speech skill based on mlx-audio

Converts text to speech with mlx-audio, running entirely on Apple Silicon with no API key required.

Trigger conditions

Use this skill when the user requests any of the following:

  • "Read this text aloud"
  • "Convert this passage to speech"
  • "Say it out loud..."
  • "TTS"
  • "Speech synthesis"

Tool: mlx_tts

Note: this plugin depends on the mlx-audio Python library. Make sure it is installed before use:

uv tool install mlx-audio --prerelease=allow

Generate speech

{
  "action": "generate",
  "text": "the text to synthesize",
  "outputPath": "/tmp/output.mp3",
  "model": "optional: model override",
  "langCode": "optional: language code (zh/en/ja, etc.)",
  "speed": "optional: speed multiplier (1.0 is normal)"
}

Parameters:

  • action: must be "generate"
  • text: the text to convert to speech (required)
  • outputPath: output file path, restricted to /tmp or ~/.openclaw/voice/outputs/
  • model: optional, overrides the default model
  • langCode: optional, language code (required by Kokoro models)
  • speed: optional, speed multiplier (0.5-2.0)
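As a minimal sketch of the parameter rules above, the following Python helper builds a generate request and enforces the documented constraints. The helper itself is illustrative and hypothetical, not part of the plugin:

```python
import os

# Allowed output locations, as documented for this skill.
ALLOWED_PREFIXES = ("/tmp/", os.path.expanduser("~/.openclaw/voice/outputs/"))

def build_generate_request(text, output_path="/tmp/output.mp3",
                           model=None, lang_code=None, speed=None):
    """Build a request dict for the mlx_tts 'generate' action,
    applying the documented parameter rules."""
    if not text:
        raise ValueError("text is required and must be non-empty")
    if not output_path.startswith(ALLOWED_PREFIXES):
        raise ValueError("outputPath must be under /tmp or ~/.openclaw/voice/outputs/")
    if speed is not None and not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    req = {"action": "generate", "text": text, "outputPath": output_path}
    if model is not None:
        req["model"] = model
    if lang_code is not None:
        req["langCode"] = lang_code
    if speed is not None:
        req["speed"] = speed
    return req
```

Optional fields are omitted from the payload when not supplied, matching the request shapes shown in the examples below.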

Return value:

{
  "success": true,
  "outputPath": "/tmp/output.mp3",
  "duration": 2.5,
  "model": "使用的模型名称"
}

Check status

{
  "action": "status"
}

Returns the TTS server status, loaded model, start time, and other information.

Reload configuration

{
  "action": "reload"
}

Reloads the TTS configuration without restarting OpenClaw.

Available models

| Model | Languages | Description | Memory |
|---|---|---|---|
| Kokoro-82M (recommended default) | EN, JA, ZH, FR, ES, IT, PT, HI | Fast and lightweight, 54 preset voices | ~500MB |
| Qwen3-TTS-0.6B | ZH, EN, JA, KO, etc. | Excellent Chinese quality, supports voice cloning | ~2.5GB |
| Qwen3-TTS-1.7B | ZH, EN, JA, KO, etc. | Voice design, generates voices from a description | ~16GB+ |
| Chatterbox | 16 languages | Broadest language coverage | ~16GB+ |
| CSM-1B | EN | Conversational speech, supports voice cloning | ~2GB |
| Dia-1.6B | EN | Dialogue-focused TTS | ~4GB |
| Spark-TTS-0.5B | EN, ZH | Efficient TTS | ~1GB |
| Soprano-1.1-80M | EN | High-quality lightweight TTS | ~200MB |
| OuteTTS-0.6B | EN | Efficient TTS | ~1.5GB |
| Ming-omni-0.5B (Dense) | EN, ZH | Lightweight MoE, voice cloning | ~1GB |
| Ming-omni-16.8B (BailingMM) | EN, ZH | MoE multimodal: speech/music/events | ~32GB+ |
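Since memory is the main constraint when choosing a model, a simple way to reason about the table above is to pick the largest model that fits your RAM budget. The sketch below is hypothetical (the helper and the subset of models are illustrative; memory figures are the approximations from the table):

```python
# Approximate memory requirements (GB) for a few models from the table above.
MODEL_MEMORY_GB = {
    "mlx-community/Kokoro-82M": 0.5,
    "Soprano-1.1-80M": 0.2,
    "Spark-TTS-0.5B": 1.0,
    "Qwen3-TTS-0.6B": 2.5,
    "CSM-1B": 2.0,
    "Dia-1.6B": 4.0,
}

def pick_model(budget_gb):
    """Return the most capable (memory-hungriest) model within the budget,
    or None if nothing fits."""
    fitting = {m: gb for m, gb in MODEL_MEMORY_GB.items() if gb <= budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)
```

For example, a 3GB budget selects Qwen3-TTS-0.6B, while a very small budget selects nothing.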

CLI Commands

| Command | Description |
|---|---|
| /mlx-tts status | Show TTS server status |
| /mlx-tts test <text> | Generate a test utterance |
| /mlx-tts reload | Reload the TTS configuration |
| /mlx-tts models | List available models |

Usage Examples

Basic usage

{
  "action": "generate",
  "text": "你好,我是你的 AI 助手"
}

Specify an output path

{
  "action": "generate",
  "text": "欢迎使用 OpenClaw",
  "outputPath": "~/.openclaw/voice/outputs/welcome.mp3"
}

Use a specific model and language

{
  "action": "generate",
  "text": "Hello, this is a test",
  "model": "mlx-community/Kokoro-82M",
  "langCode": "en"
}

Adjust the speed

{
  "action": "generate",
  "text": "慢慢朗读这段话",
  "speed": 0.8
}

Notes

  • Slow first generation: the model needs to warm up, so the first request may take a few seconds
  • Fully local: all processing happens on-device; no data leaves the machine
  • Path restriction: the output path must be under /tmp or ~/.openclaw/voice/outputs/
  • Symlink check: symlinks in the output path are rejected
  • File size limit: audio files over 64MB are rejected
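The path restriction, symlink check, and size limit above can be sketched in Python. This is an illustrative reimplementation of the documented checks, not the plugin's actual code:

```python
import os

# Allowed output directories and size cap, as documented for this skill.
ALLOWED_DIRS = ("/tmp", os.path.expanduser("~/.openclaw/voice/outputs"))
MAX_BYTES = 64 * 1024 * 1024  # 64MB limit

def check_output_path(path):
    """Return (ok, reason) applying the documented output restrictions:
    allowed directory, no symlink components, and a 64MB size cap."""
    expanded = os.path.expanduser(path)
    parent = os.path.dirname(os.path.abspath(expanded))
    if not any(parent == d or parent.startswith(d + os.sep)
               for d in ALLOWED_DIRS):
        return False, "outputPath must be under /tmp or ~/.openclaw/voice/outputs/"
    # Reject any symlink component along the path.
    parts = expanded.split(os.sep)
    for i in range(1, len(parts) + 1):
        prefix = os.sep.join(parts[:i]) or os.sep
        if os.path.islink(prefix):
            return False, "symlink in path: " + prefix
    if os.path.exists(expanded) and os.path.getsize(expanded) > MAX_BYTES:
        return False, "file exceeds 64MB limit"
    return True, "ok"
```

Note that on macOS `/tmp` is itself a symlink to `/private/tmp`, so a real implementation would likely resolve the allowed prefixes first; the sketch keeps the checks literal for clarity.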

Configuration

Configure in openclaw.json:

{
  "plugins": {
    "entries": {
      "openclaw-mlx-audio": {
        "config": {
          "tts": {
            "enabled": true,
            "model": "mlx-community/Qwen3-TTS-12Hz-0.6B-Base-bf16",
            "port": 19280,
            "langCode": "zh",
            "pythonEnvMode": "managed"
          }
        }
      }
    }
  }
}

Troubleshooting

TTS server not running

Check the status:

/mlx-tts status

If it shows as not running, check that enabled is set to true in the configuration.
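That check can be automated by reading the config file directly. In this sketch, the config path is an assumption (the doc names openclaw.json but not its location), and the helper is illustrative:

```python
import json
import os

# Assumed location of openclaw.json; adjust for your installation.
CONFIG_PATH = os.path.expanduser("~/.openclaw/openclaw.json")

def tts_enabled(config_path=CONFIG_PATH):
    """Walk the nested plugin config shown above and report whether
    the TTS server is enabled."""
    with open(config_path) as f:
        cfg = json.load(f)
    tts = (cfg.get("plugins", {})
              .get("entries", {})
              .get("openclaw-mlx-audio", {})
              .get("config", {})
              .get("tts", {}))
    return bool(tts.get("enabled"))
```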

Generation fails

  1. Check that the text is not empty
  2. Check that the output path is valid
  3. Check the server logs

Slow model downloads

On first use, models are downloaded to ~/.cache/huggingface/hub/; a mirror can speed this up.
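One common way to use a mirror is the HF_ENDPOINT environment variable, which huggingface_hub honors for downloads. The mirror URL below is only an example; substitute whichever mirror you trust, and set the variable before the TTS server starts:

```python
import os

# Redirect Hugging Face downloads through a mirror. The URL is an
# example mirror, not an endorsement; pick one you trust. This must be
# set in the environment before the download starts.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
```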
