Mlx Stt

v0.1.0

Runs mlx-audio Whisper models locally to transcribe audio in multiple formats to text, with automatic language detection and timestamps. No network connection or API key required.


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for gandli-2025/openclaw-mlx-stt.

Prompt Preview: Install & Setup
Install the skill "Mlx Stt" (gandli-2025/openclaw-mlx-stt) from ClawHub.
Skill page: https://clawhub.ai/gandli-2025/openclaw-mlx-stt
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install openclaw-mlx-stt

ClawHub CLI


npx clawhub@latest install openclaw-mlx-stt
Security Scan
VirusTotal
Benign
View report →
OpenClaw
Benign
high confidence
Purpose & Capability
The skill's name and description match the instructions: it describes local transcription using mlx-audio/Whisper models. However, the SKILL.md requires the mlx-audio Python library (via `uv tool install mlx-audio`) and implies a Python runtime, but the skill metadata does not declare required binaries or environment variables. This is an omission but not inconsistent with the stated purpose.
Instruction Scope
Instructions stay within the STT scope (transcribe/status/reload). They reference local file paths for audio (expected). A few CLI references (e.g., `/voice-stt status`, `/mlx-stt ...`) and a managed pythonEnvMode are mentioned but no metadata lists those tools — the agent or operator should ensure those commands/environments exist before invoking the skill. No instructions request secrets or external data exfiltration.
Install Mechanism
There is no install spec (instruction-only), which is lowest risk. The README instructs running `uv tool install mlx-audio --prerelease=allow` which will download/install a Python package; because the install steps are manual and not part of a packaged installer, verify the provenance of `uv` and `mlx-audio` before running. No suspicious download URLs are present in the SKILL.md.
Credentials
The skill requests no credentials, env vars, or config paths. The declared local-only behavior and lack of secret requirements are proportional to an offline STT capability.
Persistence & Privilege
always is false and the skill is user-invocable; it does not request persistent/global privileges. The config snippet shows it writes its own plugin config under openclaw.json, which is expected and scoped to the plugin.
Assessment
This skill appears to do what it claims (local transcription). Before installing or using it:

  1. Confirm you have a compatible Python runtime and understand what the `uv` tool is and where it will install packages from.
  2. Only install `mlx-audio` from a trusted source.
  3. Ensure the CLI commands referenced (e.g., /mlx-stt, /voice-stt) exist in your environment or that the OpenClaw plugin will provide them.
  4. Test with non-sensitive audio first and monitor network activity to verify the claim that processing is fully local.
  5. If you need the agent to run this autonomously, ensure appropriate safeguards, since the skill can access any local file you point it to (audioPath).

If you want, provide the environment where you will run this (OS, presence of Python/uv) and I can list the exact commands to safely install and verify dependencies.

Like a lobster shell, security has layers — review code before you run it.

latest: vk974gjncssxhyqfaewb9178pmh833jm3
191 downloads
0 stars
1 version
Updated 1mo ago
v0.1.0
MIT-0

mlx-stt - Speech-to-Text Skill Based on mlx-audio Whisper

Transcribes audio to text using mlx-audio Whisper models, running entirely on Apple Silicon with no API key required.

Trigger Conditions

Use this skill when the user asks for any of the following:

  • "Transcribe this audio"
  • "Convert this speech to text"
  • "Dictate this file"
  • "STT"
  • "Speech recognition"
  • "Turn this recording into text"

Tool: mlx_stt

Note: this plugin depends on the mlx-audio Python library. Make sure it is installed before use:

uv tool install mlx-audio --prerelease=allow

Transcribing Audio

{
  "action": "transcribe",
  "audioPath": "/path/to/audio.mp3",
  "language": "optional: language code (zh, en, etc.)",
  "task": "optional: transcribe or translate"
}

Parameters:

  • action: must be "transcribe"
  • audioPath: path to the audio file (required)
  • language: optional language code (auto-detected if omitted)
  • task: optional, "transcribe" (transcription) or "translate" (translate to English)

Return value:

{
  "success": true,
  "text": "The transcribed text",
  "language": "detected language",
  "duration": 5.2,
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "First sentence"
    }
  ]
}
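The per-segment timestamps in the return value are enough to emit subtitles directly. Below is a minimal sketch (plain Python, not part of the skill itself) that formats a `segments` list as SRT subtitle blocks:

```python
def to_srt(segments):
    """Format Whisper-style segments as numbered SRT subtitle blocks."""
    def ts(sec):
        # SRT timestamps look like 00:00:02,500 (hours:minutes:seconds,millis)
        h, rem = divmod(sec, 3600)
        m, s = divmod(rem, 60)
        return f"{int(h):02d}:{int(m):02d}:{int(s):02d},{int(round((s % 1) * 1000)):03d}"

    blocks = []
    for i, seg in enumerate(segments, 1):
        blocks.append(f"{i}\n{ts(seg['start'])} --> {ts(seg['end'])}\n{seg['text']}\n")
    return "\n".join(blocks)

segments = [{"start": 0.0, "end": 2.5, "text": "First sentence"}]
print(to_srt(segments))
# 1
# 00:00:00,000 --> 00:00:02,500
# First sentence
```

This is a sketch only; a production converter would also handle millisecond rounding at segment boundaries and escaping of multi-line text.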

Checking Status

{
  "action": "status"
}

Returns the STT server status, the loaded model, and related information.

Reloading Configuration

{
  "action": "reload"
}

Reloads the STT configuration without restarting OpenClaw.

Available Models

Whisper Family

| Model | Languages | Description | Memory |
| --- | --- | --- | --- |
| whisper-large-v3-turbo (recommended default) | 99+ | Fast and accurate, everyday use | ~2GB |
| whisper-large-v3 | 99+ | Highest accuracy | ~6GB |
| distil-large-v3 | EN | Distilled, faster | ~1.5GB |

Qwen3 Family

| Model | Languages | Description | Memory |
| --- | --- | --- | --- |
| Qwen3-ASR-0.6B | ZH, EN, JA, KO, etc. | Lightweight multilingual ASR | ~1GB |
| Qwen3-ASR-1.7B | ZH, EN, JA, KO, etc. | High-accuracy multilingual ASR | ~4GB |
| Qwen3-ForcedAligner-0.6B | ZH, EN, JA, KO, etc. | Word-level timestamp alignment | ~1GB |

Other Models

| Model | Languages | Description | Memory |
| --- | --- | --- | --- |
| Parakeet-TDT-0.6B-v3 | 25 EU languages | NVIDIA, high accuracy | ~1.5GB |
| VibeVoice-ASR-9B | Multilingual | Speaker diarization, long audio (60 min) | ~18GB |
| Voxtral-Mini-3B | Multilingual | Mistral speech model | ~6GB |
| Canary | 25 EU + RU | NVIDIA multilingual + translation | ~2GB |
| Moonshine | EN | Useful Sensors lightweight ASR | ~500MB |
| MMS | 1000+ | Meta massively multilingual | Varies |
| Granite-Speech | EN, FR, DE, ES, PT, JA | IBM ASR + translation | ~4GB |

CLI Commands

| Command | Description |
| --- | --- |
| /mlx-stt status | Show STT server status |
| /mlx-stt transcribe <audio-path> | Transcribe an audio file |
| /mlx-stt reload | Reload the STT configuration |
| /mlx-stt models | List available models |

Usage Examples

Basic transcription (auto-detect language)

{
  "action": "transcribe",
  "audioPath": "/tmp/recording.m4a"
}

Specifying a language

{
  "action": "transcribe",
  "audioPath": "/tmp/chinese_audio.mp3",
  "language": "zh"
}

Translating to English

{
  "action": "transcribe",
  "audioPath": "/tmp/foreign_audio.mp3",
  "task": "translate"
}

Using a specific model

Specify it in the configuration, or override it when invoking.

Supported Audio Formats

  • MP3
  • WAV
  • M4A
  • FLAC
  • OGG
  • WebM
  • MP4 (audio is extracted)
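When pointing the skill at a folder of mixed files, it can help to pre-filter by the extensions listed above before building transcribe requests. A small sketch (the case-insensitive matching is an assumption; the skill docs do not specify how extensions are compared):

```python
from pathlib import Path

# Extensions taken from the supported-formats list above
SUPPORTED = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".webm", ".mp4"}

def transcribable(paths):
    """Keep only paths whose extension the skill accepts (case-insensitive)."""
    return [p for p in paths if Path(p).suffix.lower() in SUPPORTED]

print(transcribable(["/tmp/a.MP3", "/tmp/notes.txt", "/tmp/b.flac"]))
# ['/tmp/a.MP3', '/tmp/b.flac']
```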

Notes

  • Fully local: all processing happens on-device; data never leaves your machine
  • Automatic language detection: the language is detected when `language` is not specified
  • Timestamps: results include a timestamp for each segment
  • Long audio: long files are supported and processed in automatic segments
  • Background noise: Whisper is reasonably robust to background noise

Configuration

Configure in openclaw.json:

{
  "plugins": {
    "entries": {
      "openclaw-mlx-audio": {
        "config": {
          "stt": {
            "enabled": true,
            "model": "mlx-community/whisper-large-v3-turbo",
            "port": 19290,
            "language": "zh",
            "pythonEnvMode": "managed"
          }
        }
      }
    }
  }
}

Troubleshooting

STT server not running

Check the status:

/mlx-stt status

If it reports not running, verify that `enabled` is set to true in the configuration.

Transcription fails

  1. Check that the audio file exists
  2. Check that the audio format is supported
  3. Check the server logs

Low recognition accuracy

  • Try a larger model (e.g., whisper-large-v3)
  • Specify the correct language code
  • Ensure good audio quality (reduce background noise)

Slow processing

  • Use a smaller model (e.g., whisper-turbo or whisper-small)
  • Shorten the audio
  • Make sure no other heavy workloads are running

Advanced Usage

Batch transcription

Call transcribe in a loop to process multiple files.

Real-time transcription

Combine with an audio-recording tool for near-real-time speech-to-text.

Mixed-language audio

Whisper v3 supports automatic detection and transcription of mixed-language audio.
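One way to sketch that loop is to build one `transcribe` payload per file and hand each to the `mlx_stt` tool. How a payload is dispatched depends on your OpenClaw setup, so the dispatch step is deliberately left out of this sketch:

```python
def transcribe_payloads(audio_paths, language=None):
    """Build one mlx_stt 'transcribe' payload per audio file."""
    payloads = []
    for path in audio_paths:
        payload = {"action": "transcribe", "audioPath": path}
        if language:
            payload["language"] = language  # omit to auto-detect per file
        payloads.append(payload)
    return payloads

for p in transcribe_payloads(["/tmp/a.mp3", "/tmp/b.mp3"], language="zh"):
    print(p)
# {'action': 'transcribe', 'audioPath': '/tmp/a.mp3', 'language': 'zh'}
# {'action': 'transcribe', 'audioPath': '/tmp/b.mp3', 'language': 'zh'}
```

Omitting `language` keeps auto-detection, which is usually the right choice when the batch mixes languages.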
