Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

speech2text

v1.0.0

Automatically converts speech messages in ogg/wav/mp3/m4a formats to text using offline Faster-Whisper with ffmpeg format conversion.


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for lqwall26/speech2text.

Prompt Preview: Install & Setup
Install the skill "speech2text" (lqwall26/speech2text) from ClawHub.
Skill page: https://clawhub.ai/lqwall26/speech2text
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install speech2text

ClawHub CLI


npx clawhub@latest install speech2text
Security Scan
VirusTotal
Benign
OpenClaw
Suspicious
high confidence
Purpose & Capability
Name/description (speech→text using faster-whisper + ffmpeg) aligns with the code. However, SKILL.md and description emphasize 'offline' Faster-Whisper, while the code instantiates WhisperModel(MODEL_SIZE) without bundling model files — that will typically trigger model downloads from the network (e.g., Hugging Face) if weights are not present, contradicting the 'offline' claim. Also SKILL.md and config list only Windows ffmpeg paths; the skill has no OS restriction set, which is an inconsistency.
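If true offline operation matters, one mitigation is to point faster-whisper at a predownloaded model directory and block network fallback before loading. A minimal sketch, assuming the helper name, model path, and the `HF_HUB_OFFLINE` guard are your own additions, not part of the skill:

```python
import os
from pathlib import Path

def ensure_offline_model(model_dir: str) -> str:
    """Return a local model directory, refusing any network fallback."""
    path = Path(model_dir).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"Predownloaded model not found: {path}")
    # Tell huggingface_hub to never reach the network for missing files.
    os.environ["HF_HUB_OFFLINE"] = "1"
    return str(path)

# With the guard in place, loading stays on disk (illustrative path):
#     from faster_whisper import WhisperModel
#     model = WhisperModel(ensure_offline_model("~/models/faster-whisper-tiny"))
```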
Instruction Scope
SKILL.md describes converting provided audio attachments; the code also automatically looks for the most recent .ogg in a hardcoded user directory (~/.openclaw/media/inbound) when no attachment is supplied. This automatic local-file scanning is not clearly described and could read unrelated user audio files. The code uses subprocess.run to call ffmpeg (expected) but will modify the subprocess PATH to include Windows ffmpeg locations.
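The fallback described above can be reproduced in a few lines. This sketch (the directory constant and function name are assumptions, not the skill's own identifiers) shows why any .ogg dropped into the inbound folder becomes fair game:

```python
from pathlib import Path
from typing import Optional

INBOUND = Path.home() / ".openclaw" / "media" / "inbound"

def newest_ogg(directory: Path = INBOUND) -> Optional[Path]:
    """Pick the most recently modified .ogg, mirroring the skill's fallback."""
    candidates = sorted(
        directory.glob("*.ogg"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return candidates[0] if candidates else None
```

Because selection is by modification time alone, whatever audio happened to arrive last is transcribed, whether or not it relates to the current conversation.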
Install Mechanism
No install spec (instruction-only), so nothing is fetched/installed by the platform. The code depends on external packages (faster-whisper, pydub) and on model weights—these are not provided and are likely downloaded by the faster-whisper/Hugging Face machinery at runtime, which is network activity not documented in SKILL.md's 'offline' claim.
Credentials
The skill requests no environment variables or credentials and does not require unusual system config access. It does expect ffmpeg to be installed and accessible (and tries Windows-specific paths). It temporarily adjusts PATH for the subprocess but does not persist credentials or require secrets.
Persistence & Privilege
always is false and the skill does not modify other skills or system-wide configs. It can be invoked autonomously (platform default) and SKILL.md suggests automatic triggering on voice messages — combined with its automatic local media scanning, this increases the chance it will read local audio without an explicit attachment, but it does not request elevated or persistent privileges.
What to consider before installing
This skill appears to do what it says (convert audio to text using faster-whisper + ffmpeg), but has a few important caveats to consider before installing:

  • Offline claim: SKILL.md says 'offline', but the code calls WhisperModel(MODEL_SIZE) without bundled weights; faster-whisper will typically fetch model weights from the network if they are not already available locally. If you must avoid network/model downloads, preinstall the model files and verify the model loads offline.
  • Local file scanning: If no attachment is provided, the skill scans ~/.openclaw/media/inbound and picks the newest .ogg file. If you have sensitive audio in that location, the skill may read it. If you do not want that behavior, either disable automatic triggers or modify the code to require explicit attachments.
  • Platform assumptions: The code only checks Windows ffmpeg paths (ffmpeg.exe), and SKILL.md shows a Windows installation path. On Linux/macOS the skill may not find ffmpeg without adjustments.
  • Dependencies: You must pip install faster-whisper and pydub and have ffmpeg available. Model downloads may consume bandwidth and disk space.

Recommendations:

  • Review the included code and, if you need true offline operation, predownload the chosen Whisper model and test model loading without network access.
  • Run the skill in a sandbox or environment where reading ~/.openclaw/media/inbound is acceptable, or patch the code to require explicit attachments only.
  • Verify ffmpeg is installed on your OS and adapt the ffmpeg path logic for non-Windows systems.
  • If unsure, treat this as potentially privacy-sensitive and avoid enabling automatic triggers until you validate its behavior.
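For the platform assumption, a portable lookup can prefer the system PATH and only then fall back to the Windows location the skill hardcodes. A sketch, where the function name, `search_path` parameter, and fallback list are illustrative additions:

```python
import shutil
from pathlib import Path
from typing import Optional

# The one location the skill's config mentions; extend as needed.
WINDOWS_FALLBACKS = [r"C:\ffmpeg\bin\ffmpeg.exe"]

def find_ffmpeg(search_path: Optional[str] = None) -> str:
    """Resolve ffmpeg via PATH first (Linux/macOS/Windows), then fallbacks."""
    exe = shutil.which("ffmpeg", path=search_path)
    if exe:
        return exe
    for candidate in WINDOWS_FALLBACKS:
        if Path(candidate).is_file():
            return candidate
    raise FileNotFoundError("ffmpeg not found; install it or extend the fallbacks")
```

`shutil.which` honors the platform's PATH semantics (including `.exe` resolution on Windows), so the hardcoded list only matters when ffmpeg was installed outside PATH.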

Like a lobster shell, security has layers — review code before you run it.

latest: vk97541b6kcxnnvmwwzkq5y3r6x83kkxn
101 downloads
0 stars
1 version
Updated 1mo ago
v1.0.0
MIT-0

STT - Speech Recognition (Speech-to-Text)

Converts speech messages to text. Supports ogg/wav/mp3/m4a formats.

Triggers

  • Triggered automatically when the user sends a voice message
  • Or invoked manually as a skill

Features

  1. Automatic recognition - incoming voice messages are transcribed to text automatically
  2. Offline recognition - uses Faster-Whisper, no network required
  3. Format conversion - audio formats are converted automatically with ffmpeg

Dependencies

  • Python packages: faster-whisper, pydub
  • ffmpeg: C:\ffmpeg\bin (must be on the system PATH)

Installation

pip install faster-whisper pydub

Usage Example

User sends a voice message → transcribed to text automatically → reply based on the text content

Configuration

  • Model size: tiny (can be changed to base/small/medium/large for higher accuracy at lower speed)
  • Default language: zh (Chinese)
  • ffmpeg path: C:\ffmpeg\bin

How It Works

  1. Receive the voice file (ogg)
  2. Convert it to wav with ffmpeg (16000 Hz, mono)
  3. Transcribe it with Faster-Whisper
  4. Return the transcription and continue the conversation
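The pipeline above can be sketched end to end. The ffmpeg flags match the stated 16000 Hz mono conversion; the transcription step assumes faster-whisper is installed and the 'tiny' model from the Configuration section is available locally:

```python
import subprocess
from pathlib import Path

def ffmpeg_args(src: Path, dst: Path) -> list:
    """Step 2: convert any supported input to 16000 Hz mono WAV."""
    return ["ffmpeg", "-y", "-i", str(src), "-ar", "16000", "-ac", "1", str(dst)]

def transcribe(src: str) -> str:
    wav = Path(src).with_suffix(".wav")
    subprocess.run(ffmpeg_args(Path(src), wav), check=True)  # steps 1-2
    from faster_whisper import WhisperModel                  # step 3
    segments, _info = WhisperModel("tiny").transcribe(str(wav), language="zh")
    return "".join(seg.text for seg in segments)             # step 4
```

Note that `WhisperModel("tiny")` will download weights on first use unless they are already cached, which is the offline caveat raised in the scan above.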
