turbo-whisper-local-stt

v1.0.6

Automatically triggered when the user wants to **convert audio to text**, **speech-to-text**, **transcribe a recording**, **generate subtitles**, **transcribe a meeting recording**, **turn voice notes into text**, or **transcribe audio locally**. Uses local Faster-Whisper (large-v3-ct2 and similar models) for high-performance, Chinese-first audio-to-text that is fully offline and privacy-safe; supports wav/mp...

by 顶尖王牌程序员 @wangminrui2022

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for wangminrui2022/turbo-whisper-local-stt.

Prompt preview: Install & Setup
Install the skill "turbo-whisper-local-stt" (wangminrui2022/turbo-whisper-local-stt) from ClawHub.
Skill page: https://clawhub.ai/wangminrui2022/turbo-whisper-local-stt
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required binaries: python
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install turbo-whisper-local-stt

ClawHub CLI


npx clawhub@latest install turbo-whisper-local-stt
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description (local Faster-Whisper offline STT) matches the code and runtime behavior: scripts create a venv, install required Python packages, download Faster‑Whisper models from Hugging Face, and transcribe audio files. Required binary (python) and file writes (models, logs, outputs) are appropriate for transcription.
Instruction Scope
SKILL.md and scripts limit actions to audio transcription, path handling, venv creation, dependency installation, model download, and GPU detection (nvidia-smi). This stays within expected scope, but the runtime will: (1) create a virtualenv in a parent-level venv directory, (2) run many pip installs, and (3) download large model files from Hugging Face if not provided locally — all of which are side effects the user should expect. There is no evidence the skill reads unrelated secrets or exfiltrates data.
Install Mechanism
There is no packaged install spec, but the code performs runtime installation via pip and uses huggingface_hub.snapshot_download to fetch models. Installing torch (and other audio libs) and downloading wheels from download.pytorch.org and PyPI/Tsinghua mirror is expected but involves network activity and large downloads. The installer supports git+/.zip/.whl fallbacks (arbitrary package sources), which is powerful but also increases the potential blast radius if a malicious spec were introduced later.
Credentials
The skill does not request credentials or environment variables. It probes system GPU info (nvidia-smi) and writes logs, model caches, and virtualenv files to disk. These accesses are proportionate to GPU-aware local transcription and model caching behavior.
Persistence & Privilege
The skill does not request 'always' privilege and is user-invocable. It persists by creating a virtualenv (VENV_DIR) and caching downloaded models and logs under the skill root (or parent venv path). That persistent storage is normal for this use case but can consume significant disk space and may be shared across runs.
Assessment
Before installing/running:

  • Expect initial network activity: the scripts will pip install packages (including torch) and download models from Hugging Face and PyPI. If you need fully offline operation, pre-download the model and pass --model_path.
  • Check Python version: the code enforces Python 3.10–3.12 and will exit otherwise.
  • Disk usage: model files and a virtualenv can be multi-GB; ensure you have space and choose an appropriate output/model path.
  • Sandbox if possible: because the skill runs pip installs and executes subprocesses, run it in an isolated environment (VM/container) or review/approve the code before giving it access to important systems.
  • Paths and defaults: note the scripts create a venv in a parent-level venv directory and a default model_path set to a Windows D:/ path; adjust paths for your environment.
  • No credentials requested: the skill does not ask for API keys or tokens; it uses public Hugging Face downloads. If you intend to use private models, do not supply credentials unless you trust the code.

Overall this package appears coherent and appropriate for local STT, but it performs network downloads and installs software automatically; treat those side effects as part of the installation risk and proceed accordingly.
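
To pre-download a model for fully offline use, you can fetch it once with huggingface_hub (the same snapshot_download mechanism the scripts use) and then point the skill at the directory via --model_path. A minimal sketch; the repo id and local directory below are illustrative, not taken from the skill:

    from huggingface_hub import snapshot_download

    # Fetch a Faster-Whisper CTranslate2 model once; later runs can stay offline.
    local_dir = snapshot_download(
        repo_id="Systran/faster-whisper-large-v3",   # illustrative repo id; substitute the ct2 model you need
        local_dir="models/faster-whisper-large-v3-ct2",
    )
    print(local_dir)  # pass this directory to the skill with --model_path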

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

Bins: python
latest: vk978ntnkzs7zt8hpq5k1rcm93n856svq
248 downloads
0 stars
7 versions
Updated 1w ago
v1.0.6
MIT-0

Turbo-Whisper-Local-STT

Function: a local, high-performance audio-to-text tool built on the Faster-Whisper large-v3-ct2 model. Supports Chinese-first transcription, VAD segmentation for long audio, and GPU acceleration (int8_float16), fully offline and privacy-safe. Especially suited to Chinese audio scenarios such as meeting recordings, voice notes, and video subtitles.

Triggers

  • The user provides an audio file (.wav, .mp3, .m4a, etc.) or a folder of audio files and expresses an intent such as transcribing, converting speech to text, or generating subtitles.
  • The user says things like "transcribe this for me", "speech to text", or "convert this audio to text".
  • Both single files and batch processing of an entire folder are supported.

Supported models (in recommended order)

  1. faster-whisper-base-ct2 → default recommendation (low-end devices / maximum speed)
  2. faster-whisper-large-v3-ct2 → high-accuracy needs / meeting transcription
  3. faster-whisper-large-v3-turbo-ct2 → the balance point between speed and accuracy

Parameter extraction guide

When deciding to invoke this skill, accurately extract the following parameters from the user's message (a sketch of the resulting command assembly follows this list):

  1. <audio path> (required): the audio file or folder path provided by the user (relative and absolute paths are supported).
  2. <output directory> (optional): the output folder specified by the user. If unspecified, [source filename].json/.txt files are generated by default next to the input file.
  3. <language> (optional): used when the user explicitly names a language (e.g. zh, en); defaults to auto-detection with Chinese preferred.
  4. <model_path> (optional): a specific model path given by the user.
  5. <output> (optional): output format (json, text); by default both are generated.
  6. Other optional parameters (such as --beam_size, --separator) added as the user's needs require.
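
A minimal sketch of turning the extracted parameters into the script invocation; build_command is a hypothetical helper for illustration, not part of the skill's code:

    import shlex

    def build_command(audio_path, output_dir=None, language=None,
                      model_path=None, output=None, beam_size=None, separator=None):
        # --audio_path is the only required flag.
        args = ["python3", "scripts/transcribe.py", "--audio_path", audio_path]
        # Optional flags are added only when the user supplied them, so the
        # script's defaults (auto-detect language, JSON + text output) apply otherwise.
        if output_dir:
            args += ["--output_dir", output_dir]
        if language:
            args += ["--language", language]
        if model_path:
            args += ["--model_path", model_path]
        if output:
            args += ["--output", output]
        if beam_size:
            args += ["--beam_size", str(beam_size)]
        if separator is not None:
            args += ["--separator", separator]
        return shlex.join(args)  # shell-quoted, ready for the python3/python fallback below

    print(build_command("recordings/meeting.m4a", language="zh"))
    # python3 scripts/transcribe.py --audio_path recordings/meeting.m4a --language zh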

Execution steps

  1. Parse the path: identify the audio file or folder path from the user.

  2. Default target: if no output path is specified, create [source filename].json/.txt files next to the input by default.

  3. Invoke the command: launch the script with the following compatibility command (try python3 first, fall back to python). The script automatically creates a virtual environment, detects the GPU, and installs the matching dependencies.

    (python3 scripts/transcribe.py --audio_path "<audio path>" [--output_dir "<output directory>"] [--language <zh/en>] [--model_path "<model path>"] [--output <json/text>] [--beam_size 5] [--separator " "]) || (python scripts/transcribe.py --audio_path "<audio path>" [--output_dir "<output directory>"] [--language <zh/en>] [--model_path "<model path>"] [--output <json/text>] [--beam_size 5] [--separator " "])
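
For example (illustrative paths), a single-file call forcing Chinese, and a batch call writing plain-text transcripts for a whole folder:

    python3 scripts/transcribe.py --audio_path "recordings/meeting.m4a" --language zh
    python3 scripts/transcribe.py --audio_path "recordings/" --output_dir "transcripts/" --output text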
    
