虾转音频

v1.3.1

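🎵 An audio/video format conversion and processing toolbox. Built on FFmpeg + Whisper AI, it supports format conversion, extracting audio from video, merging, splitting, compression, viewing file info, and speech-to-text transcription.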

by @luis1213899

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for luis1213899/xia-zhuan-audio.

Install the skill "虾转音频" (luis1213899/xia-zhuan-audio) from ClawHub.
Skill page: https://clawhub.ai/luis1213899/xia-zhuan-audio
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

openclaw skills install xia-zhuan-audio

ClawHub CLI

npx clawhub@latest install xia-zhuan-audio

Security Scan

VirusTotal

Pending

OpenClaw

Benign

high confidence

ℹ

Purpose & Capability

The name and description state FFmpeg + Whisper, and the bundled code (audio-forge.js, menu.js, transcribe.py) implements those features. However, the registry metadata at the top of the report lists no required binaries or env vars, while SKILL.md and the code clearly require FFmpeg, Python, and (optionally) the HF_ENDPOINT and XZA_* environment variables. This is likely a packaging/metadata omission rather than malicious misalignment, but it is an inconsistency to be aware of.

✓

Instruction Scope

SKILL.md and the code limit actions to running ffmpeg, invoking Python/transcribe.py, reading/writing local files provided by the user, and (expected) downloading Whisper models from the HuggingFace endpoint. The instructions do not ask to read unrelated system files or exfiltrate data to unknown endpoints.

ℹ

Install Mechanism

There is no automated install spec; this is an instruction + code bundle. The transcribe step depends on the faster-whisper Python package and will cause the Whisper model(s) to be downloaded from the HF_ENDPOINT on first run. Model downloads can be large (MBs–GBs) and are normal for this functionality; downloads come from HuggingFace by default (SKILL.md and transcribe.py default HF_ENDPOINT to https://huggingface.co).

✓

Credentials

The skill does not request access to secrets or unrelated credentials. Declared environment variables (XZA_FFMPEG, XZA_FFPROBE, XZA_SCRIPTDIR, XZA_MODELDIR, HF_ENDPOINT) are reasonable for locating binaries, scripts, and controlling model download source. No other env-vars or credentials are accessed in the code.

✓

Persistence & Privilege

The skill is not marked always:true and does not request elevated or persistent platform privileges. It does not modify other skills or system-wide agent settings. Autonomous invocation remains enabled by default (normal for skills) but is not combined with other concerning factors.

Assessment

This skill appears to do what it claims: FFmpeg-based audio operations plus Whisper transcription. Before installing, ensure you have FFmpeg and Python available and be prepared for first-run downloads of Whisper models (can be hundreds of MBs to multiple GBs) from HuggingFace or a mirror you configure via HF_ENDPOINT. Note the registry metadata omitted required binaries/env — that mismatch is likely a packaging oversight; if you need strict inventory or auditing, ask the author to update the skill metadata. If you have sensitive audio, remember transcription produces local text files; run the skill in a sandbox or test environment if you want to validate behavior before using it on production data.

✗

audio-forge.js:28

Shell command execution detected (child_process).

✗

menu.js:71

Shell command execution detected (child_process).

Patterns worth reviewing

These patterns may indicate risky behavior. Check the VirusTotal and OpenClaw results above for context-aware analysis before installing.

Like a lobster shell, security has layers — review code before you run it.

latest: vk979hp6wers25nkqvabwxbryys84nszx

117 downloads

1 star

15 versions

Updated 2w ago

v1.3.1

MIT-0

虾转音频 (xia-zhuan-audio)

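Features

#	Feature	Description
1	Format conversion	m4a→mp3, wav→flac, ape→flac, aac→mp3, and more; 20+ formats supported
2	Extract audio from video	mp4/mkv/avi/flv → mp3/aac/wav, etc.
3	Merge audio	Concatenate multiple audio files into one
4	Split audio	Cut out a clip by time range
5	Compress audio	Reduce file size (64k/128k/192k)
6	View audio info	Duration / bitrate / sample rate / channels / metadata
7	Speech-to-text	Automatic transcription with Whisper AI; outputs txt / srt / vtt / json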
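
As an illustration of feature 6, the fields in that row (duration, bitrate, sample rate, channels, metadata) can all be read from ffprobe's JSON output. The sketch below is not the skill's own code; the function names `probe_command`, `summarize`, and `audio_info` are hypothetical, and only standard ffprobe options are used.

```python
import json
import subprocess


def probe_command(path, ffprobe="ffprobe"):
    """Build an ffprobe argument list that prints format and stream data as JSON."""
    return [ffprobe, "-v", "error", "-print_format", "json",
            "-show_format", "-show_streams", path]


def summarize(probe_json):
    """Reduce ffprobe's JSON output to the fields named in the feature table."""
    data = json.loads(probe_json)
    fmt = data.get("format", {})
    # Use the first audio stream, if the file has one.
    audio = next((s for s in data.get("streams", [])
                  if s.get("codec_type") == "audio"), {})
    return {
        "duration_s": float(fmt["duration"]) if "duration" in fmt else None,
        "bitrate_bps": int(fmt["bit_rate"]) if "bit_rate" in fmt else None,
        "sample_rate_hz": int(audio["sample_rate"]) if "sample_rate" in audio else None,
        "channels": audio.get("channels"),
        "tags": fmt.get("tags", {}),
    }


def audio_info(path, ffprobe="ffprobe"):
    """Run ffprobe on a file and return the summary (requires ffprobe to be installed)."""
    result = subprocess.run(probe_command(path, ffprobe),
                            capture_output=True, text=True, check=True)
    return summarize(result.stdout)
```

Passing the arguments as a list keeps file names with spaces or shell metacharacters from being interpreted by a shell.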

Supported formats

Audio formats: mp3, wav, flac, aac, m4a, ogg, wma, aiff, opus, ape, alac
Video formats: mp4, mkv, avi, mov, flv, wmv, webm
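
A caller can use the two format lists to decide whether a file needs conversion or audio extraction. This is a minimal sketch under that assumption; `media_kind` is a hypothetical helper, and the skill itself may classify files differently.

```python
from pathlib import Path

# Extension sets copied from the supported-format lists above.
AUDIO_EXTS = {"mp3", "wav", "flac", "aac", "m4a", "ogg",
              "wma", "aiff", "opus", "ape", "alac"}
VIDEO_EXTS = {"mp4", "mkv", "avi", "mov", "flv", "wmv", "webm"}


def media_kind(path):
    """Return 'audio', 'video', or 'unknown' based on the file extension."""
    ext = Path(path).suffix.lower().lstrip(".")
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    return "unknown"
```

The check is case-insensitive and looks only at the extension, so a mislabeled file would still need ffprobe to confirm its real contents.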

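Environment variables

Variable	Description	Default
`XZA_FFMPEG`	Path to FFmpeg/FFprobe	Looked up on the system PATH
`XZA_FFPROBE`	Path to FFprobe	Inferred from XZA_FFMPEG
`XZA_SCRIPTDIR`	Script directory	Auto-detected
`XZA_MODELDIR`	Directory where Whisper models are stored	whisper_models under the skill directory
`HF_ENDPOINT`	HuggingFace model download source	https://huggingface.co (official)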
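
The fallback order in the table can be sketched as follows. This is one plausible reading of the defaults, not the skill's actual lookup code: the function names are hypothetical, and the rule that ffprobe sits next to ffmpeg under the same extension is an assumption about what "inferred from XZA_FFMPEG" means.

```python
import os
import shutil
from pathlib import Path


def resolve_ffmpeg(env=None):
    """XZA_FFMPEG if set, otherwise whatever 'ffmpeg' resolves to on PATH."""
    env = os.environ if env is None else env
    return env.get("XZA_FFMPEG") or shutil.which("ffmpeg")


def resolve_ffprobe(env=None):
    """XZA_FFPROBE if set, else a sibling of XZA_FFMPEG, else PATH lookup."""
    env = os.environ if env is None else env
    if env.get("XZA_FFPROBE"):
        return env["XZA_FFPROBE"]
    ffmpeg = env.get("XZA_FFMPEG")
    if ffmpeg:
        p = Path(ffmpeg)
        # Assumption: ffprobe lives beside ffmpeg and shares its extension.
        return str(p.with_name("ffprobe" + p.suffix))
    return shutil.which("ffprobe")


def resolve_model_dir(skill_dir, env=None):
    """XZA_MODELDIR if set, otherwise whisper_models under the skill directory."""
    env = os.environ if env is None else env
    return env.get("XZA_MODELDIR") or str(Path(skill_dir) / "whisper_models")
```

Accepting an optional `env` mapping makes the resolution order easy to check without touching the real process environment.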

Dependencies

FFmpeg: already installed, or downloaded from https://ffmpeg.org, and available on PATH
Python: already installed (used for speech-to-text)
faster-whisper: install with pip install faster-whisper
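
Before the first run, it can help to confirm that these dependencies are actually reachable. The check below is a small sketch using only the Python standard library; `check_dependencies` and `missing_dependencies` are hypothetical names, and the check ignores any XZA_* overrides.

```python
import importlib.util
import shutil


def check_dependencies():
    """Map each dependency name to True if it can be found, else False."""
    return {
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "ffprobe": shutil.which("ffprobe") is not None,
        # The pip package faster-whisper is imported as faster_whisper.
        "faster-whisper": importlib.util.find_spec("faster_whisper") is not None,
    }


def missing_dependencies():
    """Return the names of dependencies that were not found."""
    return [name for name, found in check_dependencies().items() if not found]
```

Python itself is not listed because the script is already running under it; faster-whisper is only needed for the speech-to-text feature.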

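Whisper model security note

Models are downloaded from the official HuggingFace source by default. To use a mirror in mainland China, set HF_ENDPOINT (Windows cmd syntax shown; on Linux/macOS use export instead of set):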

set HF_ENDPOINT=https://hf-mirror.com
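
When transcribe.py is launched from another script, the mirror can also be set for that one child process rather than for the whole shell session. A minimal sketch, assuming the variable is read from the process environment as the note above describes; `env_with_endpoint` is a hypothetical helper.

```python
import os

OFFICIAL_ENDPOINT = "https://huggingface.co"


def env_with_endpoint(endpoint=None, base=None):
    """Return a copy of the environment with HF_ENDPOINT filled in.

    Priority: explicit argument, then an existing HF_ENDPOINT,
    then the official HuggingFace endpoint.
    """
    env = dict(os.environ if base is None else base)
    env["HF_ENDPOINT"] = endpoint or env.get("HF_ENDPOINT") or OFFICIAL_ENDPOINT
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run([sys.executable, "transcribe.py", "talk.m4a"], ...)`, where `talk.m4a` stands in for a real input file. The variable should be in place before any HuggingFace library is imported, which launching a fresh process guarantees.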

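Usage

Trigger through an OpenClaw conversation (recommended)

Describe what you need in natural language, for example:

"Convert this m4a to mp3"
"Extract the audio from this video"
"Merge these audio files into one"
"Cut out the part of this audio from 1:30 to 2:45"
"Compress this audio to 128kbps"
"Show me the info for this audio file"
"Transcribe this recording to text"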

Call directly from the command line

# Format conversion
node audio-forge.js convert <input> <output-format> [--bitrate 192k]

# Extract audio from video
node audio-forge.js extract <video> [output-format] [--bitrate 192k]

# Merge audio
node audio-forge.js merge <file1> <file2> [...] <output>

# Split audio
node audio-forge.js split <input> <start-time> <end-time>

# Compress audio
node audio-forge.js compress <input> [--quality low|medium|high]

# View audio info
node audio-forge.js info <audio-file>

# Speech-to-text (Whisper)
python transcribe.py <audio-or-video-file> [--model small] [--language zh] [--format txt] [--device auto]
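
For readers curious what such commands might translate to, the sketch below builds plain FFmpeg argument lists for the convert and split cases. It is an illustration only, not audio-forge.js's actual logic: the function names are hypothetical, and the choice of `-n` (refuse to overwrite an existing output) is an assumption made for safety.

```python
def parse_time(text):
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' into seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def convert_args(src, dst, bitrate="192k", ffmpeg="ffmpeg"):
    """Arguments for an audio-only conversion at the given audio bitrate."""
    # -vn drops any video stream; -b:a sets the audio bitrate.
    return [ffmpeg, "-n", "-i", src, "-vn", "-b:a", bitrate, dst]


def split_args(src, start, end, dst, ffmpeg="ffmpeg"):
    """Arguments for cutting the range [start, end] out of src."""
    if parse_time(end) <= parse_time(start):
        raise ValueError("end time must be after start time")
    # -ss and -to accept the same MM:SS notation used in the examples above.
    return [ffmpeg, "-n", "-i", src, "-ss", start, "-to", end, dst]
```

For example, `split_args("talk.mp3", "1:30", "2:45", "clip.mp3")` yields a list that can be handed to `subprocess.run(..., check=True)`.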

Speech-to-text - Whisper models

Model	Accuracy	Speed	First-run download
tiny	Lower	Fastest	~75MB
base	Standard	Fast	~150MB
small	Good	Fairly fast	~460MB
medium	Very good	Slower	~1.5GB
large	Highest	Slowest	~3GB
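
The model name from the table is what gets passed to faster-whisper. The sketch below shows roughly how a transcription to SRT could look with that library; it is not the contents of transcribe.py, and the helper names `srt_timestamp`, `to_srt`, and `transcribe_to_srt` are hypothetical.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Render an iterable of (start, end, text) tuples as SRT subtitle text."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)


def transcribe_to_srt(path, model_size="small", language=None, model_dir=None):
    """Transcribe a file with faster-whisper and return SRT text."""
    # Imported lazily so the helpers above work without faster-whisper installed.
    from faster_whisper import WhisperModel

    # The model is downloaded into model_dir on first use if it is not cached.
    model = WhisperModel(model_size, device="auto", download_root=model_dir)
    segments, _info = model.transcribe(path, language=language)
    return to_srt((s.start, s.end, s.text) for s in segments)
```

Smaller models such as `tiny` or `base` are a reasonable first choice for checking that the setup works before committing to a multi-gigabyte download.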

Installation

openclaw skills install xia-zhuan-audio

Alternatively, copy the xia-zhuan-audio folder directly into the ~/.openclaw/workspace/skills/ directory.
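
The manual copy can be scripted if the skill folder has already been downloaded somewhere. A minimal sketch: `install_skill` is a hypothetical helper, and the default target is simply the directory named above.

```python
import shutil
from pathlib import Path


def install_skill(source_dir, skills_root=None):
    """Copy a downloaded skill folder into the OpenClaw skills directory."""
    source = Path(source_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"skill folder not found: {source}")
    root = (Path(skills_root) if skills_root
            else Path.home() / ".openclaw" / "workspace" / "skills")
    target = root / source.name
    # dirs_exist_ok lets a newer copy be laid over an existing install (Python 3.8+).
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

Calling `install_skill("xia-zhuan-audio")` from the folder's parent directory would place it under the default skills path.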

Created: 2026-04-11 | Author: @luis1213899
