video2txt-视频理解字幕提取

v1.0.1

Transcribe local video or audio files into SRT subtitle files and TXT plain-text files


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for chentx1243/maple-video2txt.

Prompt preview (Install & Setup):
Install the skill "video2txt-视频理解字幕提取" (chentx1243/maple-video2txt) from ClawHub.
Skill page: https://clawhub.ai/chentx1243/maple-video2txt
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required binaries: python3
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install maple-video2txt

ClawHub CLI

Package manager switcher

npx clawhub@latest install maple-video2txt
Security Scan
VirusTotal: Benign
OpenClaw: Benign (medium confidence)
Purpose & Capability
Name/description, required binary (python3), declared dependencies (faster-whisper, av, opencc) and the included Python script all align with a local transcription/subtitle generation tool. The script uses ffprobe/ffmpeg and faster-whisper as expected.
Instruction Scope
SKILL.md instructs the agent to run the provided Python script and to install its dependencies; the instructions stay within the transcription task. Important operational behaviors are called out (model download on first run, background execution). Note that the script performs network downloads for Whisper models and calls ffprobe/ffmpeg via subprocess; these are expected, but they are external network/system interactions the user should be aware of. The SKILL.md pre-scan also flagged unicode control characters (a possible prompt-injection attempt): the visible content looks normal, but a manual check of the raw file bytes for hidden control characters is recommended.
Install Mechanism
No custom install spec; standard pip requirements.txt is provided. This is low-risk compared with arbitrary remote archive downloads. The only external runtime download is the Whisper model files (expected for this functionality).
Credentials
The skill requests no environment variables or credentials. It needs access to local files (input media) and will write SRT/TXT output and model files to disk (models directory). These requirements are proportional to the stated goal.
Persistence & Privilege
Skill is not always-on and does not request special platform privileges. It does not declare or appear to modify other skills or global agent settings.
Scan Findings in Context
[unicode-control-chars] unexpected: A prompt-injection detection found unicode control characters in SKILL.md. This is not required for transcription and is unexpected; it may be an artifact or a concealed formatting attempt. Recommend inspecting the raw SKILL.md for hidden control characters (e.g., U+202E, U+200B, U+202C) before trusting/automating the skill.
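One concrete way to perform that inspection is a short standalone script like the following (a sketch, not part of the skill) that lists every control or invisible character in a file, including the bidi and zero-width code points named above:

```python
import unicodedata

# Code points commonly used to hide or reorder text: bidi embedding/override
# controls and zero-width characters.
SUSPECT = {
    "\u202A", "\u202B", "\u202C", "\u202D", "\u202E",
    "\u200B", "\u200C", "\u200D",
    "\u2066", "\u2067", "\u2068", "\u2069",
}

def find_hidden_chars(text):
    """Return (line, column, codepoint) for every suspect or format-control character."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            # Category "Cf" covers invisible format characters in general.
            if ch in SUSPECT or unicodedata.category(ch) == "Cf":
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits

# Example: scan the raw SKILL.md before trusting the skill.
# with open("SKILL.md", encoding="utf-8") as f:
#     for hit in find_hidden_chars(f.read()):
#         print(hit)
```

An empty result means the file contains no invisible control characters; any hit is worth examining in a hex editor before granting the skill autonomous execution.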
Assessment
This skill appears to do what it says: transcribe local media into .srt and .txt using faster-whisper and ffmpeg. Before installing or running it:

  1. Run the pip installs in an isolated venv.
  2. Ensure ffmpeg/ffprobe are installed and on PATH.
  3. Be aware that the first run downloads Whisper model files (network traffic and significant disk use); verify you are comfortable with that.
  4. Inspect SKILL.md and the script for any hidden or unexpected characters or modifications (the static scan flagged unicode control characters).
  5. If you plan to run it automatically, restrict it to media files you trust; the script reads local files and writes model files and outputs to disk.

If anything in the raw SKILL.md text looks suspicious, do not grant the skill automated/autonomous execution until you confirm the content is clean.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

Tags: video, Clawdis
Required binaries: python3
Latest: vk97evey77q8v92r2qa0rvkypmx83afww
137 downloads · 0 stars · 2 versions
Updated 1mo ago
v1.0.1
MIT-0

video2txt Skill

Description

Transcribes local video or audio files into SRT subtitle files and TXT plain-text files.

Features

  • Extracts the spoken content from video/audio
  • Generates timestamped SRT subtitle files
  • Generates plain-text TXT files
  • Supports a wide range of video and audio formats
  • Defaults to Chinese recognition, with automatic conversion to Simplified Chinese

Use cases:

  1. When you need to read or understand the content of a video

Usage

Basic command

python video_to_text.py --input <path to video/audio file>

Notes

  • Background execution: always pass background: true when invoking this script, to avoid popping up a console window
  • The script logs detailed progress while running (a report every 10%), making execution easy to track

Examples

# Basic usage
python video_to_text.py --input "D:\videos\meeting.mp4"

# Specify an output directory
python video_to_text.py --input "D:\videos\meeting.mp4" --output-dir "D:\captions"

# Specify an output path
python video_to_text.py --input "D:\videos\meeting.mp4" --output-path "D:\captions\meeting_result"

# Specify language and model
python video_to_text.py --input "D:\videos\meeting.mp4" --language zh --model-size small

Parameters

Parameter        Description                          Default
--input          Input file path (required)           -
--output-dir     Output directory                     input file's directory
--output-path    Base path for the output files       -
--model-dir      Model download directory             <current dir>/models
--model-size     Whisper model size                   base
--language       Recognition language (auto/zh/en)    zh
--device         Inference device (cpu/cuda)          cpu
--compute-type   Compute type                         int8
--beam-size      Decoding beam size (1-5)             2
--no-vad-filter  Disable VAD filtering                false
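The parameter set above can be mirrored in a minimal argparse sketch. This is a hypothetical reconstruction of the CLI surface from the table; the real video_to_text.py may differ in details such as help text or validation:

```python
import argparse

def build_parser():
    """Hypothetical reconstruction of the script's CLI from its parameter table."""
    p = argparse.ArgumentParser(description="Transcribe local media to SRT/TXT")
    p.add_argument("--input", required=True, help="input file path")
    p.add_argument("--output-dir", default=None,
                   help="output directory (defaults to the input file's directory)")
    p.add_argument("--output-path", default=None, help="base path for output files")
    p.add_argument("--model-dir", default="models", help="model download directory")
    p.add_argument("--model-size", default="base", help="Whisper model size")
    p.add_argument("--language", default="zh", choices=["auto", "zh", "en"])
    p.add_argument("--device", default="cpu", choices=["cpu", "cuda"])
    p.add_argument("--compute-type", default="int8")
    p.add_argument("--beam-size", type=int, default=2)
    p.add_argument("--no-vad-filter", action="store_true",
                   help="disable VAD filtering (enabled by default)")
    return p
```

Parsing only `--input "a.mp4"` yields the table's defaults (beam size 2, language zh, CPU inference, VAD filtering on).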

Dependencies

  • faster-whisper >= 1.1.0
  • av >= 12.0.0
  • opencc-python-reimplemented >= 0.1.7
  • ffprobe/ffmpeg
  • Whisper model files (downloaded automatically on first run; requires network access and disk space)

Installation

  1. Ensure a Python 3.11 or 3.12 environment
  2. Install dependencies: python -m pip install -r requirements.txt
  3. The first run automatically downloads the Whisper model into the models directory

Output files

  • <input filename>.srt - timestamped subtitle file
  • <input filename>.txt - plain-text file
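For reference, an SRT file is a sequence of numbered cues with HH:MM:SS,mmm timestamps. A minimal formatter (an illustration of the format, not the skill's actual code) looks like:

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, e.g. 75.5 -> '00:01:15,500'."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_cue(index, start, end, text):
    """Render one numbered SRT cue block: index, time range, then the text."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"

# print(srt_cue(1, 0.0, 2.5, "会议现在开始"))
```

The TXT output, by contrast, is simply the recognized text with no cue numbers or timestamps.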

Notes

  • The first run downloads the Whisper model, which may take several minutes
  • Python 3.11 or 3.12 is recommended to avoid compatibility issues with faster-whisper
  • Chinese recognition automatically converts Traditional characters to Simplified
  • To reduce waiting anxiety, progress is reported roughly every 10 seconds
  • beam-size defaults to 2; pass --beam-size to adjust it
