Install
openclaw skills install subtitle-extractor
Extracts subtitles from video platforms in their native format. Supports Bilibili, YouTube, Xiaohongshu, Douyin, and local video files. Output is either the native subtitles or a Whisper transcription, in the original format.
Scope of this skill: subtitle extraction only. Summarization, analysis, Q&A — all handled by the agent based on the user's actual request.
The agent saves the subtitle file to outputs/ and then processes it per the user's request.
The agent must verify dependencies before calling the script. If any are missing, inform the user with the relevant install command.
# Check
yt-dlp --version
# Install
pip install yt-dlp # all platforms (recommended)
brew install yt-dlp # macOS Homebrew
winget install yt-dlp.yt-dlp # Windows WinGet
scoop install yt-dlp # Windows Scoop
conda install -c conda-forge yt-dlp # Conda environments
# Upgrade existing install
pip install -U yt-dlp
ffmpeg: only needed for Xiaohongshu, Douyin, local files, or Path B (Whisper transcription).
# Check
ffmpeg -version
# Install
brew install ffmpeg # macOS Homebrew
winget install Gyan.FFmpeg # Windows WinGet
choco install ffmpeg # Windows Chocolatey
scoop install ffmpeg # Windows Scoop
apt install ffmpeg # Ubuntu / Debian
dnf install ffmpeg # Fedora / RHEL (may need RPM Fusion)
pacman -S ffmpeg # Arch Linux
snap install ffmpeg # Ubuntu Snap
Windows users: restart the terminal after installation for PATH to take effect. If winget is unavailable, download from ffmpeg.org and add the bin/ directory to the system PATH.
faster-whisper: only needed for Xiaohongshu, Douyin, local files, or Path B (Whisper transcription).
# Check
python3 -c "from faster_whisper import WhisperModel; print('ok')"
# Install
pip install faster-whisper
# Configure model size (default: base)
export VIDEO_SUMMARY_WHISPER_MODEL=base # tiny | base | small | medium | large
Note: Model files are downloaded automatically on first transcription run (~150MB for base). This may take a minute depending on network speed.
China network note: If auto-download fails (HuggingFace blocked), see Whisper model download failed in Troubleshooting.
Transcription time estimate (CPU, faster-whisper):
| Video Duration | tiny | base | small | medium |
|---|---|---|---|---|
| 5 min | ~10s | ~20s | ~40s | ~80s |
| 15 min | ~30s | ~60s | ~2m | ~4m |
| 30 min | ~60s | ~2m | ~5m | ~10m |
A GPU speeds up transcription by roughly 5–15× compared with the CPU times above.
Bilibili requires a cookie file for all requests. The script auto-discovers cookie files in the skill directory only (the same folder as subtitle-extractor.py and SKILL.md).
Any .txt file whose name contains bilibili will be picked up automatically — including the browser extension's default export format www.bilibili.com_netscape_<timestamp>.txt.
Place your cookie file in the skill directory. The agent does not need to locate or pass it manually (see Step 1b).
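The auto-discovery rule described above could look roughly like the sketch below. This is an illustration only, not the script's actual code: the function name `find_bilibili_cookies` is hypothetical, and the case-insensitive match is an assumption.

```python
from pathlib import Path


def find_bilibili_cookies(skill_dir):
    """List .txt files directly inside skill_dir whose name contains 'bilibili'."""
    return sorted(
        p for p in Path(skill_dir).glob("*.txt")
        # Assumption: the match ignores case; the real script may differ.
        if p.is_file() and "bilibili" in p.name.lower()
    )
```

A file such as `www.bilibili.com_netscape_<timestamp>.txt` matches this rule, while `notes.txt` or a non-.txt file does not.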
For other platforms, pass a cookie file explicitly or set the environment variable:
python subtitle-extractor.py "https://www.xiaohongshu.com/explore/xxxxx" --cookies cookies.txt
# or
export VIDEO_SUMMARY_COOKIES=/path/to/cookies.txt
How to export cookies:
Use a browser cookie-export extension such as Cookie Editor (https://cookieeditor.org/), log in to the site, and export the cookies as a .txt file (Netscape format).
Steps 1–4 in this skill MUST be completed in full before addressing any user request. The subtitle file MUST be saved to disk (Step 4) before the agent proceeds to summarization, translation, analysis, or any other task the user has asked for.
Treat Steps 1–4 as mandatory prerequisites, not optional helpers. Do not skip any step even if the user's final output format (e.g. a markdown file) appears to make it unnecessary.
yt-dlp --version
If the user requests Whisper transcription (keywords: "whisper转录" (Whisper transcription) / "用whisper" (use Whisper) / "transcribe" / "转录" (transcribe) / "语音转文字" (speech to text)), or the platform is Xiaohongshu, Douyin, or a local file, also check:
ffmpeg -version
python3 -c "from faster_whisper import WhisperModel; print('ok')"
If anything is missing, stop and tell the user which dependency to install (see Dependencies section).
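The dependency check can also be approximated programmatically. The sketch below is a hypothetical helper (`missing_dependencies` is not part of the skill) that uses a PATH lookup and an import lookup instead of running the version commands; the injectable `which` and `has_module` parameters exist only to make it testable.

```python
import importlib.util
import shutil


def missing_dependencies(need_whisper, which=shutil.which, has_module=None):
    """Return the names of required tools that cannot be found."""
    if has_module is None:
        # find_spec returns None when a top-level module is not installed.
        has_module = lambda name: importlib.util.find_spec(name) is not None

    missing = []
    if which("yt-dlp") is None:
        missing.append("yt-dlp")
    if need_whisper:
        # ffmpeg and faster-whisper are only required for the Whisper path.
        if which("ffmpeg") is None:
            missing.append("ffmpeg")
        if not has_module("faster_whisper"):
            missing.append("faster-whisper")
    return missing
```

An empty list means the script can be called; otherwise each returned name maps to an install command in the Dependencies section.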
The script auto-discovers any .txt file containing "bilibili" in the skill directory. Do not search for or pass the cookie file yourself.
Only act if the script exits with one of these errors:
- 未找到 Bilibili Cookie 文件 (Bilibili cookie file not found) → tell the user to place a cookie file in the skill directory
- Bilibili 412 错误:Cookie 已过期 (Bilibili 412 error: cookie expired) → tell the user to re-export
To export: install Cookie Editor (https://cookieeditor.org/), log in to Bilibili, export in Netscape format → place the file in the skill directory → retry.
Determine which path applies, then execute it completely before moving to Step 3.
Path A — Native subtitles
Use when: Bilibili or YouTube URL, and the user has not mentioned any transcription keyword.
Tell the user: "正在提取字幕..." (Extracting subtitles...)
python subtitle-extractor.py "<url>" # auto-detect language
python subtitle-extractor.py "<url>" --lang zh-CN # force language
Parse the JSON from stdout. You now have all four fields needed for Step 3:
| Field | Value |
|---|---|
| title | from this JSON |
| author | from this JSON |
| platform | from this JSON |
| subtitle_file | from this JSON |
If the script exits non-zero: read stderr, report the error to the user, stop.
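A minimal sketch of the "run, parse stdout as JSON, stop on non-zero exit" pattern follows. The helper name `run_json_command` is hypothetical, and the sketch assumes stdout contains nothing but the JSON document.

```python
import json
import subprocess
import sys


def run_json_command(cmd):
    """Run cmd and return the JSON object printed on stdout."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Non-zero exit: surface stderr so it can be reported to the user.
        raise RuntimeError(result.stderr.strip())
    return json.loads(result.stdout)


# Hypothetical call; "<url>" is a placeholder, not a real address:
# info = run_json_command([sys.executable, "subtitle-extractor.py", "<url>"])
# info["subtitle_file"] would then hold the path needed in Step 3.
```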
Path B — Whisper transcription
Use when: user mentions any transcription keyword, OR platform is Xiaohongshu or Douyin.
A transcription keyword takes priority over phrasing like "提取字幕" (extract subtitles) or "字幕原文" (original subtitle text): those describe the desired output, not the method.
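The Path A / Path B decision can be written down as a small function. This is a sketch under assumptions: the lowercase platform labels and the name `choose_path` are illustrative, while the keyword list is taken from Step 1.

```python
TRANSCRIBE_KEYWORDS = ("whisper转录", "用whisper", "transcribe", "转录", "语音转文字")
# Assumed labels for inputs that have no native subtitles.
WHISPER_ONLY_PLATFORMS = {"xiaohongshu", "douyin", "local"}


def choose_path(platform, user_request):
    """Return "B" for Whisper transcription, otherwise "A" for native subtitles."""
    text = user_request.lower()
    # A transcription keyword wins even on Bilibili or YouTube.
    if any(keyword in text for keyword in TRANSCRIBE_KEYWORDS):
        return "B"
    if platform.lower() in WHISPER_ONLY_PLATFORMS:
        return "B"
    return "A"
```

For example, a Bilibili link with the request "提取字幕" stays on Path A, while the same link with "用Whisper 提取字幕" switches to Path B.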
Call 1 — Download audio (skip for local files, go to Call 2 directly)
Tell the user: "正在下载音频,请稍候..." (Downloading audio, please wait...)
python subtitle-extractor.py "<url>" --step download-audio
Parse the JSON from stdout and record these values:
| Field | Value |
|---|---|
| title | from this JSON |
| author | from this JSON |
| platform | from this JSON |
| audio_file | from this JSON; input for Call 2 |
If the script exits non-zero: read stderr, report the error to the user, stop.
Tell the user: "音频下载完成,开始 Whisper 转录(模型: base),请稍候..." (Audio download complete; starting Whisper transcription (model: base), please wait...)
Call 2 — Transcribe
For URL input, use the audio_file recorded from Call 1:
python subtitle-extractor.py "<audio_file>" --step transcribe
For local file input (set title = filename, author = "local"):
python subtitle-extractor.py "<local_file_path>" --step transcribe
Parse the JSON from stdout and record:
| Field | Value |
|---|---|
| subtitle_file | from this JSON |
Tell the user: "转录完成!" (Transcription complete!)
If the script exits non-zero:
- Whisper 模型下载失败 (Whisper model download failed): show the full error message verbatim, since it contains the exact download directory and manual steps
Failure rule: Do not run yt-dlp, ffmpeg, or Whisper commands manually. Do not retry with different flags unless the error message explicitly says to.
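The two-call flow of Path B can be sketched as below. The function name `transcribe_via_path_b` and the injected `run` callable are hypothetical: `run` is assumed to take the script's argument list and return the parsed JSON, raising on a non-zero exit.

```python
from pathlib import Path


def transcribe_via_path_b(source, is_local, run):
    """Run Call 1 (skipped for local files) and Call 2; return the Step 3 values."""
    if is_local:
        # Local files skip the download; title and author follow the convention above.
        meta = {"title": Path(source).name, "author": "local"}
        audio_file = source
    else:
        meta = run([source, "--step", "download-audio"])
        audio_file = meta["audio_file"]
    result = run([audio_file, "--step", "transcribe"])
    return {
        "title": meta["title"],
        "author": meta["author"],
        "subtitle_file": result["subtitle_file"],
    }
```

Because any failure is expected to surface as an exception from `run`, the sketch never retries or falls back to manual commands, which matches the failure rule.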
Verify you have collected the following values from the script outputs in Step 2:
| Field | Path A source | Path B source |
|---|---|---|
| title | script JSON | Call 1 JSON (or filename for local) |
| author | script JSON | Call 1 JSON (or "local") |
| subtitle_file | script JSON | Call 2 JSON |
Note: non-ASCII characters in JSON output appear as \uXXXX escapes — standard JSON parsing produces the correct decoded strings.
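As a quick illustration of the note above, Python's standard `json` module decodes the escapes without any extra handling. The sample payload is made up for this example.

```python
import json

# Escaped form, as it would appear on stdout (sample payload, not real output).
raw = '{"title": "\\u5b57\\u5e55"}'
info = json.loads(raw)
title = info["title"]  # decoded string: 字幕
```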
Before answering the user, save the subtitle file to the session outputs directory.
Naming rule: {first 8 characters of title}_{author}.{original format extension}
Steps:
1. From title, keep the first 8 characters (Chinese and English characters each count as 1)
2. Replace / \ : * ? " < > | and spaces with _
3. Apply the same replacement to author
4. Take the extension from the subtitle_file path (.srt or .vtt)
5. Save the file as {outputs_dir}/{safe_title8}_{safe_author}.{ext}
Read the subtitle file content and respond to the user's original request: summarize, analyze, translate, answer questions, etc. The subtitle content is in SRT or VTT format with timestamps; LLMs handle both directly.
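The Step 4 naming rule can be sketched as follows. The function name `output_name` is hypothetical, and truncating before replacing is an assumption about the order of operations. Python string slicing counts code points, so Chinese and English characters each count as 1.

```python
import re
from pathlib import Path

# Characters replaced with "_": / \ : * ? " < > | and the space character.
UNSAFE = re.compile(r'[/\\:*?"<>| ]')


def output_name(title, author, subtitle_file):
    """Build {safe_title8}_{safe_author}.{ext} from the Step 3 values."""
    safe_title8 = UNSAFE.sub("_", title[:8])
    safe_author = UNSAFE.sub("_", author)
    ext = Path(subtitle_file).suffix.lstrip(".")
    return f"{safe_title8}_{safe_author}.{ext}"
```

The returned name is then joined with the outputs directory, for example `Path(outputs_dir) / output_name(title, author, subtitle_file)`, where `outputs_dir` is whatever the session provides.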
| Platform | Method | Notes |
|---|---|---|
| YouTube | yt-dlp native CC + auto-generated | Best support, usually no cookies needed |
| Bilibili | yt-dlp native CC | Auto-discovers cookies; zh-CN → ai-zh fallback; 412 error handling |
| Xiaohongshu | Whisper transcription | No native subtitles; requires ffmpeg + whisper |
| Douyin | Whisper transcription | No native subtitles; requires ffmpeg + whisper |
| Local files | Whisper transcription | mp4, mkv, webm, mp3, etc. |
YouTube: youtube.com/watch?v=... · youtu.be/...
Bilibili: bilibili.com/video/BV... · bilibili.com/video/av... · b23.tv/... (short link)
Xiaohongshu: xiaohongshu.com/explore/... · xhslink.com/... (short link)
Douyin: douyin.com/video/... · v.douyin.com/... (short link)
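The URL patterns above suggest a host-based lookup. The sketch below is an assumption about how such detection might work, not the script's code; `detect_platform` and the returned labels are hypothetical, and inputs are assumed to carry a scheme such as `https://`.

```python
from urllib.parse import urlparse

HOST_TO_PLATFORM = {
    "youtube.com": "youtube",
    "youtu.be": "youtube",
    "bilibili.com": "bilibili",
    "b23.tv": "bilibili",
    "xiaohongshu.com": "xiaohongshu",
    "xhslink.com": "xiaohongshu",
    "douyin.com": "douyin",
}


def detect_platform(source):
    """Map a URL to a platform label; inputs without a host count as local files."""
    host = (urlparse(source).hostname or "").lower()
    if not host:
        return "local"
    for domain, platform in HOST_TO_PLATFORM.items():
        # Accept the bare domain and any subdomain (www., v., m., ...).
        if host == domain or host.endswith("." + domain):
            return platform
    return "unknown"
```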
Usage:
python subtitle-extractor.py <url|file> [options]
Steps (--step):
download-audio Download audio from URL → {"audio_file", "title", "author", "platform"}
transcribe Transcribe local audio/video file → {"subtitle_file"}
(none) Extract native subtitles (default path) → {"title", "author", "platform", "subtitle_file"}
Options:
--step <name> Pipeline step to run: download-audio | transcribe
--lang <code> Subtitle language code, default path only (default: auto)
--cookies <file> Cookie file for restricted content
--help Show help
Environment Variables:
VIDEO_SUMMARY_COOKIES Path to cookies file
VIDEO_SUMMARY_WHISPER_MODEL Whisper model size (default: base)
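Reading the two environment variables might look like the sketch below. `read_config` is a hypothetical helper, and rejecting unknown model sizes is an assumption; the script's real behaviour for invalid values is not documented here.

```python
import os

VALID_MODELS = ("tiny", "base", "small", "medium", "large")


def read_config(env=os.environ):
    """Collect the settings controlled by environment variables."""
    model = env.get("VIDEO_SUMMARY_WHISPER_MODEL", "base")
    if model not in VALID_MODELS:
        # Assumption: fail early instead of passing an unknown size along.
        raise ValueError(f"unsupported Whisper model size: {model}")
    return {"whisper_model": model, "cookies": env.get("VIDEO_SUMMARY_COOKIES")}
```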
pip install yt-dlp
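Use the two-step pipeline (--step download-audio then --step transcribe) to force Whisper
Use --lang to specify a different language code
Cookie expired. Re-export the cookie file (Netscape format) and place it in the skill directory.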
The script automatically falls back to ai-zh. If both fail, it lists all available subtitle codes. Use --lang <code> to specify one.
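The fallback behaviour described above can be illustrated with a short function. This is a sketch only: `pick_language` is a hypothetical name, and the error text is invented for the example.

```python
def pick_language(requested, available):
    """Return the subtitle code to use, or raise listing what is available."""
    candidates = [requested]
    if requested == "zh-CN":
        # Documented fallback: zh-CN, then ai-zh.
        candidates.append("ai-zh")
    for code in candidates:
        if code in available:
            return code
    raise LookupError(
        "no subtitles for " + "/".join(candidates)
        + "; available: " + ", ".join(sorted(available))
    )
```

When the lookup fails, the listed codes are the values to try with `--lang <code>`.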
pip install faster-whisper
Whisper model download failed
The script tries hf-mirror.com, then huggingface.co. If both fail (common in China), the script prints the exact steps to follow. Show the error message to the user verbatim, since it contains the exact directory path and download URL.
Manual download (browser accessible in China):
Open https://modelscope.cn/models/pkufool/faster-whisper-base/files and download these files: config.json, model.bin, tokenizer.json, vocabulary.json, preprocessor_config.json.
For other model sizes (tiny/small/medium/large), change faster-whisper-base to faster-whisper-{size} in the ModelScope URL.
See ffmpeg install commands in the Dependencies section above.
Use a smaller model:
export VIDEO_SUMMARY_WHISPER_MODEL=tiny
Extract subtitles. Let the agent think.