Install
openclaw skills install @xiaozishan/bili-collection-pipelineopenclaw skills install @xiaozishan/bili-collection-pipeline一键批量转录 B站合集 或 YouTube 播放列表,输出按语义分段的 Markdown 文件。
Supports Bilibili collections AND YouTube playlists!
B站视频/YouTube链接 → fetch_collection.py → 视频列表JSON
→ transcribe_collection.py → 下载→提音频→Whisper转录
→ semantic_segment.py → 语义分段
→ .md知识库
# System tools
# you-get: https://github.com/soimort/you-get (for Bilibili)
# ffmpeg: https://ffmpeg.org/
# yt-dlp: https://github.com/yt-dlp/yt-dlp (for YouTube + Bilibili fallback)
# pip install yt-dlp
# Python
pip install faster-whisper requests
# Bilibili (by any video URL or BVID)
python3 scripts/fetch_collection.py "https://www.bilibili.com/video/BV1GeDSYhEVZ" -o collection.json
# YouTube (by playlist or video URL)
python3 scripts/fetch_collection.py "https://youtube.com/playlist?list=PLxxxxx" -o collection.json
python3 scripts/fetch_collection.py "https://youtu.be/xxxxx" -o collection.json
脚本自动检测是 B站还是 YouTube 链接。
python3 scripts/transcribe_collection.py collection.json \
--output ./output --model small --device cuda --progress progress.json
标题.md 命名python3 scripts/semantic_segment.py ./output/*.md
基于 Jaccard 词汇连贯性 + 结构信号进行智能分段。
用 DeepSeek / OpenAI 兼容 API 做错别字修正或散文化改写。
| 文件 | 作用 |
|---|---|
scripts/fetch_collection.py | 解析B站合集或YouTube播放列表,输出JSON |
scripts/transcribe_collection.py | 批量下载→转录→输出.md |
scripts/semantic_segment.py | 语义分段v2算法 |
pip install yt-dlp