Bilibili Notion Pipeline Skill
v0.1.2
Skill-first Bilibili to Notion pipeline. Download a Bilibili/b23 video, transcribe audio, upload the mp4, create or update a Notion transcript page, write tr...
MIT-0
Security Scan
OpenClaw
Suspicious
high confidence

Purpose & Capability
The skill's name and description (download Bilibili video → transcribe → upload → create/update Notion page) match the code in scripts/pipeline.py and scripts/notion_markdown.py. However, the registry metadata declares no required environment variables or binaries, while the code requires several environment variables (NOTION_API_KEY, NOTION_DATABASE_ID, UPLOAD_URL, UPLOAD_TOKEN, BILI_COOKIES_FILE, and PIPELINE_DATA_DIR overrides) and external tools (ffmpeg, ffprobe, the whisper CLI or faster-whisper, yt_dlp, and Python packages such as requests). This is an inconsistency: a user would legitimately need those env vars and binaries to run the described pipeline.
Instruction Scope
SKILL.md describes the expected workflow and shows example CLI invocations for pipeline.py; it does not instruct the agent to read unrelated system files. The code, however, reads environment variables and creates, reads, and writes files under a data directory (by default derived from the repository root). The instructions name an upload backend (stor.pull.eu.org) and warn against committing tokens, but they do not clearly require or document NOTION_API_KEY/NOTION_DATABASE_ID/UPLOAD_URL/UPLOAD_TOKEN or the presence of ffmpeg/ffprobe/whisper/yt_dlp. Because uploads go to a configurable UPLOAD_URL, a misconfigured or malicious upload endpoint could exfiltrate video data; this is a normal feature, but a risk if the endpoint is untrusted.
Install Mechanism
There is no install specification (this is an instruction-only skill), so nothing is auto-downloaded by the registry. However, the code depends on undeclared Python packages (yt_dlp, requests, faster_whisper or the whisper CLI, optionally opencc) and system binaries (ffmpeg, ffprobe). Without an install spec, users must install these manually; the omission from metadata and documentation is a usability and security concern, but not an immediate remote-code-install risk from this bundle itself.
Credentials
The registry lists no required env vars or primary credential, but pipeline.py reads many environment variables at runtime: NOTION_API_KEY, NOTION_DATABASE_ID, UPLOAD_URL, UPLOAD_TOKEN, BILI_COOKIES_FILE, plus PIPELINE_DATA_DIR/BILI_DOWNLOAD_DIR/BILI_TEMP_DIR and ASR/whisper-related settings. NOTION_API_KEY and BILI_COOKIES_FILE can contain sensitive auth material; requesting them is proportionate to the stated Notion/upload functionality, but the metadata's omission of these requirements is misleading and increases the risk of accidental exposure or misconfiguration. In particular:
- NOTION_API_KEY and NOTION_DATABASE_ID are required for creating new Notion pages;
- BILI_COOKIES_FILE may contain session cookies;
- UPLOAD_URL/UPLOAD_TOKEN control where large media get sent and could be used to exfiltrate data if set to an attacker-controlled endpoint.
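Since these requirements are undeclared, a quick preflight check can catch misconfiguration before any network call. A minimal sketch, assuming the variable names listed in this review (the authoritative list is in pipeline.py itself):

```python
import os

# Env vars this review identifies as read by pipeline.py at runtime.
REQUIRED = ["NOTION_API_KEY", "NOTION_DATABASE_ID", "UPLOAD_URL", "UPLOAD_TOKEN"]
OPTIONAL = ["BILI_COOKIES_FILE", "PIPELINE_DATA_DIR", "BILI_DOWNLOAD_DIR", "BILI_TEMP_DIR"]

def check_env(env=None):
    """Return the required variables that are missing or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]
```

Running `check_env()` before the pipeline surfaces the missing names instead of failing mid-run.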
Persistence & Privilege
The skill does not request always: true, does not modify other skills, and contains no installer that persistently injects itself into other agent configurations. It writes local files under a data directory (default inside the repository), which is expected for a download/transcribe pipeline. Autonomous invocation is allowed by default (disable-model-invocation is false) but that is the platform default and not by itself a red flag.
What to consider before installing
This skill appears to implement exactly the Bilibili → transcribe → Notion workflow, but the registry metadata omits important runtime requirements. Before installing or running it:
1) Do not put real tokens/cookies in the repo. Provide NOTION_API_KEY and NOTION_DATABASE_ID only via secure environment variables, and grant the Notion token the minimum permissions needed.
2) Verify UPLOAD_URL and UPLOAD_TOKEN. The upload target will receive copies of your mp4 files; ensure it is a trusted storage endpoint (the README mentions stor.pull.eu.org as an example). Treat UPLOAD_URL as potentially sensitive and do not point it at unknown hosts.
3) Be aware that BILI_COOKIES_FILE may contain session cookies; only use a cookie file you trust.
4) Manually install and audit the dependencies and binaries the script requires (ffmpeg, ffprobe, yt-dlp, the whisper CLI or the faster-whisper Python package, requests, optionally opencc). The skill does not declare these, so failing to install them will break the pipeline.
5) Run the code in an isolated environment (container or dedicated VM) the first time, review the full pipeline.py (especially the parts that call requests.post and requests.patch), and test with non-sensitive data.
6) If you need registry-level assurance, ask the author to update the skill metadata to declare the required env vars and binaries, and to provide an install spec or requirements.txt so dependency installation is explicit.
Like a lobster shell, security has layers: review code before you run it.
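Point 4 above can be partially automated. A small sketch that checks for the external binaries named in this review (the whisper step may instead be satisfied by the faster-whisper Python package, so it is left out of the default list):

```python
import shutil

# External tools this review says the scripts invoke.
BINARIES = ["ffmpeg", "ffprobe", "yt-dlp"]

def missing_binaries(names=BINARIES):
    """Return the binaries not found on PATH."""
    return [n for n in names if shutil.which(n) is None]
```

An empty return value means the tool prerequisites are at least present, though not audited.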
Tags: bilibili, latest, notion, openclaw, whisper, workflow
License
MIT-0
Free to use, modify, and redistribute. No attribution required.
SKILL.md
Skill-First Bilibili → Notion Pipeline
This skill is now positioned as:
Skill-first, agent-enhanced.
In other words:
- The skill is the core
  - download the video
  - extract the audio
  - transcribe the text
  - upload the video
  - create/update the Notion page
  - write the body blocks
  - clean up temporary files
- The agent is the enhancement layer
  - whether the page is created or updated
  - whether to replace the old body
  - how to write the post-article summary
  - which progress updates to report to the user
  - how to fall back when something goes wrong
When to use it
Trigger on user requests like:
- "Organize this Bilibili video into Notion"
- "Download, transcribe, upload, and write to Notion"
- "Add a structural outline and core points to this transcript page"
- "Turn the video content into a body plus a post-article summary"
- "Archive the Bilibili content into Notion and keep the download link"
Why it is a skill first
Because most of the work in this flow is:
- repeatable
- low-discretion
- easy to script
- in need of stable execution
So it should go to scripts/ first, rather than having the agent improvise it every time.
Standard workflow
Recommended: one-shot run
python skill/bilibili-notion-pipeline/scripts/pipeline.py run \
--url "<b23 or BV link>" \
--cleanup-mode temp
If someone has already written a Markdown summary:
python skill/bilibili-notion-pipeline/scripts/pipeline.py run \
--url "<b23 or BV link>" \
--markdown-file /path/to/summary.md \
--require-summary \
--cleanup-mode temp
run executes, in order:
- resolve the video
- download the video
- extract the audio
- transcribe the body text
- upload the video
- create / update the Notion page
- write the body blocks
- optionally append the Markdown summary
- read back and verify the page structure
- clean up local intermediate files
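The ordered steps above can be sketched as a simple fail-fast loop. This is an illustrative reconstruction, not the actual pipeline.py API; the step and function names here are hypothetical:

```python
# Hypothetical orchestration of the "run" subcommand: each step runs in
# order and raises on failure, halting the remainder of the pipeline.
def run(url, markdown_file=None, cleanup_mode="temp", steps=None):
    order = [
        "resolve", "download", "extract_audio", "transcribe",
        "upload", "upsert_page", "write_blocks",
    ]
    if markdown_file:
        order.append("append_summary")
    order += ["verify", "cleanup"]
    executed = []
    for name in order:
        handler = (steps or {}).get(name, lambda: None)
        handler()
        executed.append(name)
    return executed
```

The optional summary step only joins the sequence when --markdown-file is given, mirroring the CLI above.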
Step-by-step mode (when a summary must be inserted manually)
1) Run prepare
python skill/bilibili-notion-pipeline/scripts/pipeline.py prepare --url "<b23 or BV link>"
If the user has explicitly provided an existing Notion page:
python skill/bilibili-notion-pipeline/scripts/pipeline.py prepare \
--url "<link>" \
--page-id "<notion_page_id>" \
--replace-children
prepare prints JSON; note down:
- page_id
- notion_url
- transcript_path
- metadata_path
- download_url
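Those fields can be pulled out of prepare's JSON output programmatically. A minimal sketch using only the field names listed above (the sample payload is invented for illustration):

```python
import json

def parse_prepare_output(stdout: str) -> dict:
    """Extract the fields the step-by-step mode needs from `prepare` output."""
    data = json.loads(stdout)
    keys = ["page_id", "notion_url", "transcript_path", "metadata_path", "download_url"]
    return {k: data.get(k) for k in keys}

# Invented sample payload, for illustration only.
sample = ('{"page_id": "abc123", "notion_url": "https://www.notion.so/abc123", '
          '"transcript_path": "data/t.txt", "metadata_path": "data/m.json", '
          '"download_url": "https://stor.pull.eu.org/v.mp4"}')
```

Using .get() keeps the parser tolerant if a run omits a field (e.g. no upload yet).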
2) Read the transcript
Use read to load transcript_path and judge:
- whether the topic has drifted
- whether the recognition quality is acceptable
- whether manual intervention is needed
- how the post-article summary should be organized
3) Write the post-article summary
Write the Markdown with a fixed structure first:
## Structure Outline
## Core Points
## Key Concepts
See also:
- references/summary-template.md
- references/workflow.md
4) Append the summary to Notion
python skill/bilibili-notion-pipeline/scripts/pipeline.py append-summary \
--page-id "<page_id>" \
--markdown-file "/path/to/summary.md"
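Under the hood, append-summary has to turn the Markdown into Notion blocks. A minimal sketch of that mapping, covering only the heading/paragraph subset the summary template uses (the real notion_markdown.py conversion is richer than this):

```python
def markdown_to_blocks(md: str) -> list:
    """Map '## ' headings to heading_2 blocks and other lines to paragraphs."""
    blocks = []
    for line in md.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines produce no block
        if line.startswith("## "):
            blocks.append({"type": "heading_2", "text": line[3:]})
        else:
            blocks.append({"type": "paragraph", "text": line})
    return blocks
```

The Notion API expects fuller block objects than these dicts; this only shows the structural mapping.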
5) Read back and verify
python skill/bilibili-notion-pipeline/scripts/pipeline.py verify \
--metadata "<metadata_path>" \
--require-summary
6) Clean up as needed
Recommended defaults to delete:
- wav
- transcript txt
Whether to delete the local mp4 is the user's call:
python skill/bilibili-notion-pipeline/scripts/pipeline.py cleanup \
--metadata "<metadata_path>" \
--mode temp
If the user explicitly does not want to keep the video:
python skill/bilibili-notion-pipeline/scripts/pipeline.py cleanup \
--metadata "<metadata_path>" \
--mode all
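The two cleanup modes amount to a file-selection rule. A hypothetical sketch, assuming the metadata records file paths under a "files" key (an invented shape) and inferring suffixes from the wav / transcript txt defaults above:

```python
def files_to_delete(metadata: dict, mode: str = "temp") -> list:
    """'temp' selects intermediates (wav, txt); 'all' also selects the mp4."""
    files = metadata.get("files", [])
    temp = [p for p in files if p.endswith((".wav", ".txt"))]
    if mode == "all":
        return temp + [p for p in files if p.endswith(".mp4")]
    return temp
```

Keeping selection separate from deletion makes it easy to show the user the list before removing anything.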
Progress reporting requirements
Long tasks must not stall silently.
Report proactively at least at these checkpoints:
- video resolved / download started
- transcription started
- uploaded and download_url obtained
- Notion body written
- post-article summary appended
- read-back verification completed
- which local files were cleaned up / kept
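One way to keep a long run from going silent is a checkpoint helper that emits each milestone exactly once; an illustrative sketch (the milestone strings paraphrase the checkpoint list above):

```python
MILESTONES = [
    "resolved / download started", "transcription started",
    "uploaded, got download_url", "Notion body written",
    "summary appended", "verification completed", "cleanup done",
]

def report(milestone, seen, emit=print):
    """Emit a known milestone once, tracking progress as n-of-total."""
    if milestone in MILESTONES and milestone not in seen:
        seen.append(milestone)
        emit(f"[{len(seen)}/{len(MILESTONES)}] {milestone}")
    return seen
```

Calling report() at each pipeline stage guarantees the user sees every checkpoint without duplicates.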
Upload backend contract (short version)
This skill treats the upload backend as a replaceable component, but the one commonly used in current practice is:
https://stor.pull.eu.org/
At execution time, only these points matter:
- it can accept an mp4 upload and return a public download_url
- ideally it supports large video files
- ideally it supports chunked uploads, to reduce failures on long videos
- WebDAV or equivalent file-management capability makes organizing, migrating, and backing up easier
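Chunked upload mainly means splitting the mp4 so a failed part can be retried on its own. A backend-agnostic sketch of the splitting step (the actual stor.pull.eu.org upload API is not documented here, so no HTTP calls are shown):

```python
def iter_chunks(data: bytes, chunk_size: int = 8 * 1024 * 1024):
    """Yield (index, chunk) pairs so a failed part can be retried alone."""
    for offset in range(0, len(data), chunk_size):
        yield offset // chunk_size, data[offset:offset + chunk_size]
```

Each (index, chunk) pair would then be posted separately, with per-chunk retries, instead of resending the whole file after one failure.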
This capability builds on ideas and implementation groundwork from:
https://github.com/MarSeventh/CloudFlare-ImgBed
If the backend ultimately relies on platform storage such as Telegram groups / channels, assume by default that it is:
- a cost-effective engineering solution
- but not zero-risk permanent storage
So when running this flow, it is still recommended to:
- keep the metadata / transcript locally
- handle deletion of the local mp4 strictly according to the user's stated preference
- never treat the remote link as the only copy
Caveats
- Never commit real tokens, cookies, profiles, or logs to the repository
- Official subtitles are unreliable; keep ASR ready as the default fallback
- If the transcription quality has clearly drifted, do not force a summary; tell the user first
- When updating an existing page, use --replace-children only if the user explicitly asks to replace the old body
- When presenting it externally, describe it first as a skill repository; the agent capability is an enhancement layer, not its sole identity
Files
5 total
