Tieba Spider

v1.0.0

Tieba thread crawler - crawl Baidu Tieba threads and export them to Markdown, with image download and sub-post (nested reply) parsing.


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for fuxiaoji/tieba-spider.

Prompt Preview: Install & Setup
Install the skill "Tieba Spider" (fuxiaoji/tieba-spider) from ClawHub.
Skill page: https://clawhub.ai/fuxiaoji/tieba-spider
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install tieba-spider

ClawHub CLI


npx clawhub@latest install tieba-spider
Security Scan
VirusTotal
Benign
View report →
OpenClaw
Benign
high confidence
Purpose & Capability
Name/description (Tieba thread crawler → export Markdown, images, sub-posts) matches the included code and SKILL.md. The only runtime dependency is python3, which is appropriate.
Instruction Scope
SKILL.md instructs running the CLI with a thread URL/ID and standard options. The code only requests data from Baidu Tieba API endpoints and image URLs found in posts; it writes output to a user-specified directory. It does not read unrelated files, environment variables, or credentials.
Install Mechanism
No install spec (instruction-only skill) and the provided code is a local Python script. Nothing is downloaded or installed by the platform—lowest-risk install model.
Credentials
No environment variables, credentials, or config paths are requested. This is proportionate for a public-thread crawler that uses public mobile API endpoints.
Persistence & Privilege
Skill is not always-enabled and does not request persistent platform privileges or modify other skills. It writes files only to the user-specified output directory (normal behavior for a scraper).
Assessment
This skill is coherent and does what its README promises, but take these practical precautions before running:
  • Downloaded images are fetched from URLs embedded in posts, which can point to arbitrary external hosts; use --no-images if you want to avoid making extra outbound requests.
  • The script posts to API_BASE over HTTP (not HTTPS), which may be susceptible to network interception; run on a trusted network.
  • Files are written to the output directory you choose, so pick a safe location and watch disk usage.
  • Respect the site's terms of service and rate limits (the script has a default 0.5 s delay and a --delay option).
  • Run in an isolated environment (container or VM) if you have concerns.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

🕷️ Clawdis
Any bin: python3
latest: vk974trhjbwd5at9n3r348c9d8985fs0d
67 downloads · 0 stars · 1 version · Updated 3 days ago
v1.0.0
MIT-0

Tieba Spider 🕷️

Crawl Baidu Tieba threads with full content, images, and sub-posts to Markdown.

Features

  • Crawl every post (floor) in a thread
  • Download the thread's images locally
  • Parse sub-posts (nested replies)
  • Export to clean Markdown
  • Configurable output directory and request delay

Usage

# Thread URL or ID
python3 tieba_spider.py "https://tieba.baidu.com/p/7487460366"
python3 tieba_spider.py "7487460366"

# Specify the output directory
python3 tieba_spider.py "7487460366" --output ~/downloads

# Skip image download
python3 tieba_spider.py "7487460366" --no-images

Options

Option          Description
thread          Thread URL or numeric ID
--output/-o     Output directory (default: current directory)
--no-images     Skip image download
--delay/-d      Request delay in seconds (default: 0.5)
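For orientation, here is a minimal sketch of how these options could be parsed with argparse. The flag names and defaults mirror the table above, but the actual tieba_spider.py may structure this differently.

import argparse

parser = argparse.ArgumentParser(description="Crawl a Baidu Tieba thread and export it to Markdown")
parser.add_argument("thread", help="Thread URL or numeric ID")
parser.add_argument("--output", "-o", default=".", help="Output directory (default: current directory)")
parser.add_argument("--no-images", action="store_true", help="Skip image download")
parser.add_argument("--delay", "-d", type=float, default=0.5, help="Request delay in seconds (default: 0.5)")
args = parser.parse_args()  # e.g. args.thread, args.output, args.no_images, args.delay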

Output

{thread_id}_{title}/
├── {title}.md       # Thread content in Markdown
└── images/          # Downloaded images
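A rough sketch of how that layout could be produced; the helper name and the title-sanitising rule here are illustrative assumptions, not taken from the script.

import os
import re

def make_output_dirs(output_root, thread_id, title):
    # Replace characters that are unsafe in file names (assumed rule)
    safe_title = re.sub(r'[\\/:*?"<>|]', "_", title).strip() or "untitled"
    thread_dir = os.path.join(output_root, f"{thread_id}_{safe_title}")
    os.makedirs(os.path.join(thread_dir, "images"), exist_ok=True)
    md_path = os.path.join(thread_dir, f"{safe_title}.md")  # the thread's Markdown file
    return thread_dir, md_path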

Example

python3 tieba_spider.py 7487460366 --output ./output

Output files are saved under the ./output/7487460366_xxx/ directory.

Technical Notes

Uses the Baidu Tieba mobile API, so no login is required. A built-in 0.5-second delay keeps requests from being sent too quickly.
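The paged-fetch pattern described above can be sketched as follows; the endpoint URL and query parameter names are placeholders, not the script's verified API_BASE or parameters.

import json
import time
import urllib.request

API_BASE = "http://example.invalid/tieba/page"  # placeholder; the real script defines its own API_BASE

def fetch_all_pages(thread_id, delay=0.5, max_pages=1000):
    posts = []
    for page in range(1, max_pages + 1):
        url = f"{API_BASE}?kz={thread_id}&pn={page}"  # parameter names are assumptions
        with urllib.request.urlopen(url) as resp:
            data = json.loads(resp.read().decode("utf-8"))
        page_posts = data.get("post_list", [])
        if not page_posts:
            break  # no more pages to fetch
        posts.extend(page_posts)
        time.sleep(delay)  # built-in pause, default 0.5 s, to avoid hammering the server
    return posts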
