Tieba Spider

v1.0.0

Tieba thread crawler - crawl Baidu Tieba threads and export them to Markdown, with image download and sub-post (nested reply) parsing.


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for fuxiaoji/tieba-spider.

Prompt Preview: Install & Setup
Install the skill "Tieba Spider" (fuxiaoji/tieba-spider) from ClawHub.
Skill page: https://clawhub.ai/fuxiaoji/tieba-spider
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install tieba-spider

ClawHub CLI


npx clawhub@latest install tieba-spider
Security Scan
VirusTotal
Benign
View report →
OpenClaw
Benign
high confidence
Purpose & Capability
Name/description (Tieba thread crawler → export Markdown, images, sub-posts) matches the included code and SKILL.md. The only runtime dependency is python3, which is appropriate.
Instruction Scope
SKILL.md instructs running the CLI with a thread URL/ID and standard options. The code only requests data from Baidu Tieba API endpoints and image URLs found in posts; it writes output to a user-specified directory. It does not read unrelated files, environment variables, or credentials.
Install Mechanism
No install spec (instruction-only skill) and the provided code is a local Python script. Nothing is downloaded or installed by the platform—lowest-risk install model.
Credentials
No environment variables, credentials, or config paths are requested. This is proportionate for a public-thread crawler that uses public mobile API endpoints.
Persistence & Privilege
Skill is not always-enabled and does not request persistent platform privileges or modify other skills. It writes files only to the user-specified output directory (normal behavior for a scraper).
Assessment
This skill is coherent and does what its README promises, but take these practical precautions before running:
  • Downloaded images are fetched from URLs embedded in posts, which can point to arbitrary external hosts; use --no-images if you want to avoid making extra outbound requests.
  • The script posts to API_BASE over HTTP (not HTTPS), which may be susceptible to network interception; run on a trusted network.
  • Files are written to the output directory you choose, so pick a safe location and watch disk usage.
  • Respect the site's terms of service and rate limits (the script has a default 0.5 s delay and a --delay option).
  • Run in an isolated environment (container or VM) if you have concerns.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

🕷️ Clawdis
Any bin: python3
latest: vk974trhjbwd5at9n3r348c9d8985fs0d
67 downloads · 0 stars · 1 version · Updated 3 days ago
v1.0.0
MIT-0

Tieba Spider 🕷️

Crawl Baidu Tieba threads with full content, images, and sub-posts to Markdown.

Features

  • Crawl every post (floor) in a thread
  • Download the thread's images locally
  • Parse sub-posts (nested replies)
  • Export to clean Markdown
  • Configurable output directory and request delay

Usage

# Thread URL or ID
python3 tieba_spider.py "https://tieba.baidu.com/p/7487460366"
python3 tieba_spider.py "7487460366"

# Specify the output directory
python3 tieba_spider.py "7487460366" --output ~/downloads

# Skip image download
python3 tieba_spider.py "7487460366" --no-images

Options

Option          Description
thread          Thread URL or numeric ID
--output/-o     Output directory (default: current directory)
--no-images     Skip image download
--delay/-d      Request delay in seconds (default: 0.5)
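For orientation, here is a minimal sketch of how these options could be parsed with argparse. The flag names and defaults mirror the table above, but the actual tieba_spider.py may structure this differently.

import argparse

parser = argparse.ArgumentParser(description="Crawl a Baidu Tieba thread and export it to Markdown")
parser.add_argument("thread", help="Thread URL or numeric ID")
parser.add_argument("--output", "-o", default=".", help="Output directory (default: current directory)")
parser.add_argument("--no-images", action="store_true", help="Skip image download")
parser.add_argument("--delay", "-d", type=float, default=0.5, help="Request delay in seconds (default: 0.5)")
args = parser.parse_args()  # e.g. args.thread, args.output, args.no_images, args.delay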

Output

{thread_id}_{title}/
├── {title}.md       # Thread content in Markdown
└── images/          # Downloaded images
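A rough sketch of how that layout could be produced; the helper name and the title-sanitising rule here are illustrative assumptions, not taken from the script.

import os
import re

def make_output_dirs(output_root, thread_id, title):
    # Replace characters that are unsafe in file names (assumed rule)
    safe_title = re.sub(r'[\\/:*?"<>|]', "_", title).strip() or "untitled"
    thread_dir = os.path.join(output_root, f"{thread_id}_{safe_title}")
    os.makedirs(os.path.join(thread_dir, "images"), exist_ok=True)
    md_path = os.path.join(thread_dir, f"{safe_title}.md")  # the thread's Markdown file
    return thread_dir, md_path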

Example

python3 tieba_spider.py 7487460366 --output ./output

Output files are saved under the ./output/7487460366_xxx/ directory.

Technical Notes

Uses the Baidu Tieba mobile API, so no login is required. A built-in 0.5-second delay keeps requests from being sent too quickly.
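The paged-fetch pattern described above can be sketched as follows; the endpoint URL and query parameter names are placeholders, not the script's verified API_BASE or parameters.

import json
import time
import urllib.request

API_BASE = "http://example.invalid/tieba/page"  # placeholder; the real script defines its own API_BASE

def fetch_all_pages(thread_id, delay=0.5, max_pages=1000):
    posts = []
    for page in range(1, max_pages + 1):
        url = f"{API_BASE}?kz={thread_id}&pn={page}"  # parameter names are assumptions
        with urllib.request.urlopen(url) as resp:
            data = json.loads(resp.read().decode("utf-8"))
        page_posts = data.get("post_list", [])
        if not page_posts:
            break  # no more pages to fetch
        posts.extend(page_posts)
        time.sleep(delay)  # built-in pause, default 0.5 s, to avoid hammering the server
    return posts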
