PKU Info Spider

v1.0.0

WeChat Official Account article crawler (微信公众号爬虫) CLI tool built in Rust. Use this skill when working on the info-spider crate, debugging spider commands, ad...

by @wjsoj

Security Scan

VirusTotal

Benign

OpenClaw

Benign

medium confidence

✓

Purpose & Capability

Name, description and SKILL.md consistently describe a Rust CLI for crawling WeChat Official Account articles (login, search, scrape, output formats). Nothing in the metadata asks for unrelated services or credentials.

ℹ

Instruction Scope

SKILL.md documents the repo layout, commands, config path (~/.config/info-spider/), and session tokens (token, fingerprint, bizuin). The file is descriptive and does not itself instruct the agent to read or exfiltrate system files. It does, however, explicitly reference local session storage and the QR login flow, which an agent might be asked to access when debugging, so exercise caution before allowing the agent to interact with local files or accounts.

✓

Install Mechanism

Instruction-only skill with no install spec, no downloads, and no declared required binaries—lowest install risk.

ℹ

Credentials

The skill declares no required environment variables or credentials (proportional), but its functionality inherently requires session tokens/credentials stored in config paths. If you plan to use the skill to operate the CLI, be aware it will need access to WeChat session data; the skill does not declare or justify any other unrelated secrets.

✓

Persistence & Privilege

The always flag is false, and there is no install-time persistence or modification of other skills. Default autonomous invocation is allowed by platform policy but is not a standalone concern here.

Assessment

This SKILL.md reads like accurate documentation for a local Rust CLI that scrapes WeChat MP. Before installing or giving the agent permission to act:

(1) Confirm the skill's source and review the actual code (there is no homepage or source link).
(2) Never paste WeChat account credentials into the agent. If testing login flows, use throwaway accounts.
(3) Be mindful that scraping MP may violate terms of service or local law, and that the project explicitly mentions evasion (configurable delays). Consider the legal and ethical implications.
(4) Restrict the agent's access to your filesystem (especially ~/.config/info-spider/) unless you explicitly want it to read session tokens.
(5) If you need help debugging, prefer having the agent suggest commands or snippets rather than granting it direct execution or file access.

If you want a higher-confidence assessment, provide the repository or actual source files for code review.

Like a lobster shell, security has layers — review code before you run it.

Tags: crawler, latest, pku, rust, wechat

107 downloads

0 stars

1 version

Updated 1w ago

MIT-0

Info-Spider - WeChat Official Account Crawler CLI

A CLI crawler for WeChat Official Account (公众号) articles via the MP backend.

Architecture

Crate location: crates/info-spider/
Auth flow: WeChat QR code login (completely separate from IAAA, does NOT use info-common)
API: mp.weixin.qq.com backend API
Config: ~/.config/info-spider/ (separate from info-common Store)
Flow docs: docs/wechat-mp-flow.md
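
The config location above can be illustrated with a small path helper. This is only a sketch: the function names are hypothetical, and it assumes the directory is derived from the HOME environment variable, which this document does not confirm (the crate may resolve it differently).

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: builds the spider's config directory under a given
/// home directory, matching the documented ~/.config/info-spider/ layout.
fn spider_config_dir(home: &Path) -> PathBuf {
    home.join(".config").join("info-spider")
}

/// Hypothetical helper: resolves the directory from the HOME environment
/// variable (an assumption), returning None when HOME is not set.
fn spider_config_dir_from_env() -> Option<PathBuf> {
    std::env::var_os("HOME").map(|home| spider_config_dir(Path::new(&home)))
}
```

Keeping the home directory as a parameter makes the path logic easy to check without touching a real session directory.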

Key Source Files

src/main.rs — Entry point
src/cli.rs — Clap CLI definition
src/commands.rs — Command implementations
src/api.rs — WeChat MP API client
src/session.rs — Own session persistence (token, fingerprint, bizuin)
src/client.rs — reqwest client builders

CLI Commands

Command	Function
`login`	WeChat QR code scan login to mp.weixin.qq.com
`logout` / `status`	Session management
`search <QUERY>`	Find Official Accounts by name/ID (returns fakeid list)
`articles`	Fetch articles from an OA (`--name` or `--fakeid`)
`scrape <URL>`	Convert single article URL to Markdown

Articles Command Options

--begin — Start offset for pagination
--count — Articles per page
--limit — Maximum total articles to fetch
--delay-ms — Random delay between requests, in milliseconds (helps avoid triggering anti-crawler checks)
--format {table|json|jsonl} — Output format
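
One way to picture how --begin, --count, and --limit could interact is as a plan of page requests. The sketch below is an assumption about that interaction, not the crate's actual logic: PageRequest and plan_pages are hypothetical names, and it assumes every page comes back full, whereas a real crawler would likely also stop when the server returns fewer articles than requested.

```rust
/// Hypothetical request descriptor: the offset to start at and how many
/// articles to ask for in that request.
#[derive(Debug, PartialEq)]
struct PageRequest {
    begin: usize,
    count: usize,
}

/// Splits a total of `limit` articles into pages of at most `count`,
/// starting at offset `begin`. Assumes every page is returned full.
fn plan_pages(begin: usize, count: usize, limit: usize) -> Vec<PageRequest> {
    let mut pages = Vec::new();
    if count == 0 {
        // A zero page size would never make progress, so plan nothing.
        return pages;
    }
    let mut offset = begin;
    let mut remaining = limit;
    while remaining > 0 {
        // The last page is shortened so the total never exceeds `limit`.
        let size = count.min(remaining);
        pages.push(PageRequest { begin: offset, count: size });
        offset += size;
        remaining -= size;
    }
    pages
}
```

Under these assumptions, a begin of 0, a count of 5, and a limit of 12 yields three requests at offsets 0, 5, and 10, with the last one asking for only 2 articles.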

Development Notes

Standalone auth: Uses its own WeChat QR login, NOT the IAAA flow from info-common
Own session.rs: Stores token, fingerprint, bizuin (different from info-common session format)
Mimics real user behavior with configurable delays to bypass risk controls
Article scraping extracts content to clean Markdown
Multiple output formats: table (default), JSON, JSONL
All user-facing strings in Chinese
Error handling: anyhow::Result with .context("中文描述"), i.e. a context message written in Chinese
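
The difference between the JSON and JSONL output formats listed above can be sketched as follows. This is illustrative only: the Article struct and its title and url fields are hypothetical, since the crate's real record type is not shown here, and the escaping is hand-written to keep the example free of dependencies (a real implementation would more plausibly use a serialization library).

```rust
/// Hypothetical record type; the field names are assumptions.
struct Article {
    title: String,
    url: String,
}

/// Escapes a string for use inside a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Serializes one article as a single-line JSON object.
fn to_object(a: &Article) -> String {
    format!(
        "{{\"title\":\"{}\",\"url\":\"{}\"}}",
        escape_json(&a.title),
        escape_json(&a.url)
    )
}

/// JSON: all articles wrapped in one array.
fn render_json(articles: &[Article]) -> String {
    let items: Vec<String> = articles.iter().map(to_object).collect();
    format!("[{}]", items.join(","))
}

/// JSONL: one standalone object per line, with no enclosing array.
fn render_jsonl(articles: &[Article]) -> String {
    articles.iter().map(|a| to_object(a) + "\n").collect()
}
```

JSONL suits long crawls because each line is a complete record that can be written and consumed as soon as it is fetched, while a JSON array is only valid once the closing bracket has been emitted.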
