PKU Info Spider
v1.0.0. WeChat Official Account article crawler (微信公众号爬虫) CLI tool built in Rust. Use this skill when working on the info-spider crate, debugging spider commands, ad...

Security Scan (OpenClaw): Benign, medium confidence

Purpose & Capability
Name, description and SKILL.md consistently describe a Rust CLI for crawling WeChat Official Account articles (login, search, scrape, output formats). Nothing in the metadata asks for unrelated services or credentials.
Instruction Scope
SKILL.md documents repo layout, commands, config path (~/.config/info-spider/) and session tokens (token, fingerprint, bizuin). The file is descriptive and does not itself instruct the agent to read or exfiltrate system files, but it explicitly references local session storage and the QR login flow which an agent might be asked to access when debugging—so exercise caution if allowing the agent to interact with local files or accounts.
Install Mechanism
Instruction-only skill with no install spec, no downloads, and no declared required binaries—lowest install risk.
Credentials
The skill declares no required environment variables or credentials (proportional), but its functionality inherently requires session tokens/credentials stored in config paths. If you plan to use the skill to operate the CLI, be aware it will need access to WeChat session data; the skill does not declare or justify any other unrelated secrets.
Persistence & Privilege
The `always` flag is false, and there is no install-time persistence or modification of other skills. Default autonomous invocation is allowed by platform policy but is not a standalone concern here.
Assessment
This SKILL.md reads like accurate documentation for a local Rust CLI that scrapes WeChat MP. Before installing or giving the agent permission to act: (1) confirm the skill's source and review the actual code (there is no homepage or source link); (2) never paste WeChat account credentials into the agent; if testing login flows, use throwaway accounts; (3) be mindful that scraping MP may violate terms of service or local law, and the project explicitly mentions evasion (configurable delays), so weigh the legal and ethical implications; (4) restrict the agent's access to your filesystem (especially ~/.config/info-spider/) unless you explicitly want it to read session tokens; and (5) if you need debugging help, prefer having the agent suggest commands and snippets rather than granting it direct execution or file access. For a higher-confidence assessment, provide the repository or actual source files for code review.
Tags: crawler, latest, pku, rust, wechat
Info-Spider - 微信公众号爬虫 CLI
A CLI crawler for WeChat Official Account (公众号) articles via the MP backend.
Architecture
- Crate location: `crates/info-spider/`
- Auth flow: WeChat QR code login (completely separate from IAAA; does NOT use info-common)
- API: mp.weixin.qq.com backend API
- Config: `~/.config/info-spider/` (separate from the info-common Store)
- Flow docs: `docs/wechat-mp-flow.md`
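The config path above (`~/.config/info-spider/`) can be resolved with the standard library alone. A minimal sketch, assuming nothing about the real crate's code: the `config_dir` helper is hypothetical, and the actual implementation may use a dedicated crate such as `dirs` instead.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical helper: resolve the skill's config directory,
/// honoring XDG_CONFIG_HOME and falling back to ~/.config.
fn config_dir() -> Option<PathBuf> {
    let base = env::var_os("XDG_CONFIG_HOME")
        .map(PathBuf::from)
        .or_else(|| env::var_os("HOME").map(|h| PathBuf::from(h).join(".config")))?;
    Some(base.join("info-spider"))
}

fn main() {
    if let Some(dir) = config_dir() {
        println!("config dir: {}", dir.display());
    }
}
```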
Key Source Files
- `src/main.rs` — Entry point
- `src/cli.rs` — Clap CLI definition
- `src/commands.rs` — Command implementations
- `src/api.rs` — WeChat MP API client
- `src/session.rs` — Own session persistence (token, fingerprint, bizuin)
- `src/client.rs` — reqwest client builders
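The three session fields named for `src/session.rs` (token, fingerprint, bizuin) can be illustrated with a std-only sketch. Everything here is an assumption for illustration: the `Session` struct shape, the key=value file format, and the `save`/`load` helpers are not the crate's actual serialization.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Illustrative stand-in for the crate's session state (fields per the docs).
#[derive(Debug, PartialEq)]
struct Session {
    token: String,
    fingerprint: String,
    bizuin: String,
}

impl Session {
    /// Persist as simple key=value lines (hypothetical format).
    fn save(&self, path: &Path) -> io::Result<()> {
        let body = format!(
            "token={}\nfingerprint={}\nbizuin={}\n",
            self.token, self.fingerprint, self.bizuin
        );
        fs::write(path, body)
    }

    /// Read the same key=value format back into a Session.
    fn load(path: &Path) -> io::Result<Session> {
        let text = fs::read_to_string(path)?;
        let mut s = Session {
            token: String::new(),
            fingerprint: String::new(),
            bizuin: String::new(),
        };
        for line in text.lines() {
            if let Some((k, v)) = line.split_once('=') {
                match k {
                    "token" => s.token = v.to_string(),
                    "fingerprint" => s.fingerprint = v.to_string(),
                    "bizuin" => s.bizuin = v.to_string(),
                    _ => {}
                }
            }
        }
        Ok(s)
    }
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("info-spider-session-demo");
    let s = Session {
        token: "t".into(),
        fingerprint: "fp".into(),
        bizuin: "biz".into(),
    };
    s.save(&path)?;
    assert_eq!(Session::load(&path)?, s);
    Ok(())
}
```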
CLI Commands
| Command | Function |
|---|---|
| `login` | WeChat QR code scan login to mp.weixin.qq.com |
| `logout` / `status` | Session management |
| `search <QUERY>` | Find Official Accounts by name/ID (returns fakeid list) |
| `articles` | Fetch articles from an OA (`--name` or `--fakeid`) |
| `scrape <URL>` | Convert a single article URL to Markdown |
Articles Command Options
- `--begin` — Start offset for pagination
- `--count` — Articles per page
- `--limit` — Maximum total articles to fetch
- `--delay-ms` — Random delay between requests (anti-crawler)
- `--format {table|json|jsonl}` — Output format
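The interaction of `--begin`, `--count`, and `--limit` reduces to offset arithmetic: start at `begin`, request `count` items per page, stop once `limit` items are planned. A sketch under that assumption; `plan_pages` is a hypothetical helper, not the crate's actual implementation.

```rust
/// Hypothetical pagination planner: yields (offset, count) request pairs
/// covering at most `limit` items starting at `begin`.
fn plan_pages(begin: u32, count: u32, limit: u32) -> Vec<(u32, u32)> {
    let mut pages = Vec::new();
    if count == 0 {
        return pages; // avoid an infinite loop on a zero page size
    }
    let mut offset = begin;
    let mut remaining = limit;
    while remaining > 0 {
        let take = count.min(remaining); // last page may be partial
        pages.push((offset, take));
        offset += take;
        remaining -= take;
    }
    pages
}

fn main() {
    // e.g. --begin 0 --count 5 --limit 12 yields three requests
    println!("{:?}", plan_pages(0, 5, 12));
}
```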
Development Notes
- Standalone auth: Uses its own WeChat QR login, NOT the IAAA flow from info-common
- Own session.rs: Stores token, fingerprint, bizuin (different from info-common session format)
- Mimics real user behavior with configurable delays to bypass risk controls
- Article scraping extracts content to clean Markdown
- Multiple output formats: table (default), JSON, JSONL
- All user-facing strings in Chinese
- Error handling: `anyhow::Result` with `.context("中文描述")`
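The configurable-delay note above can be sketched with the standard library only. This is an illustration, not the crate's code: a real implementation would presumably use a proper RNG crate such as `rand`, but sub-second clock nanos are enough to show jitter in the range `[base_ms, 2*base_ms)`.

```rust
use std::thread;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical helper: a jittered delay in [base_ms, 2*base_ms),
/// using clock sub-second nanos as a cheap pseudo-random source.
fn jittered_delay(base_ms: u64) -> Duration {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap_or_default()
        .subsec_nanos() as u64;
    Duration::from_millis(base_ms + nanos % base_ms.max(1))
}

fn main() {
    let d = jittered_delay(50);
    thread::sleep(d); // pause between requests
    println!("slept {:?}", d);
}
```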