PKU Info Spider

v1.0.0

WeChat Official Account article crawler (微信公众号爬虫) CLI tool built in Rust. Use this skill when working on the info-spider crate, debugging spider commands, ad...

Security Scan
VirusTotal
Benign
OpenClaw
Benign
medium confidence
Purpose & Capability
Name, description and SKILL.md consistently describe a Rust CLI for crawling WeChat Official Account articles (login, search, scrape, output formats). Nothing in the metadata asks for unrelated services or credentials.
Instruction Scope
SKILL.md documents the repo layout, commands, config path (~/.config/info-spider/), and session tokens (token, fingerprint, bizuin). The file is descriptive and does not itself instruct the agent to read or exfiltrate system files. However, it explicitly references local session storage and the QR login flow, which an agent might be asked to access when debugging, so exercise caution if you allow the agent to interact with local files or accounts.
Install Mechanism
Instruction-only skill with no install spec, no downloads, and no declared required binaries—lowest install risk.
Credentials
The skill declares no required environment variables or credentials (proportional), but its functionality inherently requires session tokens/credentials stored in config paths. If you plan to use the skill to operate the CLI, be aware it will need access to WeChat session data; the skill does not declare or justify any other unrelated secrets.
Persistence & Privilege
The always flag is false, and there is no install-time persistence or modification of other skills. Default autonomous invocation is allowed by platform policy but is not a standalone concern here.
Assessment
This SKILL.md reads like accurate documentation for a local Rust CLI that scrapes WeChat MP. Before installing or giving the agent permission to act: (1) confirm the skill's source and review the actual code (there is no homepage or source link); (2) never paste WeChat account credentials into the agent—if testing login flows, use throwaway accounts; (3) be mindful that scraping MP may violate terms of service or local law and the project explicitly mentions evasion (configurable delays) — consider the legal/ethical implications; (4) restrict the agent's access to your filesystem (especially ~/.config/info-spider/) unless you explicitly want it to read session tokens; and (5) if you need help debugging, prefer having the agent suggest commands/snippets rather than granting it direct execution or file access. If you want a higher-confidence assessment, provide the repository or actual source files for code review.


Tags: crawler, latest, pku, rust, wechat
107 downloads
0 stars
1 version
Updated 1w ago
License: MIT-0

Info-Spider - WeChat Official Account Crawler CLI

A CLI crawler for WeChat Official Account (公众号) articles that works through the MP backend (mp.weixin.qq.com).

Architecture

  • Crate location: crates/info-spider/
  • Auth flow: WeChat QR code login (completely separate from IAAA, does NOT use info-common)
  • API: mp.weixin.qq.com backend API
  • Config: ~/.config/info-spider/ (separate from info-common Store)
  • Flow docs: docs/wechat-mp-flow.md
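The config location above is resolved per user. Below is a minimal std-only sketch of building that path, assuming the crate expands ~ from the HOME environment variable. The real code may use a helper crate instead, and `config_dir_from_home` / `config_dir` are hypothetical names.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Joins the spider's own config directory onto a home directory.
/// Hypothetical helper; kept separate from info-common's Store location.
fn config_dir_from_home(home: &Path) -> PathBuf {
    home.join(".config").join("info-spider")
}

/// Resolves ~/.config/info-spider/ for the current user.
/// Assumption: ~ is taken from HOME; returns None if HOME is unset.
fn config_dir() -> Option<PathBuf> {
    env::var_os("HOME").map(|h| config_dir_from_home(Path::new(&h)))
}

fn main() {
    match config_dir() {
        Some(dir) => println!("config dir: {}", dir.display()),
        None => eprintln!("HOME is not set"),
    }
}
```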

Key Source Files

  • src/main.rs — Entry point
  • src/cli.rs — Clap CLI definition
  • src/commands.rs — Command implementations
  • src/api.rs — WeChat MP API client
  • src/session.rs — Own session persistence (token, fingerprint, bizuin)
  • src/client.rs — reqwest client builders
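Here is a minimal sketch of the kind of persistence src/session.rs is described as doing. The `Session` struct mirrors the three documented fields (token, fingerprint, bizuin). The key=value file format, the file name session.txt, and all method names are assumptions for illustration, because the crate's real on-disk format is not documented here.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical mirror of the fields session.rs is documented to store.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Session {
    token: String,
    fingerprint: String,
    bizuin: String,
}

impl Session {
    /// Serializes to simple `key=value` lines (assumed format).
    fn to_text(&self) -> String {
        format!(
            "token={}\nfingerprint={}\nbizuin={}\n",
            self.token, self.fingerprint, self.bizuin
        )
    }

    /// Parses the `key=value` format; returns None if any field is missing.
    fn from_text(text: &str) -> Option<Session> {
        let (mut token, mut fingerprint, mut bizuin) = (None, None, None);
        for line in text.lines() {
            if let Some((key, value)) = line.split_once('=') {
                let value = value.trim().to_string();
                match key.trim() {
                    "token" => token = Some(value),
                    "fingerprint" => fingerprint = Some(value),
                    "bizuin" => bizuin = Some(value),
                    _ => {} // ignore unknown keys
                }
            }
        }
        Some(Session { token: token?, fingerprint: fingerprint?, bizuin: bizuin? })
    }

    /// Writes the session, creating the parent directory if needed.
    fn save(&self, path: &Path) -> io::Result<()> {
        if let Some(dir) = path.parent() {
            fs::create_dir_all(dir)?;
        }
        fs::write(path, self.to_text())
    }

    /// Loads a saved session; a missing file means "not logged in" (Ok(None)).
    fn load(path: &Path) -> io::Result<Option<Session>> {
        match fs::read_to_string(path) {
            Ok(text) => Ok(Session::from_text(&text)),
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(e) => Err(e),
        }
    }
}

fn main() -> io::Result<()> {
    let session = Session { token: "123".into(), fingerprint: "abc".into(), bizuin: "456".into() };
    // Demo path under the temp dir, not the real ~/.config/info-spider/.
    let path = std::env::temp_dir().join("info-spider-demo").join("session.txt");
    session.save(&path)?;
    assert_eq!(Session::load(&path)?, Some(session));
    Ok(())
}
```

Treating a missing file as `Ok(None)` lets a status-style command report "not logged in" without surfacing an I/O error.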

CLI Commands

Command          Function
login            Log in to mp.weixin.qq.com by scanning a WeChat QR code
logout / status  Session management
search <QUERY>   Find Official Accounts by name/ID (returns a list of fakeids)
articles         Fetch articles from an OA (--name or --fakeid)
scrape <URL>     Convert a single article URL to Markdown
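The command surface in the table can be modeled as an enum. This is a std-only sketch. The crate itself defines its CLI with Clap in src/cli.rs, and the `Command` enum, its fields, and `parse` are hypothetical simplifications. For example, the articles pagination flags are omitted.

```rust
/// Hypothetical std-only stand-in for the Clap-derived command enum.
#[derive(Debug, PartialEq)]
enum Command {
    Login,
    Logout,
    Status,
    Search { query: String },
    Articles { name: Option<String>, fakeid: Option<String> },
    Scrape { url: String },
}

/// Maps raw arguments (without the program name) to a command.
fn parse(args: &[&str]) -> Result<Command, String> {
    match args {
        ["login"] => Ok(Command::Login),
        ["logout"] => Ok(Command::Logout),
        ["status"] => Ok(Command::Status),
        ["search", query] => Ok(Command::Search { query: query.to_string() }),
        ["scrape", url] => Ok(Command::Scrape { url: url.to_string() }),
        // articles needs exactly one way to identify the account.
        ["articles", "--name", name] => Ok(Command::Articles { name: Some(name.to_string()), fakeid: None }),
        ["articles", "--fakeid", id] => Ok(Command::Articles { name: None, fakeid: Some(id.to_string()) }),
        _ => Err(format!("unrecognized arguments: {:?}", args)),
    }
}

fn main() {
    let owned: Vec<String> = std::env::args().skip(1).collect();
    let args: Vec<&str> = owned.iter().map(String::as_str).collect();
    match parse(&args) {
        Ok(cmd) => println!("{:?}", cmd),
        Err(e) => eprintln!("{}", e),
    }
}
```

Matching on slice patterns keeps the sketch short. Clap's derive API adds help text, validation, and flag ordering that this version does not try to reproduce.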

Articles Command Options

  • --begin — Start offset for pagination
  • --count — Articles per page
  • --limit — Maximum total articles to fetch
  • --delay-ms — Random delay between requests, to avoid triggering anti-crawler checks
  • --format {table|json|jsonl} — Output format
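The following sketch shows how --begin, --count, --limit, and --delay-ms could interact. It rests on several assumptions:

- --begin is an article offset.
- Each request fetches at most --count articles.
- The run stops after --limit articles.
- --delay-ms is an upper bound for a uniformly drawn pause.

None of these semantics are confirmed by the source. `page_plan`, `XorShift`, and `jittered_delay` are hypothetical. The tiny PRNG only keeps the example std-only; a real crate would more likely use the rand crate.

```rust
use std::thread;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Splits a fetch of at most `limit` articles, starting at offset `begin`,
/// into (offset, page_size) requests of at most `count` each.
fn page_plan(begin: u32, count: u32, limit: u32) -> Vec<(u32, u32)> {
    let mut pages = Vec::new();
    if count == 0 {
        return pages; // avoid an infinite loop on a zero page size
    }
    let mut fetched = 0;
    while fetched < limit {
        let size = count.min(limit - fetched);
        pages.push((begin + fetched, size));
        fetched += size;
    }
    pages
}

/// Minimal xorshift64 PRNG (seed must be non-zero), std-only stand-in for rand.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Draws a pause in [0, max_ms] milliseconds (assumed meaning of --delay-ms).
fn jittered_delay(rng: &mut XorShift, max_ms: u64) -> Duration {
    Duration::from_millis(rng.next() % max_ms.saturating_add(1))
}

fn main() {
    let seed = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos() as u64)
        .unwrap_or(42)
        | 1; // force a non-zero seed
    let mut rng = XorShift(seed);
    for (i, (offset, size)) in page_plan(0, 5, 12).into_iter().enumerate() {
        if i > 0 {
            thread::sleep(jittered_delay(&mut rng, 200)); // pause only between requests
        }
        println!("fetch begin={} count={}", offset, size);
    }
}
```

In a real crawl, the loop would also stop early once the backend returns fewer articles than requested.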

Development Notes

  • Standalone auth: Uses its own WeChat QR login, NOT the IAAA flow from info-common
  • Own session.rs: Stores token, fingerprint, bizuin (different from info-common session format)
  • Mimics real user behavior with configurable delays to bypass risk controls
  • Article scraping extracts content to clean Markdown
  • Multiple output formats: table (default), JSON, JSONL
  • All user-facing strings in Chinese
  • Error handling: anyhow::Result with .context("中文描述"), i.e. error-context messages written in Chinese
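The three output formats can be sketched as one renderer. Several parts of this sketch are hypothetical:

- the `Article` record (title and link only)
- the tab-separated table layout
- the `render` and `json_str` helpers

The real crate likely uses serde_json and a richer table; this std-only version avoids both. The table header and error message are in Chinese to follow the note that user-facing strings are Chinese (标题 = title, 链接 = link, 不支持的输出格式 = unsupported output format).

```rust
use std::fmt::Write;

/// Hypothetical article record; the crate's real fields are not documented here.
struct Article {
    title: String,
    link: String,
}

/// Escapes `s` as a JSON string literal, including the surrounding quotes.
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use \uXXXX escapes.
            c if (c as u32) < 0x20 => {
                let _ = write!(out, "\\u{:04x}", c as u32);
            }
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

fn article_json(a: &Article) -> String {
    format!("{{\"title\":{},\"link\":{}}}", json_str(&a.title), json_str(&a.link))
}

/// json: one array; jsonl: one object per line; table: tab-separated rows.
fn render(articles: &[Article], format: &str) -> Result<String, String> {
    match format {
        "json" => {
            let items: Vec<String> = articles.iter().map(article_json).collect();
            Ok(format!("[{}]", items.join(",")))
        }
        "jsonl" => Ok(articles.iter().map(|a| article_json(a) + "\n").collect()),
        "table" => {
            let mut out = String::from("标题\t链接\n");
            for a in articles {
                let _ = writeln!(out, "{}\t{}", a.title, a.link);
            }
            Ok(out)
        }
        other => Err(format!("不支持的输出格式: {}", other)),
    }
}

fn main() {
    // Placeholder data, not real crawl output.
    let articles = vec![Article { title: "示例 \"标题\"".into(), link: "https://example.com/a".into() }];
    for fmt in ["table", "json", "jsonl"] {
        println!("{}", render(&articles, fmt).unwrap());
    }
}
```

JSONL suits streaming large fetches, because each line is a complete JSON object that downstream tools can process incrementally.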
