WeChat Official Account Author Article Fetcher

v0.1.0

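Uses a local WeChat Official Account fetcher to identify and batch-download the historical articles of a given Official Account author, producing Markdown, HTML, and articles.json for downstream use: building an author corpus, style analysis, imitation-writing templates, fact-checking, and content archiving. Triggers whenever the user says things like "grab the articles from some Official Account", "download the latest 20/50/100 Official Account articles", "I'll give you one link, then keep going and pull down all of this account's articles...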

by 大壮/Jammy (@dazhuangjammy)
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description match the provided files and behavior. The skill includes a local Python fetcher, a shell launcher, and configuration; requiring no cloud credentials or unrelated binaries is consistent with a local Playwright-based scraper.
Instruction Scope
SKILL.md instructs the agent to run the bundled scripts, manage config.json, ensure login via QR, poll login-status, and send the generated QR image to the user when appropriate. All of this is coherent with the fetcher's purpose. Two points to note: (1) the skill is written to trigger on casual user phrasing (e.g., "把这个号最近的文章弄下来", roughly "get me this account's recent articles"), which could cause it to activate more often than a user expects; (2) the workflow involves exposing generated QR image/text files to an agent/user for scanning, which is functionally necessary but should be handled carefully to avoid unintended disclosure.
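The login-status polling mentioned above could look roughly like the following. This is a minimal sketch, not the skill's actual code: the `wait_for_login` helper, the `check_status` callable, and the `"logged_in"` status string are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable


def wait_for_login(
    check_status: Callable[[], str],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check_status() until it reports a login or the timeout elapses.

    check_status is a hypothetical stand-in for whatever the fetcher's
    login-status command returns; "logged_in" is an assumed value.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check_status() == "logged_in":
            return True
        if time.monotonic() >= deadline:
            # Give up instead of waiting forever for a QR scan.
            return False
        sleep(interval_s)
```

A bounded timeout matters here because the QR code is only useful while the user is present; an agent that polls indefinitely keeps a login prompt open longer than intended.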
Install Mechanism
There is no external install spec in the registry metadata, but the included run_fetcher.sh will create a Python virtualenv and pip-install requirements (Playwright, httpx, BeautifulSoup, etc.). Installing packages from PyPI and using Playwright is expected for this tool; Playwright may require separate browser runtime installation (the README documents this). This is moderate-risk but proportional to the stated purpose and uses standard package sources (requirements.txt).
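To make the install steps concrete, the sketch below lists the commands a launcher of this kind typically runs, without executing them. It is an illustration under assumptions, not a transcription of run_fetcher.sh: the `bootstrap_commands` helper is hypothetical, and the choice of the `chromium` browser is assumed (the README is said to document the actual browser setup).

```python
import sys
from pathlib import Path


def bootstrap_commands(project_dir: str, python: str = sys.executable) -> list[list[str]]:
    """Return the commands a venv-based launcher would typically run.

    Nothing is executed here, so the list can be reviewed before use.
    """
    project = Path(project_dir)
    venv_dir = project / ".venv"
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    venv_python = venv_dir / bin_dir / "python"
    return [
        # 1. Create an isolated virtualenv inside the project.
        [python, "-m", "venv", str(venv_dir)],
        # 2. Install the PyPI dependencies listed in requirements.txt.
        [str(venv_python), "-m", "pip", "install", "-r", str(project / "requirements.txt")],
        # 3. Download a browser runtime for Playwright (browser choice is assumed).
        [str(venv_python), "-m", "playwright", "install", "chromium"],
    ]
```

Printing this list and reading it before running anything is a cheap way to see which steps reach out to the network (steps 2 and 3).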
Credentials
The skill does not request unrelated environment variables or secrets. It optionally respects environment overrides for profile and artifact dirs, and it stores login artifacts locally (.playwright-profile, login_artifacts). The tool returns local paths (profile_dir, login_artifacts_dir) in its JSON output — these are not credentials by themselves but could help locate sensitive files, so agents should avoid exposing token/cookie contents; the SKILL.md explicitly warns not to reveal cookie/token contents.
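One way an agent could honor the "do not reveal cookie/token contents" rule is to redact sensitive-looking fields from the fetcher's JSON output before showing it. The sketch below is an assumption-laden illustration: the `redact` helper and the `SENSITIVE_HINTS` key list are hypothetical, and only `profile_dir` and `login_artifacts_dir` are field names taken from the review above.

```python
# Substrings that mark a key as sensitive (an assumed, non-exhaustive list).
SENSITIVE_HINTS = ("cookie", "token", "secret", "password")


def redact(value):
    """Return a copy of parsed JSON with sensitive-looking values masked."""
    if isinstance(value, dict):
        return {
            key: "[redacted]"
            if any(hint in key.lower() for hint in SENSITIVE_HINTS)
            else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    # Scalars (str, int, bool, None) pass through unchanged.
    return value
```

Path fields such as `profile_dir` pass through unchanged in this sketch; whether to mask them as well depends on how much the recipient should learn about the local file layout.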
Persistence & Privilege
The skill does not request always:true or other elevated platform privileges. It writes/reads its own runtime artifacts (.venv, .playwright-profile, login_artifacts) within the project; this is expected for a local scraper and does not modify other skills or global agent settings.
Assessment
This skill bundles a local scraper and will run code on your machine: the launcher will create a Python virtualenv and pip-install dependencies (including Playwright), then drive a headless or visible browser to log into your WeChat public account by QR scan and download articles to a local folder. Before installing or running: (1) inspect the included files (scripts/main.py, run_fetcher.sh, config.json) and confirm you trust them; (2) be prepared to scan a QR from a trusted UI — do not scan QR images from untrusted sources or send your login artifacts to others; (3) run this on a machine/account you control (avoid shared machines) because the tool stores login caches (.playwright-profile, login_artifacts); (4) be aware pip install will fetch packages from PyPI and Playwright may download browser runtimes; (5) note the SKILL.md’s automatic trigger on casual phrasing — if you want to avoid accidental runs, require explicit confirmation before executing the fetch workflow. If you need higher assurance, run the tool inside an isolated VM or container and/or review the full main.py for any data-exfiltration code (we reviewed the bundle and found behavior consistent with a local scraper).
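The explicit-confirmation safeguard suggested in point (5) can be sketched as a small gate in front of the fetch workflow. The `run_if_confirmed` helper, the accepted reply words, and the `action` callable are hypothetical; the skill itself is not known to ship such a gate.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Replies treated as an explicit "go ahead" (an assumed set).
AFFIRMATIVE = {"yes", "y", "confirm"}


def run_if_confirmed(reply: str, action: Callable[[], T]) -> Optional[T]:
    """Run action() only when the user's reply is an explicit confirmation."""
    if reply.strip().lower() not in AFFIRMATIVE:
        # Anything ambiguous counts as "no", so casual phrasing cannot start a run.
        return None
    return action()
```

Treating every non-affirmative reply as a refusal is the conservative default: a missed run costs one extra prompt, while an accidental run installs packages and opens a login session.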


Tags: content, latest, mp, wechat
