Wechat Fetch Publish

v3.0.0

WeChat Official Account article fetcher v3.0 - lightweight Lite edition plus Playwright edition; supports login-free fetching, batch fetching, image download, and multi-format output

by JackyFan (@jackyfan01)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for jackyfan01/wechat-fetch.

Prompt preview (Install & Setup):
Install the skill "Wechat Fetch Publish" (jackyfan01/wechat-fetch) from ClawHub.
Skill page: https://clawhub.ai/jackyfan01/wechat-fetch
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install wechat-fetch

ClawHub CLI


npx clawhub@latest install wechat-fetch
Security Scan

VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The name and description (WeChat article fetcher) match the included scripts, CLI examples, and Python API. The required dependencies (requests/BeautifulSoup, plus optional Playwright) make sense for the Lite and Playwright variants. Nothing in the manifest or code appears to require unrelated system access or extraneous credentials.
Instruction Scope
SKILL.md instructs the agent to fetch article URLs, optionally download images, write outputs (markdown/html/json/txt), and use Playwright for JS-rendered pages; the code follows those steps. The skill reads a provided batch file of URLs and writes output files and an images/ subdirectory. It does not contact any unexpected third-party endpoints beyond the target article/image hosts. Note: the Playwright mode uses a persistent browser context (local user_data_dir) to store cookies/profile if Cookie mode is used.
Install Mechanism
There is no packaged install spec; README/SKILL.md recommend pip install of dependencies and 'playwright install chromium' for the Playwright mode. That is a standard approach for Python/Playwright projects. The installation will download Chromium when Playwright is installed, which is expected for the Playwright variant.
Credentials
The skill does not request environment variables or other credentials. However, the Playwright variant creates a persistent browser profile under ~/.openclaw/browser/wechat-fetch-v3 to store cookies and session data when using persistent (Cookie) mode; that could retain authentication cookies and thus access private content if you log in. This persistence is proportionate to the stated Cookie-mode feature but is worth awareness.
Persistence & Privilege
always:false and the skill is user-invocable (normal). The only notable persistence is the creation of a per-skill user_data_dir in the user's home to hold a persistent browser context for Cookie mode. The skill does not modify other skills or system-wide agent configs.
Assessment
This package appears to do what it claims (scrape public WeChat articles, download images, and save outputs). Before installing or running it:

  • Review and be comfortable with files being written to the output directory and with a browser profile stored under ~/.openclaw/browser/wechat-fetch-v3 (this may contain cookies/session info if you use Cookie mode).
  • Only scrape content you are permitted to access; scraping can violate site terms of service or copyright.
  • When using the Playwright mode, expect a Chromium download during setup.
  • Run the code in an isolated environment (virtualenv/container) if you do not fully trust the source, and inspect the scripts yourself if you have concerns about saved cookie data or filesystem paths.
  • Consider rate limits and use delays/batch sizes to avoid triggering anti-bot measures.


latest: vk97adnkaptnnr3p17g4bbbhx5h83qxaj
168 downloads · 0 stars · 4 versions · Updated 1mo ago
v3.0.0 · MIT-0

WeChat Fetch v3.0 - WeChat Article Fetcher

A WeChat Official Account article fetcher, v3.0. Building on v2.0, it adds a login-free mode, batch fetching, image download, and multi-format output.

New Features (v3.0)

| Feature | Description |
|---|---|
| Login-free mode | Fetch public articles directly, no QR-code login required |
| Batch fetching | Read multiple URLs from a file and download them in bulk |
| Image download | Automatically save article images to local disk |
| Multi-format output | Four formats: Markdown/HTML/JSON/TXT |

Usage

Choosing a Version

| Edition | Best for | Resource needs | Batch fetching |
|---|---|---|---|
| Lite | Quick fetching, low-memory environments | Low (no browser needed) | ✅ Supported |
| Playwright | Cookie login, complex pages | High (needs Chromium) | ✅ Supported |

1. Lite Edition (Recommended)

# Basic usage
python3 scripts/wechat_fetch_lite.py "https://mp.weixin.qq.com/s/xxxxx"

# Specify an output format
python3 scripts/wechat_fetch_lite.py "https://mp.weixin.qq.com/s/xxxxx" \
  --format html --output article.html

# Download images
python3 scripts/wechat_fetch_lite.py "https://mp.weixin.qq.com/s/xxxxx" \
  --download-images --output article.md

# Batch fetching
python3 scripts/wechat_fetch_lite.py --batch urls.txt --output ./articles \
  --format markdown --delay 3

2. Playwright Edition (when Cookie login is needed)

# Login-free mode
python3 scripts/wechat_fetch_v3.py "https://mp.weixin.qq.com/s/xxxxx" --no-login

# Cookie mode (requires prior login)
python3 scripts/wechat_fetch_v3.py "https://mp.weixin.qq.com/s/xxxxx"

# Batch fetching (with retries)
python3 scripts/wechat_fetch_v3.py --batch urls.txt --output ./articles \
  --no-login --max-retries 3 --retry-delay 5

3. Batch Fetching

# Create the URL list file urls.txt
echo "https://mp.weixin.qq.com/s/xxx1" > urls.txt
echo "https://mp.weixin.qq.com/s/xxx2" >> urls.txt
echo "https://mp.weixin.qq.com/s/xxx3" >> urls.txt

# Run the batch fetch
python3 scripts/wechat_fetch_v3.py --batch urls.txt --output ./articles \
  --no-login --download-images --format markdown
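The batch file is simply one URL per line. Below is a minimal sketch of how such a file might be parsed before fetching; skipping '#' comment lines is an assumption for illustration, and the skill's actual parser may differ:

```python
from pathlib import Path

def read_batch_file(path):
    """Return the list of article URLs in a batch file.

    Blank lines and '#' comment lines are skipped (an assumption;
    the skill's own parser may treat them differently).
    """
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls

# Example: write a small batch file and parse it back.
Path("urls.txt").write_text(
    "# my reading list\n"
    "https://mp.weixin.qq.com/s/xxx1\n"
    "\n"
    "https://mp.weixin.qq.com/s/xxx2\n",
    encoding="utf-8",
)
print(read_batch_file("urls.txt"))
# → ['https://mp.weixin.qq.com/s/xxx1', 'https://mp.weixin.qq.com/s/xxx2']
```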

4. Python API

Lite edition (recommended):

from scripts.wechat_fetch_lite import WeChatFetcherLite

fetcher = WeChatFetcherLite()

# Fetch a single article
result = fetcher.fetch_single(
    url="https://mp.weixin.qq.com/s/xxxxx",
    download_images=True,
    output_format="markdown"
)

Playwright edition:

from scripts.wechat_fetch_v3 import WeChatFetcher

fetcher = WeChatFetcher()

# Fetch a single article
result = fetcher.fetch_single(
    url="https://mp.weixin.qq.com/s/xxxxx",
    no_login=True,
    download_images=True,
    output_format="markdown"
)

# Batch fetch
results = fetcher.fetch_batch(
    urls=["url1", "url2", "url3"],
    output_dir="./articles",
    no_login=True,
    download_images=True,
    output_format="json"
)
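The return value of fetch_batch is not documented here. Assuming it is a list of per-URL result dicts with 'url' and 'success' keys (purely an assumption for illustration), a post-run summary step might look like this:

```python
# Hypothetical result shape: the README does not document fetch_batch's
# return value, so a list of dicts with 'url' and 'success' keys is
# assumed purely for illustration.
def summarize(results):
    ok = [r["url"] for r in results if r["success"]]
    failed = [r["url"] for r in results if not r["success"]]
    return {"ok": len(ok), "failed": failed}

sample = [
    {"url": "url1", "success": True},
    {"url": "url2", "success": False},
    {"url": "url3", "success": True},
]
print(summarize(sample))
# → {'ok': 2, 'failed': ['url2']}
```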

Parameters

Lite Edition Parameters

| Parameter | Description | Default |
|---|---|---|
| url | WeChat article URL | - |
| -o, --output | Output file path | - |
| --batch | Batch file (one URL per line) | - |
| --download-images | Download images locally | False |
| --format | Output format (markdown/html/json/txt) | markdown |
| --timeout | Timeout in seconds | 30 |
| --delay | Delay between requests in seconds | 2 |

Playwright Edition Parameters

| Parameter | Description | Default |
|---|---|---|
| url | WeChat article URL | - |
| -o, --output | Output file/directory path | - |
| --batch | Batch file path | - |
| --no-login | Login-free mode | False |
| --download-images | Download images locally | False |
| --format | Output format (markdown/html/json/txt) | markdown |
| --headless | Headless mode | True |
| --timeout | Timeout in seconds | 30 |
| --max-retries | Max retries (batch mode) | 3 |
| --retry-delay | Delay between retries in seconds | 5 |
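The --max-retries/--retry-delay pair implies a retry loop around each fetch. Here is a self-contained sketch of that idea (not the skill's actual code), demonstrated with a stand-in fetcher that fails twice before succeeding:

```python
import time

def fetch_with_retry(fetch, url, max_retries=3, retry_delay=5):
    """Retry a fetch callable, mirroring --max-retries / --retry-delay.

    A sketch of the retry idea only, not the skill's implementation.
    """
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return fetch(url)
        except Exception as exc:
            last_error = exc
            if attempt < max_retries:
                time.sleep(retry_delay)
    raise last_error

# Demo with a flaky stand-in fetcher that fails twice, then succeeds.
calls = {"n": 0}
def flaky_fetch(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return {"url": url, "ok": True}

result = fetch_with_retry(flaky_fetch, "https://mp.weixin.qq.com/s/xxxxx", retry_delay=0)
print(result)
# → {'url': 'https://mp.weixin.qq.com/s/xxxxx', 'ok': True}
```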

Output Format Comparison

| Format | Description | Best for |
|---|---|---|
| Markdown | Standard Markdown with metadata | General use (recommended) |
| HTML | Complete HTML page | Web display |
| JSON | Structured data | Programmatic processing |
| TXT | Plain text | Simple reading |
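To make the four formats concrete, here is a minimal rendering sketch. The article fields (title, author, content, url) are assumptions for illustration, not the skill's documented output schema:

```python
import json

# Hypothetical article record; the skill's real output fields may differ.
article = {
    "title": "Example Article",
    "author": "Author",
    "content": "Body text here.",
    "url": "https://mp.weixin.qq.com/s/xxxxx",
}

def render(article, fmt):
    """Render an article dict in one of the four output formats."""
    if fmt == "markdown":
        return (f"# {article['title']}\n\n"
                f"> Author: {article['author']} | Source: {article['url']}\n\n"
                f"{article['content']}\n")
    if fmt == "html":
        return (f"<html><head><title>{article['title']}</title></head>"
                f"<body><h1>{article['title']}</h1>"
                f"<p>{article['content']}</p></body></html>")
    if fmt == "json":
        return json.dumps(article, ensure_ascii=False, indent=2)
    if fmt == "txt":
        return f"{article['title']}\n\n{article['content']}\n"
    raise ValueError(f"unknown format: {fmt}")

print(render(article, "markdown"))
```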

Edition Comparison

| Feature | Lite | Playwright (v3) | Cookie mode (v2) |
|---|---|---|---|
| Resource needs | Low | High | High |
| Speed |  | Medium | Medium |
| Cookie login | ❌ | ✅ | ✅ |
| Batch fetching | ✅ | ✅ | ❌ |
| Image download | ✅ | ✅ | ❌ |
| Multi-format output | ✅ | ✅ | ❌ |
| Retry mechanism | ❌ | ✅ | ❌ |
| Best for | Everyday use | Complex needs | Private articles |

Batch Fetching Example

# 1. Prepare the URL file
cat > urls.txt << 'EOF'
https://mp.weixin.qq.com/s/article1
https://mp.weixin.qq.com/s/article2
https://mp.weixin.qq.com/s/article3
EOF

# 2. Run the batch fetch
python3 scripts/wechat_fetch_v3.py \
  --batch urls.txt \
  --output ./articles \
  --no-login \
  --download-images \
  --format markdown

# 3. Inspect the results
ls ./articles/
# article_001.md  article_002.md  article_003.md  images/  batch_report.json

Image Download Notes

With the --download-images flag:

  • Images are saved to the images/ subdirectory
  • Image links in the Markdown are rewritten to local relative paths
  • Common formats are supported: jpg, png, gif, webp

articles/
├── article_001.md
├── article_002.md
└── images/
    ├── image_001.jpg
    ├── image_002.png
    └── image_003.gif
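The link-rewriting step described above can be sketched as follows. The regex and the URL-to-path mapping are illustrative assumptions, not the skill's actual implementation:

```python
import re

def rewrite_image_links(markdown, mapping):
    """Replace remote image URLs in Markdown with local relative paths.

    `mapping` maps original URL -> local path (e.g. 'images/image_001.jpg').
    A sketch of the link-rewriting step; the skill's real code may
    match and name images differently.
    """
    def repl(match):
        alt, url = match.group(1), match.group(2)
        return f"![{alt}]({mapping.get(url, url)})"
    return re.sub(r"!\[([^\]]*)\]\(([^)]+)\)", repl, markdown)

# Hypothetical article snippet with one remote image.
md = "Intro\n\n![cover](https://mmbiz.qpic.cn/abc/0?wx_fmt=jpeg)\n"
local = {"https://mmbiz.qpic.cn/abc/0?wx_fmt=jpeg": "images/image_001.jpg"}
print(rewrite_image_links(md, local))
# the image link now points at images/image_001.jpg
```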

Troubleshooting

Login-free fetch fails

Possible causes:

  • The article requires login to view
  • WeChat's anti-crawling measures were triggered
  • The page structure has changed

Fixes:

  • Try Cookie mode
  • Increase --timeout
  • Add delays to avoid rapid-fire requests

Image download fails

Possible causes:

  • The image URL has expired
  • Network problems
  • The image requires login privileges

Fixes:

  • Use Cookie mode
  • Check your network connection
  • Download missing images manually

Batch fetch interrupted

Fixes:

  • Check batch_report.json for failure details
  • Resume fetching from the point of failure
  • Increase the delay between requests
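The schema of batch_report.json is not documented in this README, so the sketch below assumes a simple {"results": [...]} shape purely for illustration; it shows how failed URLs could be collected into a new batch file for a retry run:

```python
import json
from pathlib import Path

# Assumed report shape: {"results": [{"url": ..., "status": "ok"|"failed"}]}.
# The real batch_report.json may differ; inspect it before relying on this.
sample_report = {
    "results": [
        {"url": "https://mp.weixin.qq.com/s/xxx1", "status": "ok"},
        {"url": "https://mp.weixin.qq.com/s/xxx2", "status": "failed"},
        {"url": "https://mp.weixin.qq.com/s/xxx3", "status": "failed"},
    ]
}
Path("batch_report.json").write_text(json.dumps(sample_report), encoding="utf-8")

# Collect failed URLs into a new batch file for a retry run.
report = json.loads(Path("batch_report.json").read_text(encoding="utf-8"))
failed = [r["url"] for r in report["results"] if r["status"] == "failed"]
Path("retry_urls.txt").write_text("\n".join(failed) + "\n", encoding="utf-8")
print(failed)
# → ['https://mp.weixin.qq.com/s/xxx2', 'https://mp.weixin.qq.com/s/xxx3']
```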

Changelog

v3.0.0 (2026-03-23)

  • ✅ Added the lightweight Lite edition (no browser required)
  • ✅ Added login-free mode
  • ✅ Added batch fetching (Lite + Playwright)
  • ✅ Added image download
  • ✅ Added multi-format output (HTML/JSON/TXT)
  • ✅ Restructured the code for easier extension
  • ✅ Added batch fetch reports
  • ✅ Added a retry mechanism (Playwright edition)

v2.0.1 (2026-03-20)

  • ✅ Switched to a persistent browser context
  • ✅ Reuse logged-in Cookies
  • ✅ Added automatic Cookie monitoring
  • ✅ Added headless-mode support

Dependencies

pip install playwright beautifulsoup4 requests
playwright install chromium

License

MIT-0 · Free to use, modify, and redistribute. No attribution required.
