Wechat Article To Markdown

v1.1.0

This skill converts WeChat Official Account (微信公众号) article pages into high-quality, clean Markdown format. It should be used when the user provides a WeChat...


by @benzking

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for benzking/wechat-article-to-markdown-v2.


Install the skill "Wechat Article To Markdown" (benzking/wechat-article-to-markdown-v2) from ClawHub.
Skill page: https://clawhub.ai/benzking/wechat-article-to-markdown-v2
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

```
openclaw skills install wechat-article-to-markdown-v2
```

ClawHub CLI


```
npx clawhub@latest install wechat-article-to-markdown-v2
```

Security Scan

VirusTotal

Benign


OpenClaw

Benign

medium confidence

✓

Purpose & Capability

The skill's name/description match the included Python script and reference docs: it fetches mp.weixin.qq.com pages, strips WeChat-specific noise, detects code blocks and images, and outputs Markdown. The code's network activity (page fetch + image downloads from CDN domains like mmbiz.qpic.cn) is consistent with the stated purpose.

ℹ

Instruction Scope

SKILL.md describes running the included script and using convert_simple() programmatically; instructions do not ask the agent to read unrelated system files or solicit unrelated secrets. Note: SKILL.md's Step 1 omits installing Playwright and the necessary 'playwright install chromium' step even though the script's primary fetcher uses Playwright, which will cause runtime errors or surprising fallbacks to requests if Playwright is not present.

✓

Install Mechanism

This is an instruction-only skill (no packaged installer). The code file uses standard Python libraries and Playwright; no external arbitrary downloads from unknown hosts are embedded in the skill. Because Playwright/Chromium may need to be installed manually, running the script can cause a Chromium download via Playwright tooling (this is expected for JS-rendered fetches).

✓

Credentials

The skill declares no required environment variables or credentials and the code does not attempt to read secrets; network access is limited to fetching the article URL and its assets (images). No unrelated service tokens or config paths are requested.

ℹ

Persistence & Privilege

The `always` flag is false, and the skill does not declare autonomous system-wide persistence. One implementation detail: Playwright is launched with launch_persistent_context(user_data_dir=''). A persistent context may create a browser profile directory (depending on Playwright behavior) and could persist cookies/local storage between runs; this is plausible for convenience but worth noting.

Assessment

This skill appears to do what it says: fetch a WeChat article, clean it, convert it to Markdown, and optionally download images. Before running:

1. Install Playwright and run 'playwright install chromium' (SKILL.md omits this), or run with a mode that uses requests only; otherwise the script will fall back or error.
2. Expect network activity to the article host and image CDNs (e.g., mp.weixin.qq.com and mmbiz.qpic.cn) and disk writes for output and images; run inside an isolated environment if you want to limit exposure.
3. If you are concerned about persistent browser state, inspect or override the Playwright user_data_dir usage to avoid creating persistent profiles.
4. For higher assurance, review the rest of the script (the full file is large) or run it in a sandbox/VM.

There are no red flags for credential exfiltration or unknown external endpoints in the provided files.



108 downloads

0 stars

1 version

Updated 2w ago


MIT-0

Changelog

v1.1.0 (2026-04-11)

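Fixes:

fetch_with_playwright now uses mobile Chromium (is_mobile=True + iPhone UA + 393×852 viewport), so temporary share links (tempkey) render correctly
Added lazy-loaded image handling: scrolling triggers loading of data-src images
Added detection of the "page does not exist" error

Comparison (v1.0 → v1.1):

| Item | Old | New |
|---|---|---|
| User Agent | Desktop Chrome | iPhone Safari |
| Viewport | 1280×900 | 393×852 |
| Temporary links | ❌ Fails to render | ✅ Works |
| Lazy-loaded images | ❌ | ✅ Triggered by scrolling |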
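The v1.1.0 fetch changes can be illustrated with a minimal Playwright sketch. This is not the skill's fetch_with_playwright(): only is_mobile=True and the 393×852 viewport come from the changelog, while the iPhone user-agent string, the scroll loop, and the timings are assumptions. It uses browser.new_context() rather than launch_persistent_context(), so no browser profile survives between runs.

```python
# Sketch of a mobile-emulating fetch modeled on the v1.1.0 changelog.
# ASSUMPTIONS: the UA string, scroll strategy, and wait times are illustrative.

# Hypothetical iPhone Safari user-agent string (not taken from the skill).
IPHONE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1"
)


def mobile_context_options() -> dict:
    """Keyword arguments for Browser.new_context() that emulate a phone."""
    return {
        "is_mobile": True,
        "has_touch": True,
        "user_agent": IPHONE_UA,
        "viewport": {"width": 393, "height": 852},
    }


def fetch_mobile_html(url: str, max_scrolls: int = 20, pause_ms: int = 300) -> str:
    """Render a page in mobile Chromium, scrolling so lazy data-src images load."""
    # Imported here so the helper above works even without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        # A fresh, non-persistent context: cookies and storage are discarded on close.
        context = browser.new_context(**mobile_context_options())
        page = context.new_page()
        page.goto(url, wait_until="domcontentloaded")
        for _ in range(max_scrolls):
            # Scroll one screen and report whether the bottom has been reached.
            at_bottom = page.evaluate(
                "() => { window.scrollBy(0, window.innerHeight);"
                " return window.innerHeight + window.scrollY >= document.body.scrollHeight; }"
            )
            page.wait_for_timeout(pause_ms)  # give lazy images time to swap in
            if at_bottom:
                break
        html = page.content()
        browser.close()
        return html
```

Running fetch_mobile_html() requires `pip install playwright` followed by `playwright install chromium`.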

WeChat Article to Markdown

Overview

Convert WeChat Official Account articles (mp.weixin.qq.com) into clean, high-quality Markdown. The skill uses a Python script optimized for WeChat's unique DOM structure, featuring deep noise removal, smart code block detection, rich text preservation, and intelligent paragraph formatting.

Workflow

Decision Tree

```
User provides WeChat article URL?
├── Yes → Go to Step 1: Install Dependencies & Convert
├── User wants to convert HTML directly?
│   └── Use Step 2: In-Line Conversion (for fetched HTML)
└── User asks about multiple URLs?
    └── Use batch mode with -f flag
```

Step 1: Install Dependencies & Convert

Ensure Python dependencies are available. Install if missing:
```
pip install requests beautifulsoup4 markdownify
```
Run the conversion script:
```
python scripts/wechat_to_md.py "<WECHAT_URL>" -o "<OUTPUT_DIR>"
```
Options:
- --no-images — Skip image downloading, keep remote URLs
- --no-frontmatter — Omit YAML frontmatter
- Multiple URLs: python scripts/wechat_to_md.py url1 url2 url3
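When another Python program drives the script, the documented options map onto an argument list as in this sketch. build_command() and run_conversion() are hypothetical helpers, and combining several URLs with -o in a single call is an assumption; only the script path and the -o, --no-images, and --no-frontmatter options come from the text above.

```python
import subprocess
import sys
from typing import List, Sequence


def build_command(urls: Sequence[str], output_dir: str, no_images: bool = False,
                  no_frontmatter: bool = False,
                  script: str = "scripts/wechat_to_md.py") -> List[str]:
    """Assemble argv using only the options documented above."""
    cmd = [sys.executable, script, *urls, "-o", output_dir]
    if no_images:
        cmd.append("--no-images")       # keep remote image URLs
    if no_frontmatter:
        cmd.append("--no-frontmatter")  # omit the YAML header
    return cmd


def run_conversion(urls: Sequence[str], output_dir: str, **options) -> subprocess.CompletedProcess:
    """Run the script and capture stdout/stderr for reporting back to the user."""
    return subprocess.run(build_command(urls, output_dir, **options),
                          capture_output=True, text=True, check=False)
```

For example, run_conversion(["https://mp.weixin.qq.com/s/xxxxx"], "./out", no_images=True) runs the same command as the shell form with --no-images.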

The output structure:

```
<OUTPUT_DIR>/
└── <Article_Title>/
    ├── <Article_Title>.md
    └── images/
        ├── img_000.png
        └── img_001.jpg
```
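The output layout can be reproduced with pathlib. The title sanitizer here is hypothetical (it replaces characters that common filesystems reject and turns whitespace into underscores); the script's actual naming rules are not documented, so treat this as an illustration of the folder structure only.

```python
import re
from pathlib import Path


def safe_dirname(title: str, max_len: int = 80) -> str:
    """Hypothetical sanitizer: the script's real naming rules may differ."""
    name = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", title)  # characters rejected by common filesystems
    name = re.sub(r"\s+", "_", name).strip("._")         # whitespace -> underscore, trim edges
    return name[:max_len] or "untitled"


def article_paths(output_dir: str, title: str) -> dict:
    """Paths for one article: <OUTPUT_DIR>/<Title>/<Title>.md and <Title>/images/."""
    folder = Path(output_dir) / safe_dirname(title)
    return {
        "dir": folder,
        "markdown": folder / f"{folder.name}.md",
        "images": folder / "images",
    }


def image_filename(index: int, ext: str) -> str:
    """Sequential names like img_000.png, img_001.jpg."""
    return f"img_{index:03d}.{ext}"
```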

Step 2: In-Line Conversion (for Pre-Fetched HTML)

If the HTML has already been fetched (e.g., via web_fetch), use the script's convert_simple() function programmatically:

```python
import sys
sys.path.insert(0, "<SKILL_DIR>/scripts")  # replace <SKILL_DIR> with the skill's install path
from wechat_to_md import convert_simple

# Basic usage: convert only, without downloading images
result = convert_simple("https://mp.weixin.qq.com/s/xxxxx")
markdown = result["markdown"]       # Full Markdown string
metadata = result["metadata"]       # {title, author, date, url, ...}
code_blocks = result["code_blocks"] # [{lang, code}, ...]
image_urls = result["image_urls"]   # List of original image URLs

# Advanced usage: also download images locally
result = convert_simple(
    "https://mp.weixin.qq.com/s/xxxxx",
    download_imgs=True,              # Enable image downloading
    output_dir="./my_article"        # Output directory (optional)
)
markdown = result["markdown"]        # Image links replaced with local paths
image_mapping = result["image_mapping"]  # URL -> local path mapping
output_dir = result["output_dir"]    # Actual output directory
```

Return the Markdown content directly to the user or write it to a file.

Step 3: Present Results

Display the generated Markdown file path to the user.
If the user wants to review the content, read the .md file and present a summary.
For batch conversions, report success/failure count.
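For the success/failure report, one approach is to convert each URL independently so that a single failing article does not abort the batch. convert_batch() and format_report() are hypothetical helpers; the converter is passed in as an argument, so convert_simple() from Step 2 could be used.

```python
from typing import Callable, Iterable


def convert_batch(urls: Iterable[str], convert: Callable[[str], dict]) -> dict:
    """Convert each URL on its own and tally successes and failures."""
    succeeded, failed = [], []
    for url in urls:
        try:
            convert(url)
            succeeded.append(url)
        except Exception as exc:  # one bad article should not stop the batch
            failed.append((url, f"{type(exc).__name__}: {exc}"))
    return {"succeeded": succeeded, "failed": failed}


def format_report(summary: dict) -> str:
    """One summary line, then one line per failed URL with its error."""
    lines = [f"Converted {len(summary['succeeded'])} article(s), "
             f"{len(summary['failed'])} failed."]
    for url, reason in summary["failed"]:
        lines.append(f"  FAILED {url}: {reason}")
    return "\n".join(lines)
```

Usage: `print(format_report(convert_batch(urls, convert_simple)))`.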

Core Capabilities

1. Deep Noise Removal (WeChat-Specific)

The script removes 30+ WeChat-specific noise elements including:

Ad banners and promotional content (.mp_profile_iframe, #ad_content)
QR codes and reward/tip areas (.reward_area, .qr_code_pc)
Comment sections (#comment_container, #js_cmt_area)
Audio/video players (mpvoice, mpvideo)
Related article recommendations (#relation_article)
Tool bars, footers, copyright areas, tag sections
Hidden elements (display:none, visibility:hidden)
Empty placeholder elements
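The removal pass can be sketched with the standard library's html.parser. This is an illustration only: it covers a handful of the selectors listed above, assumes well-formed nesting, and is not the script's implementation (the script installs beautifulsoup4 and presumably uses it instead).

```python
from html.parser import HTMLParser

# A few of the selectors listed above; the script itself handles 30+.
NOISE_IDS = {"ad_content", "comment_container", "js_cmt_area", "relation_article"}
NOISE_CLASSES = {"mp_profile_iframe", "reward_area", "qr_code_pc"}
NOISE_TAGS = {"mpvoice", "mpvideo", "script", "style"}
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link",
             "meta", "source", "track", "wbr"}


def is_noise(tag, attrs) -> bool:
    a = dict(attrs)
    style = (a.get("style") or "").replace(" ", "").lower()
    classes = set((a.get("class") or "").split())
    return (tag in NOISE_TAGS
            or a.get("id") in NOISE_IDS
            or bool(classes & NOISE_CLASSES)
            or "display:none" in style
            or "visibility:hidden" in style)


class NoiseStripper(HTMLParser):
    """Re-emits HTML, dropping noise elements together with everything inside them."""

    def __init__(self):
        super().__init__(convert_charrefs=False)  # keep entities such as &amp; verbatim
        self.out = []
        self.skip_depth = 0  # > 0 while inside a noise element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:  # no end tag follows, so never touch skip_depth
            if not self.skip_depth and not is_noise(tag, attrs):
                self.out.append(self.get_starttag_text())
        elif self.skip_depth or is_noise(tag, attrs):
            self.skip_depth += 1
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):  # self-closing form, e.g. <img ... />
        if not self.skip_depth and not is_noise(tag, attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")


def strip_noise(html: str) -> str:
    parser = NoiseStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```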

2. Smart Code Block Detection

Handles all 3 WeChat code block formats:

pre.code-snippet with data-lang attribute
.code-snippet__fix container with nested pre[data-lang]
Generic pre[data-lang]

Features:

Auto-detects programming language from data-lang, CSS class, and code content
Removes line numbers (.code-snippet__line-index)
Filters CSS counter leaks (counter(line) garbage text)
Uses placeholder strategy: extract code blocks before conversion, restore after
Supports 25+ languages: Python, JavaScript, TypeScript, Go, Rust, Java, C, C++, SQL, HTML, CSS, JSON, YAML, Shell, Dockerfile, etc.
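The placeholder strategy can be sketched with standard-library regular expressions: each pre element is swapped for a plain token before HTML-to-Markdown conversion, and the cleaned code is put back as a fenced block afterwards. The token format, the assumption that line numbers sit in a ul carrying the code-snippet__line-index class, and the br-to-newline handling are illustrative guesses rather than the script's code; only data-lang is read here, not the CSS-class or content-based language detection. The token uses only letters and digits because converters such as markdownify escape underscores by default.

```python
import html
import re

PRE_RE = re.compile(r"<pre\b([^>]*)>(.*?)</pre>", re.S | re.I)
LANG_RE = re.compile(r'data-lang="([^"]*)"')
# Assumption: line numbers sit in <ul class="code-snippet__line-index ...">...</ul>.
LINE_INDEX_RE = re.compile(r"<ul[^>]*code-snippet__line-index[^>]*>.*?</ul>", re.S)
FENCE = "`" * 3  # built at runtime so it cannot close this example's own fence


def token(i: int) -> str:
    # Letters and digits only, so an HTML-to-Markdown converter will not escape it;
    # the trailing "X" keeps token 1 from matching inside token 10.
    return f"XCODEBLOCK{i}X"


def clean_code(inner: str) -> str:
    inner = LINE_INDEX_RE.sub("", inner)                   # drop line numbers
    inner = re.sub(r"<br\s*/?>", "\n", inner, flags=re.I)  # <br> -> newline
    text = html.unescape(re.sub(r"<[^>]+>", "", inner))    # strip tags, decode &lt; etc.
    lines = [ln for ln in text.split("\n") if ln.strip() != "counter(line)"]  # CSS counter leaks
    return "\n".join(lines).strip("\n")


def extract_code_blocks(page_html: str):
    """Replace every <pre> with a token paragraph; return (new_html, [(lang, code), ...])."""
    blocks = []

    def swap(match):
        lang = LANG_RE.search(match.group(1))
        blocks.append((lang.group(1) if lang else "", clean_code(match.group(2))))
        return f"<p>{token(len(blocks) - 1)}</p>"

    return PRE_RE.sub(swap, page_html), blocks


def restore_code_blocks(markdown: str, blocks) -> str:
    """Put each cleaned code block back where its token ended up."""
    for i, (lang, code) in enumerate(blocks):
        markdown = markdown.replace(token(i), f"{FENCE}{lang}\n{code}\n{FENCE}")
    return markdown
```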

3. Rich Text Preservation

Bold/Italic: Normalizes bold and italic tags, and handles inline font-weight: bold
Lists: Converts WeChat marker-based lists (•, ·, 1., (1)) to proper Markdown lists
Blockquotes: Detects left-border styled sections as blockquotes
Tables: Preserves table structure
Links: Preserves article links
Headings: Detects font-size based headings (≥22px → H2, ≥19px → H3)
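The font-size heading rule and the list-marker conversion can be expressed as two small functions. The thresholds (≥22px → H2, ≥19px → H3) and the marker set (•, ·, 1., (1)) come from the list above; the regular expressions are assumptions, including the requirement of whitespace after "1." so that text such as "3.14" is left alone.

```python
import re

FONT_SIZE_RE = re.compile(r"font-size\s*:\s*(\d+(?:\.\d+)?)px", re.I)
BULLET_RE = re.compile(r"^\s*[•·]\s*")
NUMBER_RE = re.compile(r"^\s*(?:(\d+)\.\s+|\((\d+)\)\s*)")


def heading_level(style: str):
    """Return 2 for >= 22px, 3 for >= 19px, otherwise None (body text)."""
    m = FONT_SIZE_RE.search(style or "")
    if not m:
        return None
    size = float(m.group(1))
    if size >= 22:
        return 2
    if size >= 19:
        return 3
    return None


def normalize_list_line(line: str) -> str:
    """Turn '• item' / '· item' into '- item', and '(1) item' / '1. item' into '1. item'."""
    if BULLET_RE.match(line):
        return "- " + BULLET_RE.sub("", line, count=1)
    m = NUMBER_RE.match(line)
    if m:
        return f"{m.group(1) or m.group(2)}. {line[m.end():]}"
    return line
```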

4. Intelligent Paragraph Formatting

Fixes lazy-loaded images (data-src → src)
Cleans HTML entity residue (&nbsp; → space, zero-width spaces removed)
Collapses excessive blank lines (max 2 consecutive)
Trims trailing whitespace per line
Proper spacing around code blocks
Full-width spaces → half-width spaces
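Most of the whitespace cleanup steps can be chained in one pass, as in this sketch. The set of zero-width characters is an assumption, the sketch reads "max 2 consecutive" as at most two blank lines in a row, and spacing around code blocks is not covered.

```python
import re

# Zero-width characters to delete (an assumed, not exhaustive, set).
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))
# U+00A0 (non-breaking space) and U+3000 (full-width space) become a plain space.
TO_SPACE = {0x00A0: " ", 0x3000: " "}


def tidy_markdown(md: str) -> str:
    md = md.replace("&nbsp;", " ").translate({**ZERO_WIDTH, **TO_SPACE})
    md = "\n".join(line.rstrip() for line in md.split("\n"))  # trim trailing whitespace
    md = re.sub(r"\n{4,}", "\n\n\n", md)  # at most two blank lines in a row
    return md.strip() + "\n"
```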

5. Metadata Extraction

Generates YAML frontmatter:

```yaml
---
title: "Article Title"
author: "Account Name"
date: "2026-04-08"
source: "https://mp.weixin.qq.com/s/xxxxx"
description: "Article description if available"
---
```
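A minimal frontmatter writer, assuming the field order shown above and assuming that the url key in convert_simple()'s metadata is what fills source. Values are quoted with json.dumps, whose double-quoted output is also a valid YAML string, so titles containing quotes or colons do not break the block. build_frontmatter() is a hypothetical helper.

```python
import json

FIELDS = ("title", "author", "date", "source", "description")


def build_frontmatter(meta: dict) -> str:
    """Render metadata as YAML frontmatter; empty fields are skipped."""
    lines = ["---"]
    for key in FIELDS:
        value = meta.get(key)
        if key == "source" and not value:
            value = meta.get("url")  # assumption: convert_simple() metadata calls it "url"
        if value:
            # json.dumps gives a double-quoted string that YAML also parses,
            # escaping any quotes or backslashes inside the value.
            lines.append(f"{key}: {json.dumps(str(value), ensure_ascii=False)}")
    lines.append("---")
    return "\n".join(lines) + "\n"
```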

6. Image Handling

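Automatic download: downloads all article images into the images/ subdirectory
Concurrent download: 5 concurrent threads by default, with retry support (2 retries by default)
Format detection: detects the image format from the URL and the Content-Type header
Link replacement: replaces remote URLs in the Markdown with local relative paths (images/img_000.png)
URL variant handling: handles WeChat image URLs that differ only in their query parameters
Failure fallback: keeps the original remote URL when a download fails
File validation: checks the downloaded file size (discards corrupt files smaller than 100 bytes)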
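Format detection and URL-variant handling might look like the sketch below. Reading the wx_fmt query parameter, defaulting to jpg, and treating two URLs as the same image when they differ only in scheme, query string, or fragment are assumptions about WeChat image URLs, not documented behavior of the script.

```python
from pathlib import PurePosixPath
from urllib.parse import parse_qs, urlsplit, urlunsplit

IMAGE_EXTS = {"png", "jpg", "gif", "webp", "bmp", "svg"}
CONTENT_TYPES = {"image/png": "png", "image/jpeg": "jpg", "image/jpg": "jpg",
                 "image/gif": "gif", "image/webp": "webp", "image/bmp": "bmp",
                 "image/svg+xml": "svg"}


def detect_ext(url: str, content_type: str = "") -> str:
    """Pick a file extension: Content-Type first, then the URL, else 'jpg'."""
    ct = content_type.split(";")[0].strip().lower()
    if ct in CONTENT_TYPES:
        return CONTENT_TYPES[ct]
    parts = urlsplit(url)
    # Assumption: WeChat CDN URLs often carry the format as ?wx_fmt=png / jpeg / gif.
    hint = parse_qs(parts.query).get("wx_fmt", [""])[0].lower()
    if not hint:
        hint = PurePosixPath(parts.path).suffix.lstrip(".").lower()
    hint = "jpg" if hint == "jpeg" else hint
    return hint if hint in IMAGE_EXTS else "jpg"


def variant_key(url: str) -> str:
    """Treat URLs differing only in scheme, query, or fragment as one image (assumption)."""
    parts = urlsplit(url)
    return urlunsplit(("https", parts.netloc.lower(), parts.path, "", ""))
```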

Advanced image download API:

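```python
# Download images and get the URL -> local path mapping
# (assumes <SKILL_DIR>/scripts is on sys.path, as in Step 2)
from pathlib import Path

from wechat_to_md import download_images, replace_image_urls

# Download images
url_to_local = download_images(
    img_urls=["https://mmbiz.qpic.cn/..."],
    output_dir=Path("./output"),
    concurrency=5,    # Number of concurrent downloads
    timeout=30,       # Timeout in seconds
    retries=2         # Number of retries
)

# Replace image links in the Markdown
# (`markdown` is the string returned by convert_simple(), see Step 2)
md = replace_image_urls(markdown, url_to_local)
```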
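The concurrency, retry, size-check, and fallback rules listed earlier can be combined roughly as follows. This sketch is not download_images(): it returns bytes in memory instead of writing images/img_000.png and similar files, and it takes the HTTP call as a fetch argument so it can run without network access. Retrying a response under 100 bytes and the 0.5-second pause between attempts are design assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional

MIN_BYTES = 100  # smaller payloads are treated as corrupt


def fetch_with_retries(fetch: Callable[[str], bytes], url: str,
                       retries: int = 2, delay: float = 0.5) -> Optional[bytes]:
    """One attempt plus `retries` more; None if every attempt fails or is too small."""
    for attempt in range(retries + 1):
        try:
            data = fetch(url)
            if len(data) >= MIN_BYTES:
                return data
        except Exception:
            pass  # network or HTTP error: fall through to the next attempt
        if attempt < retries:
            time.sleep(delay)
    return None


def download_all(urls: Iterable[str], fetch: Callable[[str], bytes], concurrency: int = 5,
                 retries: int = 2, delay: float = 0.5) -> Dict[str, bytes]:
    """Fetch URLs in parallel; failed URLs are omitted so callers keep the remote link."""
    unique = list(dict.fromkeys(urls))  # skip exact duplicates, keep order
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        payloads = pool.map(lambda u: fetch_with_retries(fetch, u, retries, delay), unique)
        return {u: data for u, data in zip(unique, payloads) if data is not None}
```

In real use, fetch could call requests.get(url, timeout=30), then raise_for_status(), and return the response's .content. Any URL missing from the result keeps its remote link, matching the failure fallback.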

Error Handling

| Error | Cause | Resolution |
|---|---|---|
| `NetworkError` | HTTP failure, timeout, 404 | Retries 3x with exponential backoff |
| `CaptchaError` | Captcha page detected | Inform user to wait and retry |
| `ParseError` | Content element not found | Check URL validity; the article may be restricted |
| Missing dependencies | `pip install` not run | Install: `pip install requests beautifulsoup4 markdownify` |
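Retrying with exponential backoff can be sketched as follows. The NetworkError class here is a stand-in for the script's own, the 1-second base delay is an assumption, and "3x" is read as three retries after the first attempt. Only NetworkError is retried; any other exception, such as a captcha error, propagates immediately, matching the advice to have the user wait.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class NetworkError(Exception):
    """Stand-in for the script's NetworkError; its real definition may differ."""


def with_backoff(op: Callable[[], T], retries: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op(); after each NetworkError wait base_delay, then 2x, 4x, ... and retry."""
    delay = base_delay
    for attempt in range(retries + 1):
        try:
            return op()
        except NetworkError:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(delay)
            delay *= 2
    raise ValueError("retries must be >= 0")
```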

Important Notes

Only supports mp.weixin.qq.com domain articles
Some code blocks are rendered as images/SVG — their source code cannot be extracted
Captcha pages may appear under high-frequency access; wait and retry
Public articles only — login-gated articles cannot be fetched
Respect original author copyright; for personal study/archiving use only
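Since only mp.weixin.qq.com articles are supported, input URLs can be checked before any network request. This is a minimal sketch with hypothetical helper names; comparing the parsed hostname, rather than testing for a substring, rejects look-alike hosts such as mp.weixin.qq.com.evil.example.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit


def is_wechat_article_url(url: str) -> bool:
    """True for http(s) URLs whose host is exactly mp.weixin.qq.com."""
    parts = urlsplit(url.strip())
    return parts.scheme in ("http", "https") and parts.hostname == "mp.weixin.qq.com"


def split_urls(urls: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate supported article URLs from ones to reject up front."""
    accepted, rejected = [], []
    for url in urls:
        (accepted if is_wechat_article_url(url) else rejected).append(url)
    return accepted, rejected
```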

References

For detailed WeChat article DOM structure, selectors, and element handling, refer to:

references/wechat-dom-reference.md — Complete WeChat DOM structure documentation
