Scrapling Web Fetch

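Fetches the main content of modern web pages with Scrapling + html2text. Supports scraping WeChat Official Account articles and cleaning trailing noise, which cuts useless information and token consumption. Suited to blogs, news, announcements, and many pages where a plain fetch is unstable or is hampered by anti-scraping measures or dynamic rendering. Supports WeChat article cleanup, markdown...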

MIT-0 · Free to use, modify, and redistribute. No attribution required.

⭐ 0 · 325 · 3 current installs · 3 all-time installs

by 晨冬 @jllyzzd2023

Security Scan

VirusTotal: Benign

OpenClaw: Suspicious (medium confidence)

✓ Purpose & Capability

Name/description match the code and runtime instructions: the script fetches pages, selects likely article containers, converts to Markdown, cleans WeChat noise, supports batch mode and site overrides. No unrelated credentials, binaries, or paths are required.

✓ Instruction Scope

SKILL.md instructs running the included Python script and describes inputs/outputs. The script only reads files explicitly passed by the user (--batch, --selectors) and fetches the provided URLs. It does not attempt to read arbitrary system files or environment variables.

ℹ Install Mechanism

There is no install spec (instruction-only), which is low-risk. However, the skill recommends installing two PyPI packages (scrapling, html2text) via pip. Installing unknown third-party packages can execute arbitrary code at install or run time. The package 'scrapling' is not documented further here, so evaluate it before installing.

✓ Credentials

The skill requests no environment variables, no credentials, and no config paths. The script only reads user-supplied batch or selectors files and writes output to stdout/stderr as expected.

✓ Persistence & Privilege

The skill does not request persistent or privileged presence (always:false). It does not modify other skills or system-wide configuration.

What to consider before installing

This skill appears to do what it says: fetch pages, extract body text, convert to Markdown, and clean WeChat noise. Before installing or running it, review the PyPI package 'scrapling' (and its homepage and source), because the script depends on it; a malicious or overly permissive package could make network calls or execute code. Run pip installs in a virtualenv or sandbox, inspect the installed package source, and avoid passing sensitive local file paths to --batch or --selectors (the script will read those files). If you need stronger assurance, ask for the 'scrapling' package source, or use an alternative extractor built on well-known libraries (requests + readability / newspaper / browser automation) whose behavior you can audit.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.2

latest vk97fpqsvdh7e316r43j40d103s82h8ke

License

MIT-0. Terms: https://spdx.org/licenses/MIT-0.html

SKILL.md

Scrapling Web Fetch

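Prefer this skill when the user wants to fetch web page content, extract the main text, convert a page to markdown/text, or scrape the body of an article.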

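Default flow

Run python3 scripts/scrapling_fetch.py <url> <max_chars>
Default content selector priority:
- article
- main
- .post-content
- [class*="body"]
Once a content container matches, convert it to Markdown with html2text
If none of the selectors match, fall back to body
Finally, truncate the output to max_chars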
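
The flow above can be sketched in a few lines. This is a minimal illustration, not the code in scrapling_fetch.py: the names `pick_content`, `fetch_flow`, `select_first`, and `to_markdown` are hypothetical, and the page lookup and Markdown conversion are passed in as callables so the sketch runs without Scrapling or html2text installed. In practice `to_markdown` could be `html2text.html2text`.

```python
from typing import Callable, Optional, Sequence, Tuple

# Priority order taken from the list above; "body" is the last-resort fallback.
DEFAULT_SELECTORS = ("article", "main", ".post-content", '[class*="body"]')


def pick_content(
    select_first: Callable[[str], Optional[str]],
    selectors: Sequence[str] = DEFAULT_SELECTORS,
) -> Tuple[str, str]:
    """Return (selector, html) for the first selector that yields non-empty HTML."""
    for selector in selectors:
        html = select_first(selector)
        if html:
            return selector, html
    # Nothing matched: fall back to the whole body (empty string if absent).
    return "body", select_first("body") or ""


def fetch_flow(
    select_first: Callable[[str], Optional[str]],
    to_markdown: Callable[[str], str],
    max_chars: int,
) -> Tuple[str, str]:
    """Select a container, convert it, then truncate the result to max_chars."""
    selector, html = pick_content(select_first)
    return selector, to_markdown(html)[:max_chars]


# Stand-in page (hypothetical data): maps a selector to the HTML it would match.
fake_page = {
    "main": "<main><p>Hello world</p></main>",
    "body": "<body>fallback</body>",
}
# Identity conversion keeps the demo dependency-free.
matched, text = fetch_flow(fake_page.get, lambda html: html, max_chars=12)
```

Returning the matched selector alongside the text makes it easy to report which rule fired, which is useful when tuning selectors for a particular site.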

Usage

python3 /Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/scripts/scrapling_fetch.py <url> 30000

Dependencies

Check for these first:

scrapling
html2text

If either is missing, install with:

python3 -m pip install scrapling html2text
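
The dependency check can be done from Python before the script is invoked. A minimal sketch using only the standard library; the helper name `missing_packages` is an assumption for illustration and is not part of the skill:

```python
import importlib.util
from typing import List, Sequence


def missing_packages(names: Sequence[str] = ("scrapling", "html2text")) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        # Print the install command rather than running pip automatically.
        print("Missing: " + ", ".join(missing))
        print("Install with: python3 -m pip install " + " ".join(missing))
    else:
        print("All dependencies are available.")
```

Printing the command instead of running it leaves the install decision, and the choice of virtualenv or sandbox, to the user.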

Output conventions

By default the script outputs the Markdown body. For structured output, append --json. To see which selector the extraction matched, check the stderr output.
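
The split between stdout and stderr can be illustrated as follows. This is a sketch under stated assumptions: the function `emit` and the JSON field names (`url`, `selector`, `content`) are hypothetical, and the actual structure produced by --json may differ.

```python
import io
import json
import sys
from typing import TextIO


def emit(
    markdown: str,
    url: str,
    selector: str,
    as_json: bool = False,
    out: TextIO = sys.stdout,
    err: TextIO = sys.stderr,
) -> None:
    """Write content to `out` and diagnostics to `err`."""
    # Diagnostics go to stderr so stdout stays clean for piping.
    print(f"matched selector: {selector}", file=err)
    if as_json:
        # Field names are assumptions made for this example.
        payload = {"url": url, "selector": selector, "content": markdown}
        print(json.dumps(payload, ensure_ascii=False), file=out)
    else:
        print(markdown, file=out)


# Capture both streams in memory to show where each piece ends up.
buf_out, buf_err = io.StringIO(), io.StringIO()
emit("# Title", "https://example.com/post", "article",
     as_json=True, out=buf_out, err=buf_err)
record = json.loads(buf_out.getvalue())
```

Keeping diagnostics off stdout means the Markdown or JSON can be piped into another tool without filtering.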

Additional resources

Usage reference: /Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/references/usage.md
Selector strategy: /Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/references/selectors.md
Unified entry point: /Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/scripts/fetch-web-content

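When to use this skill

Getting the main text of an article
Scraping the body of a blog post, news item, or announcement
Converting a web page to Markdown for later summarization
A regular fetch gives poor results and you want more reliable scraping of modern pages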

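When not to use it

When full browser interaction is needed (clicking, logging in, paging): use browser automation instead
When you only need JSON from an API: requesting the API directly is more appropriate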

Files

6 total
