Install
openclaw skills install page-fetch
Extract readable content from webpages with a stable, low-dependency workflow. Use when the user asks to open, inspect, summarize, translate, verify, or quote a web page, article, blog post, documentation page, or similar URL. Prefer this skill when cross-model reliability matters: fetch raw HTML first, inspect embedded data next, and escalate to browser rendering only when lightweight methods fail.
Use this skill to extract webpage content in a reproducible way that works well across different models and avoids browser dependence unless necessary.
This skill is built for reliability first:
Use this skill when a user says things like:
This skill is best for:
This skill is not magic. If a page is blocked by login, CAPTCHA, region restrictions, or aggressive anti-bot controls, report that clearly.
The core goal is cross-model reliability.
Different LLMs often choose different ad-hoc ways to read webpages. This skill reduces that variance by giving them a standard path:
- Route mp.weixin.qq.com links to the dedicated WeChat extractor first.
For routine webpage reads, run the wrapper first:
python3 scripts/page_fetch.py "https://example.com/article" --format json
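For callers that drive the wrapper from Python rather than a shell, a minimal sketch might look like the following. It assumes the wrapper prints a single JSON object to stdout when --format json is passed, which the usage implies but does not state. The helper names build_command, parse_result, and fetch are hypothetical and not part of the skill.

```python
import json
import subprocess


def build_command(url, script="scripts/page_fetch.py"):
    """Build the argv list for the wrapper call shown above."""
    return ["python3", script, url, "--format", "json"]


def parse_result(stdout):
    """Parse wrapper output. Assumes one JSON object on stdout (unverified)."""
    result = json.loads(stdout)
    if not isinstance(result, dict):
        raise ValueError("expected a JSON object from the wrapper")
    return result


def fetch(url, timeout=60):
    """Run the wrapper and return its parsed result as a dict."""
    proc = subprocess.run(
        build_command(url),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return parse_result(proc.stdout)
```

Passing the command as a list avoids shell quoting problems with URLs that contain characters such as & or ?.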
What it does:
- Routes mp.weixin.qq.com to the dedicated WeChat extractor first
- Writes no file unless --save-json is explicitly passed
Persistence rule:
- Persist only with --save-json
- Without --output, write to a non-workspace report path chosen by the caller or local runtime convention
Use direct scripts only when you need to debug or force a particular extraction path.
python3 scripts/fetch_wechat_article.py "https://mp.weixin.qq.com/s/..." --format json
What it does:
- Reads the article body from #js_content / .rich_media_content
What to look for in the output:
- title
- author
- account_nickname
- published_time
- text
- method
- access_limited
- access_limit_reason
python3 scripts/fetch_page.py "https://example.com/article" --format json
What it does:
- Fetches raw HTML with requests
- Inspects embedded data such as __NEXT_DATA__
What to look for in the output:
- title
- author
- published_time
- text
- method
- notes
If the lightweight path returns thin, broken, or clearly incomplete content, run:
python3 scripts/render_page.py "https://example.com/article" --format json
What it does:
Use this only when needed. It is slower and heavier than the first-pass extractor.
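One way to decide whether a first-pass result counts as thin is a simple length check on the text field. The sketch below is illustrative only: looks_thin is a hypothetical helper, and the 400-character threshold is an assumption, not a value taken from the skill.

```python
def looks_thin(result, min_chars=400):
    """Return True when the extracted text is probably incomplete.

    `result` is the parsed JSON output of an extractor. `min_chars` is an
    illustrative threshold and should be tuned for the pages being read.
    """
    text = (result.get("text") or "").strip()
    return len(text) < min_chars
```

A result that carries a title but an empty or very short text would be flagged, which matches the metadata-only case that calls for browser rendering.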
Always tell the user which method worked:
- wechat-dom
- wechat-access-limited
- json-ld
- embedded-data:__NEXT_DATA__
- dom-paragraphs
- browser-render:playwright
Also mention known limitations when relevant:
When using this skill, aim to return the following whenever possible:
Do not imply full page access if only metadata or fragments were recovered.
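The reporting rules above can be sketched as a small formatter over the extractor output. It uses only field names listed in this document (method, access_limited, access_limit_reason, text). The function name describe_result and the exact wording of the messages are assumptions for illustration.

```python
def describe_result(result):
    """Summarize which method worked and flag partial or limited access."""
    method = result.get("method") or "unknown"
    lines = [f"Extraction method: {method}"]

    # Surface access limits explicitly instead of implying full access.
    if result.get("access_limited"):
        reason = result.get("access_limit_reason") or "no reason given"
        lines.append(f"Access limited: {reason}")

    # An empty body means only metadata or fragments were recovered.
    if not (result.get("text") or "").strip():
        lines.append("Only metadata was recovered; the page body is missing.")

    return "\n".join(lines)
```

Keeping the method label verbatim lets a reader match the summary against the method values listed earlier.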
scripts/page_fetch.py
Purpose:
- Writes no file unless --save-json is explicitly requested
Typical usage:
python3 scripts/page_fetch.py "https://example.com/article" --format json
Optional explicit persistence:
python3 scripts/page_fetch.py "https://example.com/article" --format json --save-json --output ./example.json
Output fields:
- notes including runner step trace
- saved_to only when explicit persistence is requested
scripts/fetch_wechat_article.py
Purpose:
- mp.weixin.qq.com article pages
Typical usage:
python3 scripts/fetch_wechat_article.py "https://mp.weixin.qq.com/s/..." --format json --max-chars 12000
Output fields:
- url
- final_url
- status_code
- title
- description
- author
- account_nickname
- published_time
- method
- text
- content_html
- excerpt
- notes
- access_limited
- access_limit_reason
scripts/fetch_page.py
Purpose:
Typical usage:
python3 scripts/fetch_page.py "https://example.com/article" --format json --max-chars 8000
Output fields:
- url
- final_url
- status_code
- title
- description
- author
- published_time
- method
- text
- excerpt
- notes
scripts/render_page.py
Purpose:
Typical usage:
python3 scripts/render_page.py "https://example.com/article" --format json --wait-ms 2500
Important notes:
Read these when you need more context than the main workflow:
references/strategy.md
references/browser-runtime.md
If the URL is mp.weixin.qq.com, try fetch_wechat_article.py first. If it returns article body text, use that directly. If it reports access limits, say so plainly.
If fetch_page.py returns a solid body via embedded-data:__NEXT_DATA__ or dom-paragraphs, use that result directly.
If fetch_page.py returns thin text or metadata only, escalate to render_page.py.
If browser rendering fails because of login, CAPTCHA, or anti-bot controls, report the limitation plainly and, when appropriate, look for an alternate accessible source.
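The decision flow above can be sketched as two small functions: one that picks the first script from the URL host, and one that decides what to do with a result. This is a sketch under stated assumptions, not the wrapper's actual logic. The names first_script and next_action, and the action labels "use", "escalate", and "report-limit", are hypothetical. Treating a thin WeChat result without an access flag as a case for escalation is also an assumption, since the text does not cover it.

```python
from urllib.parse import urlparse

WECHAT = "scripts/fetch_wechat_article.py"
GENERIC = "scripts/fetch_page.py"
RENDER = "scripts/render_page.py"


def first_script(url):
    """Pick the first extractor to try based on the URL host."""
    host = (urlparse(url).hostname or "").lower()
    return WECHAT if host == "mp.weixin.qq.com" else GENERIC


def next_action(script, result, is_thin):
    """Return 'use', 'escalate', or 'report-limit' for one extraction result.

    `is_thin` is the caller's judgment that the body is thin or metadata only.
    """
    if result.get("access_limited"):
        return "report-limit"
    if not is_thin:
        return "use"
    # Thin output: escalate unless browser rendering was already tried.
    return "report-limit" if script == RENDER else "escalate"
```

For example, a generic fetch that returns only metadata yields "escalate", while a thin result from render_page.py yields "report-limit", at which point the limitation is reported to the user.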