web-fetch
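Fetches web page content with a headless browser equipped with the Stealth plugin and converts it to Markdown. Use it when you need the body text of a specific page, such as a news article, a company financial report, or other long-form web content. It can bypass most basic anti-scraping detection.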
MIT-0 · Free to use, modify, and redistribute. No attribution required.
⭐ 0 · 331 · 2 current installs · 2 all-time installs
by wei.wu @dlutwuwei
Security Scan
OpenClaw
Benign
medium confidence
Purpose & Capability
The name/description (stealth headless scraping and HTML→Markdown conversion) align with the included code and package.json dependencies (playwright, playwright-extra, puppeteer-extra-plugin-stealth, turndown). Minor inconsistency: registry metadata said 'instruction-only' / no install spec, but the repo contains a package.json and a runnable script—so the skill requires installing Node deps even though there's no formal install spec in the registry.
Instruction Scope
SKILL.md instructs the user to run npm install for the listed dependencies and then run the provided Node script. The runtime instructions only open the target URL in a headless browser, remove certain DOM elements, convert the body HTML to Markdown, and print to console. The only oddity is an absolute example path (/Users/wuwei/.openclaw/...) which is a local author path and not required for functionality.
Install Mechanism
There is no formal install spec in the registry (highest-signal install section is empty), but the package.json declares npm dependencies. Installing will pull Playwright and browser binaries (large downloads from official hosts). This is expected for a browser-based scraper but the lack of an explicit install spec in the skill metadata is a packaging/UX inconsistency.
Credentials
The skill requests no environment variables, no credentials, and references no config paths. The code does not read other env vars or credentials—its network access is limited to visiting the target URL(s) supplied by the caller.
Persistence & Privilege
always:false and standard model invocation settings. The skill does not request persistent system-wide privileges or modify other skills. Autonomous invocation is allowed (platform default) but not combined with other concerning flags.
Assessment
This skill appears to do what it says: run a headless Playwright browser with stealth evasion, fetch a page, strip scripts/iframes/ads, convert HTML to Markdown, and print it. Before installing: 1) be aware it includes stealth evasion code; using it may violate website terms of service or local law, so only use it on sites you are allowed to scrape. 2) npm install will download Playwright and browser binaries (large network downloads); run it in an isolated environment (container or VM) if you want to limit risk. 3) SKILL.md examples use a hard-coded local path; verify where you run the commands and adjust paths accordingly. 4) If you need stronger assurance, inspect package.json versions and run the code in a sandbox to confirm no unexpected network calls beyond visiting the target URL. If any additional metadata (an explicit install spec, source homepage, or maintainer contact) becomes available, re-evaluate to increase confidence.
Like a lobster shell, security has layers: review code before you run it.
Current version: v1.0.0
SKILL.md
Web Fetch Skill
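Uses a headless browser (Playwright + Stealth Plugin) to fetch the content of a given URL and automatically converts it to Markdown for easier reading and further processing.
Key Features
- Anti-scraping bypass: integrates playwright-extra and puppeteer-extra-plugin-stealth to handle browser fingerprinting and automation-detection checks automatically.
- Content conversion: uses the turndown library to convert complex HTML pages into concise Markdown.
- Environment emulation: emulates a realistic user viewport size and headless browser configuration.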
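The features above can be sketched roughly as follows. This is a minimal illustration, not the contents of the shipped fetch.js: the function name fetchAsMarkdown, the REMOVE_SELECTORS list, the viewport size, and the timeout are all assumptions chosen for the example. The third-party packages are loaded lazily inside the function, so the file can be loaded even before npm install has been run.

```javascript
// Sketch only: the real fetch.js may use different selectors, waits, and options.
// Assumed list of elements to strip before conversion (not taken from the skill).
const REMOVE_SELECTORS = ['script', 'style', 'iframe', 'noscript'];

let stealthChromium = null;

// Load playwright-extra and register the stealth plugin exactly once.
function getChromium() {
  if (!stealthChromium) {
    const { chromium } = require('playwright-extra');
    const stealth = require('puppeteer-extra-plugin-stealth')();
    chromium.use(stealth);
    stealthChromium = chromium;
  }
  return stealthChromium;
}

// Hypothetical helper: open the page, strip noisy elements, return title + Markdown.
async function fetchAsMarkdown(url) {
  const TurndownService = require('turndown');
  const browser = await getChromium().launch({ headless: true });
  try {
    // Assumed viewport; the skill only says it emulates a realistic one.
    const context = await browser.newContext({ viewport: { width: 1280, height: 800 } });
    const page = await context.newPage();
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });

    const title = await page.title();
    // Runs inside the page: remove unwanted nodes, then return the body HTML.
    const html = await page.evaluate((selectors) => {
      document.querySelectorAll(selectors.join(',')).forEach((el) => el.remove());
      return document.body.innerHTML;
    }, REMOVE_SELECTORS);

    const markdown = new TurndownService().turndown(html);
    return { title, markdown };
  } finally {
    await browser.close(); // always release the browser, even on errors
  }
}
```

A caller would await fetchAsMarkdown with a full URL and print the returned title and markdown fields. Closing the browser in a finally block keeps a failed navigation from leaving a stray browser process behind.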
Usage
Run the fetch script:
cd /Users/wuwei/.openclaw/workspace/skills/web-fetch/scripts
node fetch.js <url>
Parameters
url: the full URL of the page to fetch (including the http/https scheme).
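Because the url argument must be a full http or https URL, a caller may want to validate it before launching a browser. The helper below is a hypothetical sketch (normalizeTargetUrl is not part of the skill) that relies only on the URL class built into Node.js.

```javascript
// Hypothetical pre-flight check for the <url> argument.
function normalizeTargetUrl(input) {
  let parsed;
  try {
    parsed = new URL(input); // throws on relative or malformed input
  } catch {
    throw new Error(`Not a valid absolute URL: ${input}`);
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error(`Unsupported scheme "${parsed.protocol}" (expected http: or https:)`);
  }
  return parsed.href; // normalized form of the input
}
```

A bare host name such as finance.sina.com.cn is rejected because it has no scheme, which matches the requirement stated above.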
Examples
# Fetch Sina Finance
node fetch.js "https://finance.sina.com.cn/stock/"
# Fetch a specific news page
node fetch.js "https://finance.eastmoney.com/a/202403143012345678.html"
Output
The script prints the following to the console:
- Progress messages.
- The page title.
- The converted Markdown body (long content is truncated).
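The truncation of long output could look something like the sketch below. The document does not state the actual limit or how the cut is marked, so the 5000-character default, the function name truncateMarkdown, and the trailing marker are assumptions for illustration only.

```javascript
// Hypothetical truncation helper; the real limit used by fetch.js is not documented.
function truncateMarkdown(markdown, maxChars = 5000) {
  if (markdown.length <= maxChars) return markdown;
  const omitted = markdown.length - maxChars;
  // Append a visible marker so readers know content was cut off.
  return `${markdown.slice(0, maxChars)}\n\n[... truncated, ${omitted} more characters]`;
}
```

Reporting the number of omitted characters makes it easier to decide whether the full page is worth fetching again with a higher limit.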
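Dependencies
- playwright-extra: a plugin-enabled wrapper around the Playwright core.
- puppeteer-extra-plugin-stealth: provides a range of evasion strategies.
- turndown: an HTML-to-Markdown conversion library.
Install the dependencies: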
cd /Users/wuwei/.openclaw/workspace/skills/web-fetch/scripts
npm install
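Since the registry entry carries no formal install spec, it can help to confirm that the dependencies actually resolve before running the script. The following is a small sketch using Node's built-in require.resolve; the helper name findMissingDependencies and the REQUIRED list (taken from the dependency list above) are illustrative and not part of the skill.

```javascript
// Packages named in the Dependencies section of SKILL.md.
const REQUIRED = ['playwright-extra', 'puppeteer-extra-plugin-stealth', 'turndown'];

// Hypothetical helper: return the names that cannot be resolved from this file.
function findMissingDependencies(names) {
  return names.filter((name) => {
    try {
      require.resolve(name); // throws if the module cannot be found
      return false;
    } catch {
      return true;
    }
  });
}
```

Resolution is relative to the file that calls require.resolve, so a check like findMissingDependencies(REQUIRED) is only meaningful when run from the scripts directory where npm install was executed. Playwright's browser binaries are a separate download; if they are missing, a command such as npx playwright install chromium may be needed.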
Files
3 total
