Install
openclaw skills install deep-web-fetcher
Fetch and extract structured content from JS-rendered web pages, including main text, metadata, and key domain-specific metrics, without paid APIs.
Version: 1.0.0
Description: Free web page fetching, content extraction, and structured output, with no paid API required.
/web-fetcher <url> [--domain <domain>]
| Parameter | Default | Description |
|---|---|---|
| url | required | Target web page URL |
| --domain | general | Research domain; determines which metric extraction rules apply |
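- general: general-purpose extraction
- healthcare: healthcare and health
- medical: medical research
- insurance: insurance cost control
- machine_learning: machine learning

1. Launch a Playwright browser
2. Open the target URL and wait for JS rendering to finish
3. Extract the main text with Readability
4. Extract metadata (title, author, publication time)
5. Extract key metrics according to the domain rules
6. Output the result as JSON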
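The steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of `web-fetcher.py`: the function names `render_html`, `extract_article`, `build_result`, and `fetch_page` are hypothetical, author and date extraction (step 4) and the domain rules (step 5) are left out, and counting words by whitespace is an assumption that would undercount Chinese text.

```python
import json
from typing import Optional


def render_html(url: str, timeout_ms: int = 30_000) -> str:
    """Steps 1-2: open the page in headless Chromium and wait for JS to settle."""
    # Imported lazily so the pure helper below works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # "networkidle" is one possible readiness signal; some sites
            # may need an explicit selector wait instead.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()


def extract_article(html: str) -> dict:
    """Steps 3-4 (title only): main content via readability-lxml."""
    from bs4 import BeautifulSoup
    from readability import Document

    doc = Document(html)
    content_html = doc.summary()
    text = BeautifulSoup(content_html, "lxml").get_text(" ", strip=True)
    return {
        "title": doc.short_title(),
        "content_html": content_html,
        "content_text": text,
    }


def build_result(url: str, article: Optional[dict], error: Optional[str] = None) -> dict:
    """Step 6: assemble the output record; failures keep the same shape."""
    article = article or {}
    text = article.get("content_text", "")
    return {
        "url": url,
        "success": error is None,
        "title": article.get("title"),
        "content_text": text,
        "content_html": article.get("content_html", ""),
        "word_count": len(text.split()),  # assumption: whitespace-delimited words
        "error": error,
    }


def fetch_page(url: str) -> str:
    """Run the whole pipeline and return a JSON string."""
    try:
        result = build_result(url, extract_article(render_html(url)))
    except Exception as exc:  # report any failure in the "error" field
        result = build_result(url, None, error=str(exc))
    return json.dumps(result, ensure_ascii=False, indent=2)
```

Keeping the failure path in the same record shape means callers only need to check `success` and `error`, not catch exceptions.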
{
  "url": "https://example.com/article",
  "success": true,
  "title": "Article title",
  "author": "Author name",
  "published_date": "2024-01-15",
  "content_text": "Body text...",
  "content_html": "<html>...</html>",
  "word_count": 1500,
  "extracted_metrics": {
    "sample_size": "9,080",
    "auc": 0.85,
    "accuracy": 92.5
  },
  "error": null
}
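A field like `extracted_metrics` could plausibly be filled by per-domain regular-expression rules. The sketch below is only an assumption about how such rules might look: the patterns, the `RULES` and `DOMAIN_METRICS` tables, and the choice of metrics per domain are all hypothetical and are not taken from the skill's actual rule set.

```python
import re

# Hypothetical rule table: metric name -> (pattern, converter).
RULES = {
    # "n = 9,080" -> "9,080" (kept as a string, as in the sample output)
    "sample_size": (
        re.compile(r"\bn\s*=\s*(\d[\d,]*)", re.IGNORECASE),
        lambda s: s.rstrip(","),
    ),
    # "AUC of 0.85" -> 0.85
    "auc": (
        re.compile(r"\bAUC\b[^\d]{0,20}(0?\.\d+)", re.IGNORECASE),
        float,
    ),
    # "accuracy of 92.5%" -> 92.5
    "accuracy": (
        re.compile(r"\baccuracy\b[^\d]{0,20}(\d+(?:\.\d+)?)\s*%", re.IGNORECASE),
        float,
    ),
}

# Hypothetical mapping from --domain value to the metrics worth looking for.
DOMAIN_METRICS = {
    "general": [],
    "healthcare": ["sample_size"],
    "medical": ["sample_size", "auc", "accuracy"],
    "insurance": ["sample_size"],
    "machine_learning": ["auc", "accuracy"],
}


def extract_metrics(text: str, domain: str = "general") -> dict:
    """Return the first match for each metric configured for the domain."""
    metrics = {}
    for name in DOMAIN_METRICS.get(domain, []):
        pattern, convert = RULES[name]
        match = pattern.search(text)
        if match:
            metrics[name] = convert(match.group(1))
    return metrics
```

Metrics that do not match are simply absent from the result, so an unknown domain or a page without numbers yields an empty dictionary rather than an error.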
/web-fetcher "https://arxiv.org/abs/2301.12345" --domain "machine_learning"
/web-fetcher "https://pubmed.ncbi.nlm.nih.gov/38134648/" --domain "medical"
/web-fetcher "https://www.gov.cn/zhengce/zhengceku/2024-01/15/content_6923456.htm" --domain "insurance"
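For readers wiring up a similar command, the invocations above could be parsed with `argparse` along these lines. This is a sketch under assumptions: `build_parser` and `normalize_domain` are hypothetical names, and converting spaces to underscores (so that a value such as "machine learning" resolves to `machine_learning`) is a convenience added here, not documented behavior.

```python
import argparse

# Domain keys as listed in this document.
DOMAINS = ["general", "healthcare", "medical", "insurance", "machine_learning"]


def normalize_domain(value: str) -> str:
    """Lower-case the value and turn spaces or hyphens into underscores."""
    return value.strip().lower().replace(" ", "_").replace("-", "_")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="web-fetcher",
        description="Fetch a JS-rendered page and emit structured JSON.",
    )
    parser.add_argument("url", help="Target web page URL (required)")
    parser.add_argument(
        "--domain",
        default="general",
        type=normalize_domain,  # applied before the choices check
        choices=DOMAINS,
        help="Research domain; selects the metric extraction rules",
    )
    return parser
```

Because `type` runs before `choices` is checked, an unrecognized domain still fails fast with a usage error instead of silently falling back to `general`.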
# Install the Python dependencies
pip install playwright readability-lxml lxml beautifulsoup4
# Install the browser driver (the first run downloads about 100 MB)
playwright install chromium
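Some sites use anti-scraping measures. If a fetch fails, you can:
- Add a delay with time.sleep()
- Add a proxy in browser.new_context()
- Set the user_agent parameter

# Generate a card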
/web-fetcher <url> --domain "insurance" > sources/card-xxx.json
# Convert the card format
python3 scripts/convert-to-card.py sources/card-xxx.json
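The anti-scraping workarounds mentioned earlier (a delay, a proxy, and a custom user agent) might be combined as in the sketch below. The helper names `context_options` and `fetch_with_retry`, the backoff values, and the retry count are assumptions for illustration. The returned dictionary is meant to be passed as `browser.new_context(**context_options(...))`, which accepts `user_agent` and `proxy` keyword arguments in Playwright.

```python
import random
import time
from typing import Callable, Optional


def context_options(user_agent: Optional[str] = None,
                    proxy_server: Optional[str] = None) -> dict:
    """Build keyword arguments intended for Playwright's browser.new_context()."""
    options = {}
    if user_agent:
        options["user_agent"] = user_agent
    if proxy_server:
        # Playwright expects a dict with at least a "server" key.
        options["proxy"] = {"server": proxy_server}
    return options


def fetch_with_retry(fetch: Callable[[str], str], url: str, attempts: int = 3,
                     base_delay: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch(url), sleeping with exponential backoff plus jitter on failure."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                # 2 s, 4 s, 8 s, ... plus up to 1 s of random jitter.
                sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    raise RuntimeError(f"giving up on {url} after {attempts} attempts") from last_error
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting; in normal use the default `time.sleep` applies.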
skills/web-fetcher/
├── SKILL.md
└── scripts/
    └── web-fetcher.py
| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2026-03-19 | Initial release |
Completely free, runs locally, and your data never leaves your machine.