Install
openclaw skills install hn-crawler-cn

Crawls news from https://hn.aimaker.dev/ and runs the full crawl -> extract -> organize -> summarize pipeline. Invoke when user wants to crawl news from hn.aimaker.dev or process web content through the full pipeline.

This Skill crawls news content from https://hn.aimaker.dev/ and turns the raw data into a structured summary report through a complete processing pipeline.
The pipeline has four stages:
┌─────────┐ ┌──────────┐ ┌──────────┐ ┌───────────┐
│ Crawl │ -> │ Extract │ -> │ Organize │ -> │ Summarize │
└─────────┘ └──────────┘ └──────────┘ └───────────┘
Stage       Script                  Output
Crawl       scripts/crawl.py        data/raw/hn_aimaker_<timestamp>.html
Extract     scripts/extract.py      data/extracted/articles_<timestamp>.json
Organize    scripts/organize.py     data/organized/articles_organized_<timestamp>.json
Summarize   scripts/summarize.py    data/summary/summary_<timestamp>.md

cd .trae/skills/hn-crawler/scripts
pip install -r requirements.txt
# Option 1: run the stages one at a time (from the skill root, .trae/skills/hn-crawler/)
python scripts/crawl.py
python scripts/extract.py
python scripts/organize.py
python scripts/summarize.py
# Option 2: run the full pipeline with one command
python scripts/run_pipeline.py
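
The contents of run_pipeline.py are not shown here. As a rough sketch of what a one-command runner could look like, the example below runs the four stage scripts in order and stops at the first failure. It assumes each stage is a standalone script that exits non-zero on error; the names `STAGES`, `build_commands`, `run_pipeline`, and the `runner` parameter are hypothetical and only serve the illustration.

```python
import subprocess
import sys
from pathlib import Path

# Stage scripts in pipeline order (file names taken from the layout in this document).
STAGES = ["crawl.py", "extract.py", "organize.py", "summarize.py"]


def build_commands(scripts_dir="scripts", python=sys.executable):
    """Return one command line per stage, in pipeline order."""
    return [[python, str(Path(scripts_dir) / name)] for name in STAGES]


def run_pipeline(scripts_dir="scripts", runner=subprocess.run):
    """Run every stage in order; return the first non-zero exit code, else 0.

    `runner` is injectable (hypothetical design choice) so the control flow
    can be exercised without actually crawling anything.
    """
    for cmd in build_commands(scripts_dir):
        result = runner(cmd)
        if result.returncode != 0:
            # Assumption: a failed stage leaves nothing usable for the next one.
            return result.returncode
    return 0
```

Calling `run_pipeline()` from the skill root would then be roughly equivalent to Option 1, with the difference that a failing stage stops the later ones from running.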
.trae/skills/hn-crawler/
├── SKILL.md                 # This file
├── scripts/
│   ├── requirements.txt     # Python dependencies
│   ├── crawl.py             # Crawl script
│   ├── extract.py           # Extract script
│   ├── organize.py          # Organize script
│   ├── summarize.py         # Summarize script
│   └── run_pipeline.py      # Runs the full pipeline in one step
└── data/                    # Output directory (created automatically)
    ├── raw/                 # Raw HTML
    ├── extracted/           # Extracted JSON data
    ├── organized/           # Organized data
    └── summary/             # Summary reports
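
The output directories are created automatically and every output file carries a timestamp. One way this could work is sketched below. The helper name `output_path` and the `%Y%m%d_%H%M%S` timestamp format are assumptions; the actual scripts may format `<timestamp>` differently.

```python
from datetime import datetime
from pathlib import Path


def output_path(base_dir, stage_dir, prefix, ext, now=None):
    """Create <base_dir>/<stage_dir>/ if missing and return a timestamped file path."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")  # assumed timestamp format
    target = Path(base_dir) / stage_dir
    target.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return target / f"{prefix}_{stamp}.{ext}"
```

For example, `output_path("data", "raw", "hn_aimaker", "html")` would yield a path of the form data/raw/hn_aimaker_<timestamp>.html, creating data/raw/ on first use.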
{
  "articles": [
    {
      "title": "Article title",
      "url": "https://example.com/article",
      "summary": "Article summary",
      "published_at": "2024-01-15T10:30:00",
      "source": "hn.aimaker.dev",
      "category": "AI",
      "score": 150
    }
  ],
  "metadata": {
    "crawled_at": "2024-01-15T12:00:00",
    "total_count": 30
  }
}
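
A downstream consumer of this JSON might look like the following sketch, which parses the document, checks that each article has the fields shown above, and picks the highest-scoring entries. Treating every field as required is an assumption, and `REQUIRED_FIELDS`, `load_articles`, and `top_by_score` are hypothetical names, not part of the skill's scripts.

```python
import json

# Field names copied from the sample above; treating all of them as required is an assumption.
REQUIRED_FIELDS = {"title", "url", "summary", "published_at", "source", "category", "score"}


def load_articles(text):
    """Parse the extracted JSON and return the article list, checking each entry's fields."""
    data = json.loads(text)
    articles = data["articles"]
    for i, article in enumerate(articles):
        missing = REQUIRED_FIELDS - article.keys()
        if missing:
            raise ValueError(f"article {i} is missing fields: {sorted(missing)}")
    return articles


def top_by_score(articles, n=5):
    """Return the n highest-scoring articles, best first."""
    return sorted(articles, key=lambda a: a["score"], reverse=True)[:n]
```

In practice `text` would come from reading one of the data/extracted/articles_<timestamp>.json files.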
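Each script accepts the following settings as environment variables or command-line arguments:
TARGET_URL: target URL (default: https://hn.aimaker.dev/)
OUTPUT_DIR: output directory (default: data/)
TIMEOUT: request timeout (default: 30 seconds)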
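
As an illustration of how a script could resolve these settings, the sketch below reads the environment variables and lets command-line flags override them. The flag names (`--target-url`, `--output-dir`, `--timeout`), the function name `parse_config`, and the precedence order (flag, then environment variable, then default) are assumptions; the actual scripts may differ.

```python
import argparse
import os


def parse_config(argv=None, env=None):
    """Resolve settings: command-line flag first, then environment variable, then default."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Settings sketch for the pipeline scripts")
    # Flag names are hypothetical; the defaults are the ones listed above.
    parser.add_argument("--target-url",
                        default=env.get("TARGET_URL", "https://hn.aimaker.dev/"))
    parser.add_argument("--output-dir",
                        default=env.get("OUTPUT_DIR", "data/"))
    parser.add_argument("--timeout", type=float,
                        default=float(env.get("TIMEOUT", "30")),
                        help="request timeout in seconds")
    return parser.parse_args(argv)
```

With this layout, `parse_config(["--timeout", "5"])` would use a 5-second timeout regardless of what `TIMEOUT` is set to in the environment.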