Install
openclaw skills install rss-fetcher
Unified RSS Feed Fetcher and Manager
Supports incremental fetching, automatic deduplication, auto-tagging, source health monitoring, and HTML report generation.
| Table | Purpose | Core Fields |
|---|---|---|
| articles | Article data | id, source_id, category, title, url, published_at |
| tags | Tag definitions | id, name |
| article_tags | Article-tag relation | article_id, tag_id |
| fetch_logs | Fetch logs | source_id, started_at, found, new, status |
Note: RSS sources are managed in the config/sources.json file and are not stored in the database.
cd skills/rss_fetcher
python3 scripts/init_db.py
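The automatic deduplication mentioned above could rest on a uniqueness constraint in the articles table. The following is a minimal sketch, not the actual init_db.py: the column names come from the core fields listed above, while the column types, the UNIQUE constraint on url, and the helper name save_article are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: only the column names come from the documented core
# fields; the types and the UNIQUE constraint on url are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    source_id    TEXT NOT NULL,
    category     TEXT,
    title        TEXT NOT NULL,
    url          TEXT NOT NULL UNIQUE,
    published_at INTEGER
);
"""


def save_article(conn, source_id, category, title, url, published_at):
    """Insert an article; return True if it was new, False if the URL already existed."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO articles "
        "(source_id, category, title, url, published_at) VALUES (?, ?, ?, ?, ?)",
        (source_id, category, title, url, published_at),
    )
    # rowcount is 0 when the UNIQUE constraint caused the row to be ignored.
    return cur.rowcount == 1


# An in-memory database keeps the sketch away from the real data file.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

With a constraint like this, fetching the same feed twice adds each article only once, which is one way to make repeated fetches incremental.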
Edit config/sources.json to add your RSS sources:
{
  "sources": [
    {
      "id": "openai",
      "name": "OpenAI Blog",
      "url": "https://openai.com/blog/rss.xml",
      "category": "tech",
      "enabled": true
    }
  ]
}
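As an illustration of how a script might consume this file, the sketch below parses the JSON and keeps only the enabled sources. The function name enabled_sources and the second example entry are hypothetical; the real fetch.py may read the file differently.

```python
import json


def enabled_sources(config_text):
    """Return the source entries whose "enabled" flag is true."""
    config = json.loads(config_text)
    # A source without an "enabled" key is treated as disabled (an assumption).
    return [s for s in config.get("sources", []) if s.get("enabled", False)]


example = """
{
  "sources": [
    {"id": "openai", "name": "OpenAI Blog",
     "url": "https://openai.com/blog/rss.xml", "category": "tech", "enabled": true},
    {"id": "myblog", "name": "My Blog",
     "url": "https://example.com/feed.xml", "category": "tech", "enabled": false}
  ]
}
"""

print([s["id"] for s in enabled_sources(example)])  # ['openai']
```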
# Fetch all sources (last 24 hours)
python3 scripts/fetch.py
# Fetch specific sources
python3 scripts/fetch.py --sources openai huggingface
# Fetch the last 48 hours
python3 scripts/fetch.py --hours 48
# Use more workers (default 20, max 50)
python3 scripts/fetch.py --workers 50
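The --hours and --workers options suggest a time-window filter and a bounded thread pool. The sketch below shows one way these could be interpreted; it is an assumption, not the code in fetch.py. The names clamp_workers, within_window, and fetch_all are hypothetical, and fetch_one stands in for whatever function downloads and parses a single feed.

```python
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 50  # documented maximum for --workers


def clamp_workers(requested, default=20):
    """Use the documented default when unset and never exceed the maximum."""
    if requested is None:
        return default
    return max(1, min(requested, MAX_WORKERS))


def within_window(published_at, hours=24, now=None):
    """True if a Unix timestamp falls inside the last `hours` hours."""
    now = time.time() if now is None else now
    return now - hours * 3600 <= published_at <= now


def fetch_all(sources, fetch_one, workers=None):
    """Run fetch_one(source) for every source on a thread pool, preserving order."""
    with ThreadPoolExecutor(max_workers=clamp_workers(workers)) as pool:
        return list(pool.map(fetch_one, sources))
```

Whether the real script clamps an oversized --workers value or rejects it is not stated in this document; clamping is simply the choice made in this sketch.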
⚠️ Remember to update the HTML report after fetching: newly fetched articles appear in the browser only after the page is regenerated.
python3 scripts/fetch.py && python3 scripts/generate_html.py
Note: You must regenerate the HTML page after each fetch to see the latest content.
# Fetch and immediately update HTML (recommended workflow)
python3 scripts/fetch.py && python3 scripts/generate_html.py
# Generate HTML only (when new data already exists)
python3 scripts/generate_html.py
# Open to view
open data/index.html  # macOS
# Or open in a browser: file:///.../rss_fetcher/data/index.html
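The report is a static file (data/index.html), which is why it has to be regenerated after each fetch: it is a snapshot of the database at generation time. The sketch below illustrates the idea with a hypothetical render_report function; the actual layout and features of generate_html.py are not shown here.

```python
import html


def render_report(articles):
    """Build a static HTML page from (title, url) pairs."""
    items = "\n".join(
        # Escape both the link target and the title so feed content cannot break the page.
        f'<li><a href="{html.escape(url, quote=True)}">{html.escape(title)}</a></li>'
        for title, url in articles
    )
    return f"<!DOCTYPE html>\n<html><body><ul>\n{items}\n</ul></body></html>\n"


page = render_report([("A & B", "https://example.com/?a=1&b=2")])
```

Writing the returned string to a file produces a page that does not change until the function is run again on fresh data.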
HTML Report Features:
# Check the health of all sources
python3 scripts/source.py check
# View source statistics
python3 scripts/source.py stats
# Add a new source
python3 scripts/source.py add myblog "My Blog" "https://example.com/feed.xml" tech
# Disable/enable/remove a source
python3 scripts/source.py disable myblog
python3 scripts/source.py enable myblog
python3 scripts/source.py remove myblog
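Because sources live in config/sources.json rather than in the database, disabling or enabling a source presumably amounts to flipping its enabled flag in that file. The sketch below works on the parsed config under that assumption; set_enabled is a hypothetical helper, not a function from source.py.

```python
def set_enabled(config, source_id, enabled):
    """Return a copy of the config with one source's "enabled" flag changed."""
    found = False
    sources = []
    for src in config["sources"]:
        if src["id"] == source_id:
            # Copy the entry so the caller's original config is left untouched.
            src = {**src, "enabled": enabled}
            found = True
        sources.append(src)
    if not found:
        raise KeyError(f"unknown source id: {source_id}")
    return {**config, "sources": sources}


config = {"sources": [{"id": "myblog", "name": "My Blog", "enabled": True}]}
updated = set_enabled(config, "myblog", False)
```

Returning a copy instead of mutating in place makes it easy to write the file only when the change succeeded.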
# View recent articles as a terminal table
python3 scripts/list.py
# View the last 48 hours
python3 scripts/list.py --hours 48
# View by category
python3 scripts/list.py --category tech
# JSON output
python3 scripts/list.py --json
{
  "_description": "RSS source config file",
  "_updated": "2026-03-15",
  "_total_sources": 111,
  "sources": [
    {
      "id": "openai",
      "name": "OpenAI Blog",
      "url": "https://openai.com/blog/rss.xml",
      "category": "tech",
      "enabled": true
    }
  ]
}
Field Description:
id - Unique source identifier
name - Display name
url - RSS feed URL
category - Article category
enabled - Whether the source is enabled
Categories can be freely defined: use any category name in sources.json.
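A quick check of a hand-edited sources.json can catch missing fields or reused ids before a fetch. The sketch below treats all five documented fields as required and ids as unique; both rules are assumptions for illustration, and validate_sources is a hypothetical helper.

```python
# Assumption: every documented field is required on each source entry.
REQUIRED_FIELDS = ("id", "name", "url", "category", "enabled")


def validate_sources(sources):
    """Return a list of problems; an empty list means the entries look fine."""
    problems = []
    seen = set()
    for index, src in enumerate(sources):
        missing = [field for field in REQUIRED_FIELDS if field not in src]
        if missing:
            problems.append(f"source #{index}: missing {', '.join(missing)}")
        sid = src.get("id")
        if sid is not None:
            if sid in seen:
                problems.append(f"source #{index}: duplicate id {sid!r}")
            seen.add(sid)
    return problems
```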
Prioritize RSS category - Extract the content of the <category> tag
| Keywords | Tag |
|---|---|
| AI, GPT, 大模型, 机器学习 | AI |
| 区块链, 比特币, crypto | Blockchain |
| 股票, 股市, equity | Stocks |
| 游戏, gaming, esports | Gaming |
| ... | ... |
Rules are defined in TAG_RULES in fetch.py and can be freely extended.
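The tagging behaviour described above (RSS categories first, then keyword rules) can be sketched as follows. The dictionary layout, the English tag names, and the auto_tags function are hypothetical; the real TAG_RULES in fetch.py may be structured differently.

```python
# Hypothetical rule table: tag name -> keywords that trigger it.
TAG_RULES = {
    "AI": ["AI", "GPT", "大模型", "机器学习"],
    "Blockchain": ["区块链", "比特币", "crypto"],
    "Stocks": ["股票", "股市", "equity"],
    "Gaming": ["游戏", "gaming", "esports"],
}


def auto_tags(title, rss_categories=()):
    """RSS <category> values come first, then tags whose keywords appear in the title."""
    # dict.fromkeys removes duplicates while keeping the original order.
    tags = list(dict.fromkeys(c.strip() for c in rss_categories if c.strip()))
    lowered = title.lower()
    for tag, keywords in TAG_RULES.items():
        if tag not in tags and any(k.lower() in lowered for k in keywords):
            tags.append(tag)
    return tags
```

This sketch uses plain case-insensitive substring matching, so a short keyword such as "AI" also matches inside longer words like "said"; a word-boundary check would be stricter.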
-- Articles fetched today
SELECT title, url, source_id
FROM articles
WHERE date(fetched_at, 'unixepoch') = date('now')
ORDER BY published_at DESC;

-- Articles in the 'tech' category published in the last 24 hours
SELECT * FROM articles
WHERE category = 'tech'
  AND published_at > strftime('%s', 'now', '-24 hours');

-- 'tech' articles together with their tags
SELECT a.title, a.url, GROUP_CONCAT(t.name) AS tags
FROM articles a
LEFT JOIN article_tags at ON a.id = at.article_id
LEFT JOIN tags t ON at.tag_id = t.id
WHERE a.category = 'tech'
GROUP BY a.id;

-- Number of articles per tag, most used first
SELECT t.name, COUNT(*) AS count
FROM tags t
JOIN article_tags at ON t.id = at.tag_id
GROUP BY t.id
ORDER BY count DESC;
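The same queries can be run from Python with the standard sqlite3 module. The sketch below runs the tag-count query against a throwaway in-memory database; the two simplified table definitions and the sample rows are invented for the example, and tag_counts is a hypothetical helper.

```python
import sqlite3

TAG_COUNT_SQL = """
SELECT t.name, COUNT(*) AS count
FROM tags t
JOIN article_tags at ON t.id = at.tag_id
GROUP BY t.id
ORDER BY count DESC;
"""


def tag_counts(conn):
    """Return (tag name, number of tagged articles) rows, most used first."""
    return conn.execute(TAG_COUNT_SQL).fetchall()


# Throwaway in-memory data so the sketch runs without the real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE article_tags (article_id INTEGER, tag_id INTEGER);
INSERT INTO tags VALUES (1, 'AI'), (2, 'Gaming');
INSERT INTO article_tags VALUES (10, 1), (11, 1), (10, 2);
""")

print(tag_counts(conn))  # [('AI', 2), ('Gaming', 1)]
```

To query real data, connect to data/rss_fetcher.db instead of ":memory:".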
rss_fetcher/
├── SKILL.md                 # This document
├── config/
│   └── sources.json         # RSS source config
├── scripts/
│   ├── init_db.py           # DB initialization
│   ├── fetch.py             # Core fetch script (includes auto-tagging)
│   ├── generate_html.py     # HTML report generation
│   ├── source.py            # Source health check and management
│   ├── list.py              # Terminal article list
│   └── query.py             # Data query tool
├── data/
│   ├── rss_fetcher.db       # SQLite database
│   └── index.html           # Generated HTML report
└── references/
    └── schema.sql           # DB schema reference
Database file: rss_fetcher/data/rss_fetcher.db
Regularly regenerate HTML - rerun generate_html.py after fetching new articles
Part of OpenClaw Daily Research System