Install

```bash
openclaw skills install zhihu-fetcher
```

A minimalist Zhihu data-fetching tool with a three-level authentication fallback (Browser Profile → File Cookie → Fallback Source). If one method fails or is unavailable, it falls back to the next, so data can still be retrieved without a logged-in browser or a stored cookie.

Authentication fallback chain:

```text
Priority 1: Browser Profile
    Use the logged-in session of the OpenClaw browser
        ↓ fails, or not logged in
Priority 2: File Cookie
    Use the cookie persisted in the config file
        ↓ fails, or not configured
Priority 3: Fallback Source
    Use an unauthenticated fallback data source
```
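
The fallback chain can be sketched in Python as a loop that tries each method in priority order and records which one succeeded, much like the `auth_method` field in the output. This is a minimal sketch, not the skill's implementation: `fetch_with_fallback` here is a hypothetical Python analogue of the JS export, and the three fetcher functions are hypothetical stand-ins that simulate a missing login, an expired cookie, and a working fallback source.

```python
from __future__ import annotations

from typing import Callable, Optional

# A fetcher takes a limit and returns items, or returns None / raises
# when its auth method is unavailable.
Fetcher = Callable[[int], Optional[list]]


def fetch_with_fallback(fetchers: list[tuple[str, Fetcher]], limit: int) -> dict:
    """Try each (auth_method, fetcher) pair in order; return the first non-empty result."""
    errors: dict[str, str] = {}
    for auth_method, fetcher in fetchers:
        try:
            items = fetcher(limit)
        except Exception as exc:  # record the failure, then fall through to the next level
            errors[auth_method] = str(exc)
            continue
        if items:
            return {"meta": {"auth_method": auth_method, "count": len(items)}, "data": items}
        errors[auth_method] = "no data"
    raise RuntimeError(f"all auth methods failed: {errors}")


# Hypothetical stand-ins for the three levels.
def browser_profile(limit: int) -> Optional[list]:
    return None  # simulates: the browser is not logged in


def file_cookie(limit: int) -> Optional[list]:
    raise PermissionError("cookie expired")  # simulates: a stale cookie


def fallback_source(limit: int) -> Optional[list]:
    return [{"rank": i + 1, "title": f"item {i + 1}"} for i in range(limit)]


CHAIN = [
    ("browser_profile", browser_profile),
    ("file_cookie", file_cookie),
    ("fallback_source", fallback_source),
]

if __name__ == "__main__":
    result = fetch_with_fallback(CHAIN, limit=3)
    print(result["meta"])  # {'auth_method': 'fallback_source', 'count': 3}
```

Returning on the first non-empty result keeps the cheaper, richer sources first and only reaches the unauthenticated source when both authenticated levels fail.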

Option 1: Browser Profile

```bash
# 1. Log in to Zhihu
browser open https://www.zhihu.com
# Complete the login in the browser window

# 2. Fetch data and save it to the database
python3 scripts/save_to_db.py 50
```

Option 2: File Cookie

```bash
# 1. Edit the config file and fill in your cookies
vim config/fallback-sources.json
# 2. Run directly (no browser needed)
python3 scripts/save_to_db.py 50
```

Option 3: Fallback Source

```bash
# No authentication required; the fetcher automatically falls back to the fallback source
python3 scripts/save_to_db.py
```

Database workflow:

```bash
# Initialize the database
python3 scripts/init_db.py

# Collect 50 hot-list items and save them to the database
python3 scripts/save_to_db.py 50
```

Example output:

```text
✅ Complete!
Found: 50 items
New: 50 items
Updated: 0 items
```

Query the stored data:

```bash
# View today's hot list
python3 scripts/query.py today
# View a specific date
python3 scripts/query.py date 2026-03-15
# View statistics for the last 7 days
python3 scripts/query.py stats 7
# View fetch logs
python3 scripts/query.py logs
```

Database schema:

```sql
-- Articles table
CREATE TABLE articles (
    id TEXT PRIMARY KEY,              -- Article ID (SHA256 of the URL)
    platform TEXT DEFAULT 'zhihu',    -- Platform identifier
    article_type TEXT DEFAULT 'hot',  -- Type: hot / search
    rank INTEGER,                     -- Rank
    title TEXT NOT NULL,              -- Title
    url TEXT NOT NULL,                -- Link
    heat INTEGER,                     -- Heat score
    author TEXT,                      -- Author
    summary TEXT,                     -- Summary
    published_at INTEGER,             -- Publish time
    fetched_at INTEGER,               -- Fetch time
    fetch_date TEXT,                  -- Fetch date (YYYY-MM-DD)
    raw_data TEXT,                    -- Raw JSON
    UNIQUE(url, fetch_date)           -- Store each URL at most once per day
);

-- Fetch logs table
CREATE TABLE fetch_logs (...);
```
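
The `Found / New / Updated` counts printed by `save_to_db.py` suggest an insert-or-update keyed on `(url, fetch_date)`. Below is a minimal sketch of that with Python's built-in `sqlite3`, using a select-then-insert-or-update against a trimmed copy of the schema; it is not the skill's `db.py`. One assumption to flag: the schema comment says `id` is the SHA256 of the URL, but if `id` hashed the URL alone, the primary key would reject the same URL on a second day, so this sketch hashes the URL together with `fetch_date`.

```python
import hashlib
import json
import sqlite3

# Trimmed copy of the articles schema (platform, author, summary and
# published_at omitted for brevity).
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id TEXT PRIMARY KEY,
    article_type TEXT DEFAULT 'hot',
    rank INTEGER,
    title TEXT NOT NULL,
    url TEXT NOT NULL,
    heat INTEGER,
    fetched_at INTEGER,
    fetch_date TEXT,
    raw_data TEXT,
    UNIQUE(url, fetch_date)
)
"""


def article_id(url: str, fetch_date: str) -> str:
    # Assumption: hash URL and date together so one URL can be stored once per day.
    return hashlib.sha256(f"{url}|{fetch_date}".encode("utf-8")).hexdigest()


def save_items(conn: sqlite3.Connection, items: list, fetch_date: str, fetched_at: int) -> tuple:
    """Insert unseen (url, fetch_date) rows, refresh existing ones; return (new, updated)."""
    new = updated = 0
    for item in items:
        raw = json.dumps(item, ensure_ascii=False)
        row = conn.execute(
            "SELECT id FROM articles WHERE url = ? AND fetch_date = ?",
            (item["url"], fetch_date),
        ).fetchone()
        if row is None:
            conn.execute(
                "INSERT INTO articles (id, rank, title, url, heat, fetched_at, fetch_date, raw_data) "
                "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                (article_id(item["url"], fetch_date), item.get("rank"), item["title"],
                 item["url"], item.get("heat"), fetched_at, fetch_date, raw),
            )
            new += 1
        else:
            # Same URL already stored for this date: refresh rank/heat instead of duplicating.
            conn.execute(
                "UPDATE articles SET rank = ?, heat = ?, fetched_at = ?, raw_data = ? WHERE id = ?",
                (item.get("rank"), item.get("heat"), fetched_at, raw, row[0]),
            )
            updated += 1
    conn.commit()
    return new, updated
```

Running the same batch twice on one day would report `New: 0` and `Updated: N` on the second run, which matches the meaning of the counters in the example output.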

Edit `config/fallback-sources.json` and replace the placeholders with your own cookie values:

```json
{
  "cookie": {
    "zhihu_session": "your_session_value",
    "z_c0": "your_z_c0_value",
    "_xsrf": "your_xsrf_value",
    "_zap": "...",
    "d_c0": "..."
  }
}
```
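
The file-cookie level ultimately has to send these values as an HTTP `Cookie` header. The sketch below shows one way to build that header from the `cookie` object; it assumes the config layout shown above and skips values that are empty or still the `...` placeholder. How `cookie-manager.js` actually does this is not documented here.

```python
import json


def load_cookie_header(config_text: str) -> str:
    """Build a Cookie header value from the "cookie" object of fallback-sources.json."""
    cookies = json.loads(config_text).get("cookie", {})
    # Skip empty values and unfilled "..." placeholders.
    pairs = [f"{name}={value}" for name, value in cookies.items()
             if value and value != "..."]
    return "; ".join(pairs)
```

For example, `{"Cookie": load_cookie_header(open("config/fallback-sources.json", encoding="utf-8").read())}` would give a headers dict usable with most HTTP clients.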
How to get the cookies:

Command-line usage of the Node.js snippet:

```bash
# Get 30 hot-list items
node snippets/fetch-hot.js 30
# Save 50 items to a file
node snippets/fetch-hot.js 50 ./zhihu-hot.json
# Custom rate limit in milliseconds (5000 = one request every 5 seconds)
RATE_LIMIT=5000 node snippets/fetch-hot.js
```
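
`RATE_LIMIT` sets a minimum interval in milliseconds between requests. A minimal Python sketch of such an interval limiter follows; it is not a port of `rate-limiter.js`, whose behavior is not shown here. The clock and sleep functions are injectable so the limiter can be tested without real waiting, and the 2000 ms default is an assumption borrowed from `rateLimitMs: 2000` in the programmatic example (the CLI's actual default is not documented).

```python
import os
import time


class RateLimiter:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, interval_ms: int, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval_ms / 1000.0
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous request; None before the first one

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self.clock()
        delay = 0.0
        if self.last is not None:
            delay = max(0.0, self.last + self.interval - now)
        if delay:
            self.sleep(delay)
        self.last = now + delay  # the moment this request is actually released
        return delay


# Mirror the CLI convention: RATE_LIMIT is milliseconds between requests.
limiter = RateLimiter(int(os.environ.get("RATE_LIMIT", "2000")))
```

Calling `limiter.wait()` before each request spaces requests at least `RATE_LIMIT` ms apart; time already spent elsewhere (parsing, saving) counts toward the interval, so no extra sleep is added.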
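
Programmatic usage:

```javascript
const { fetchWithFallback } = require('./snippets/fetch-hot.js');

async function main() {
  const report = {}; // container for aggregated results
  const data = await fetchWithFallback({ limit: 30, rateLimitMs: 2000 });
  // The best available auth method is selected automatically;
  // auth_method is one of: browser_profile | file_cookie | fallback_source
  console.log('Auth method:', data.meta.auth_method);
  report.zhihu = data.data;
  return report;
}

main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
```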

Directory structure:

```text
zhihu-fetcher/
├── SKILL.md                     # This document
├── config/
│   └── fallback-sources.json    # Config: cookie + fallback sources
├── data/
│   ├── zhihu.db                 # SQLite database (auto-created)
│   └── index.html               # HTML report (auto-generated)
├── scripts/
│   ├── init_db.py               # Database initialization
│   ├── db.py                    # Database operations module
│   ├── save_to_db.py            # Collect and save to the database
│   ├── query.py                 # Data query tool
│   └── generate_html.py         # HTML report generation
└── snippets/
    ├── hot.js                   # Browser-side extraction
    ├── search.js                # Search-result extraction
    ├── rate-limiter.js          # Rate limiter
    ├── cookie-manager.js        # Cookie manager
    ├── fallback.js              # Fallback-source fetching
    ├── fetch-hot.js             # Full hot-list fetch (three-level auth)
    └── test-simple.js           # Test script
```

Output format:

```json
{
  "meta": {
    "source": "zhihu",
    "fetch_time": "2026-03-15T08:30:00.000Z",
    "mode": "hot",
    "auth_method": "file_cookie",
    "rate_limited": true,
    "count": 30
  },
  "data": [
    {
      "rank": 1,
      "title": "...",
      "heat": 4030000,
      "url": "..."
    }
  ]
}
```
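
Before consuming a saved result such as `./zhihu-hot.json`, a caller may want to check the `meta` block. The sketch below validates the fields shown above and returns the top items by rank. Treating `meta.count` as the length of `data` is an assumption based on the example, not a documented guarantee.

```python
from __future__ import annotations

import json

REQUIRED_META = {"source", "fetch_time", "mode", "auth_method", "count"}
AUTH_METHODS = {"browser_profile", "file_cookie", "fallback_source"}


def top_items(payload: str, n: int = 3) -> list[tuple[int, str, int]]:
    """Validate the meta block of a fetch result and return the top-n (rank, title, heat)."""
    doc = json.loads(payload)
    meta = doc["meta"]
    missing = REQUIRED_META - set(meta)
    if missing:
        raise ValueError(f"meta is missing fields: {sorted(missing)}")
    if meta["auth_method"] not in AUTH_METHODS:
        raise ValueError(f"unknown auth_method: {meta['auth_method']}")
    if meta["count"] != len(doc["data"]):  # assumption: count == len(data)
        raise ValueError("meta.count does not match the number of data items")
    items = sorted(doc["data"], key=lambda item: item["rank"])
    return [(it["rank"], it["title"], it.get("heat", 0)) for it in items[:n]]
```

Usage: `top_items(open("zhihu-hot.json", encoding="utf-8").read(), n=10)`.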

```bash
# Generate the visual HTML report
python3 scripts/generate_html.py
# Open it (`open` is macOS; on Linux use xdg-open)
open data/index.html
```

HTML report features:

Query the database directly with the sqlite3 shell:

```bash
sqlite3 data/zhihu.db
```

```sql
-- Today's data (date('now') is UTC; use date('now', 'localtime') if fetch_date is stored in local time)
SELECT rank, title, heat FROM articles
WHERE fetch_date = date('now')
ORDER BY rank LIMIT 10;

-- Number of rows collected per day over the last 7 days
SELECT fetch_date, COUNT(*) FROM articles
WHERE fetch_date >= date('now', '-7 days')
GROUP BY fetch_date ORDER BY fetch_date DESC;

-- Latest fetch logs
SELECT * FROM fetch_logs ORDER BY started_at DESC LIMIT 5;
```
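
The same per-day count can be run from Python with the standard `sqlite3` module. The sketch below binds the day offset as a parameter to SQLite's `date('now', ?)` modifier instead of formatting it into the SQL string. How `query.py stats` computes its statistics is not shown, so this is only an illustration against the schema above.

```python
import sqlite3


def daily_counts(conn: sqlite3.Connection, days: int = 7) -> list:
    """Rows collected per fetch_date over the last `days` days, newest date first."""
    modifier = f"-{int(days)} days"  # e.g. "-7 days"; bound as a parameter, not spliced into SQL
    return conn.execute(
        """
        SELECT fetch_date, COUNT(*) FROM articles
        WHERE fetch_date >= date('now', ?)
        GROUP BY fetch_date
        ORDER BY fetch_date DESC
        """,
        (modifier,),
    ).fetchall()
```

Usage: open `sqlite3.connect("data/zhihu.db")`, call `daily_counts(conn, 7)`, and close the connection when done.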

To configure fallback sources, edit `config/fallback-sources.json`:

```json
{
  "fallbacks": [
    {
      "name": "zhihu-hot-hub",
      "url": "...",
      "type": "markdown",
      "priority": 1
    },
    {
      "name": "another-api",
      "url": "https://api.example.com/zhihu-hot.json",
      "type": "json",
      "priority": 2
    }
  ]
}
```
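
A sketch of how the `fallbacks` list could be consumed: sort by `priority` (assumed here to mean lower numbers are tried first, as with Priority 1 in the auth chain) and try each source until one responds. The `fetch` callable is supplied by the caller; restricting `type` to `json` and `markdown` reflects only the two types in the example, not a documented list.

```python
from __future__ import annotations

import json

SUPPORTED_TYPES = {"json", "markdown"}  # assumption: only the types shown in the example


def ordered_fallbacks(config_text: str) -> list[dict]:
    """Return configured fallback sources sorted by ascending "priority" (1 = tried first)."""
    fallbacks = json.loads(config_text).get("fallbacks", [])
    for source in fallbacks:
        if source.get("type") not in SUPPORTED_TYPES:
            raise ValueError(f"{source.get('name')}: unsupported type {source.get('type')!r}")
    return sorted(fallbacks, key=lambda source: source.get("priority", float("inf")))


def fetch_from_fallbacks(config_text: str, fetch) -> tuple[str, str]:
    """Try each source in order with fetch(url) -> str; return (name, body) of the first success."""
    for source in ordered_fallbacks(config_text):
        try:
            return source["name"], fetch(source["url"])
        except OSError:
            continue  # network error: move on to the next source
    raise RuntimeError("no fallback source succeeded")
```

In practice `fetch` could be `lambda url: urllib.request.urlopen(url, timeout=10).read().decode("utf-8")`; `urlopen` raises `urllib.error.URLError`, a subclass of `OSError`, so a failing source moves on to the next one.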
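
To change the order in which auth methods are tried, set `auth.priority` in the config. The example below tries the file cookie first, then the browser profile, and the fallback source last. Standard JSON does not allow comments, so keep the file free of `//` annotations:

```json
{
  "auth": {
    "priority": [
      "file_cookie",
      "browser_profile",
      "fallback_source"
    ]
  }
}
```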
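
Resolving `auth.priority` could look like the sketch below. Python's `json` module rejects `//` comments, so it assumes the config file is plain JSON. Appending any method left out of the list in its default position is an assumption of this sketch, not documented behavior.

```python
from __future__ import annotations

import json

DEFAULT_PRIORITY = ["browser_profile", "file_cookie", "fallback_source"]


def auth_order(config_text: str) -> list[str]:
    """Resolve the auth order from config, falling back to the default chain."""
    priority = json.loads(config_text).get("auth", {}).get("priority")
    if not priority:
        return list(DEFAULT_PRIORITY)
    unknown = [m for m in priority if m not in DEFAULT_PRIORITY]
    if unknown:
        raise ValueError(f"unknown auth methods: {unknown}")
    configured = list(dict.fromkeys(priority))  # drop duplicates, keep order
    # Assumption: methods omitted from the config are appended in default order.
    return configured + [m for m in DEFAULT_PRIORITY if m not in configured]
```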

✅ Tested

```bash
# Test the fallback source
node snippets/test-simple.js
```

Recommended auth method by scenario:

| Scenario | Recommended | Reason |
|---|---|---|
| Daily development | Browser Profile | Most stable; most complete data |
| CI/CD automation | File Cookie | No interaction needed; reproducible |
| Emergency backup | Fallback Source | No authentication required |

Comparison with other approaches:

| Solution | Complexity | Reliability | Use case |
|---|---|---|---|
| This solution | Low | High (three-level fallback) | Research data collection |
| XHS-style (Xiaohongshu) | High | High | Operations automation |
| Pure API | Medium | Medium | Production |