微博热搜采集 | Weibo Hot Search
微博多频道热搜数据采集与可视化 | Weibo Multi-Channel Hot Search Data Collection & Visualization 支持热搜总榜、社会榜、文娱榜、生活榜同时抓取 | Supports hot search, social, entertainment, life ch...
MIT-0 · Free to use, modify, and redistribute. No attribution required.
by noah @noah-1106
Security Scan
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The project claims to be a Weibo hot-search collector and visualizer, and the included scripts implement that. However, the runtime code relies heavily on an external 'openclaw' browser CLI (used to open pages and capture snapshots), yet neither the skill metadata nor SKILL.md declares this dependency. That is an inconsistency: a working collector needs either this browser CLI or an HTTP client, so an undeclared required binary is unexpected.
Instruction Scope
SKILL.md documents how to run the Python scripts but never mentions the required 'openclaw' CLI or its security implications. The fetch scripts instruct the agent to run shell commands that open URLs and capture snapshots, then parse the snapshot text to extract URLs and post content. That parsed output is later used to construct further shell commands. This goes beyond simple parsing and could lead to executing commands derived from external page content. The instructions also say nothing about running in an isolated environment or validating inputs.
Install Mechanism
There is no install specification (instruction-only install), so nothing is automatically downloaded or executed during installation. This lowers installer risk. However, the skill ships several code files that are placed on disk and launch local subprocesses at runtime.
Credentials
The skill requests no environment variables or credentials, and stores data in a local SQLite DB and HTML file. The requested access is proportionate to the described purpose.
Persistence & Privilege
The skill is not always-enabled and does not request elevated platform privileges. It writes only to its own data/ directory and DB; it doesn't modify other skills or system-wide configs.
What to consider before installing
This package mostly does what it claims (collect Weibo hot-search items, save to a SQLite DB, and create an HTML report), but there are notable concerns you should address before running it:
- The scripts rely on an external CLI 'openclaw browser' to open pages and capture snapshots, but SKILL.md and the manifest do not declare this dependency. Ensure you have 'openclaw' installed from a trusted source before using the scripts.
- The fetcher uses subprocess.run with shell=True to execute commands like: openclaw browser open --profile openclaw '{url}'. Those commands include URLs and other strings derived from parsed page snapshots. If an attacker can influence snapshot output (or if a malformed URL contains shell metacharacters), this can lead to command injection. Consider running the tool only in an isolated environment (VM/container) and inspect/clean inputs first.
- If you plan to use this skill: (1) Review and/or modify fetch-hot-search.py and fetch-topic-content to avoid shell=True and to pass command arguments as lists to subprocess.run; (2) sanitize or validate any URL or title data before embedding into shell commands; (3) explicitly document and verify the 'openclaw' dependency in SKILL.md; (4) run initial tests with a non-privileged user and in a sandbox; (5) consider adding explicit warnings in SKILL.md about the browser dependency and security considerations.
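Items (1) and (2) can be sketched together. This is a minimal illustration under stated assumptions, not code from the skill: the helper names require_openclaw, validate_url, build_open_command and open_in_browser are hypothetical, the command shape openclaw browser open --profile openclaw <url> is copied from the finding above, and the set of allowed Weibo hosts is a guess to adjust. Because the arguments are passed as a list and shell=True is not used, the URL reaches openclaw as a single argument and its shell metacharacters are never interpreted.

```python
import shutil
import subprocess
from urllib.parse import urlsplit

# Hosts the collector is expected to visit (assumption: adjust to the real targets).
ALLOWED_HOSTS = {"weibo.com", "s.weibo.com", "m.weibo.cn"}


def require_openclaw() -> str:
    """Return the path of the openclaw CLI, or fail loudly if it is not installed."""
    path = shutil.which("openclaw")
    if path is None:
        raise RuntimeError("'openclaw' CLI not found on PATH; install it from a trusted source")
    return path


def validate_url(url: str) -> str:
    """Accept only http(s) URLs whose host is on the allow-list."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {url!r}")
    host = (parts.hostname or "").lower()
    if host not in ALLOWED_HOSTS:
        raise ValueError(f"unexpected host: {url!r}")
    return url


def build_open_command(url: str, openclaw: str = "openclaw") -> list[str]:
    """Build an argument list; no shell is involved, so no quoting is needed."""
    return [openclaw, "browser", "open", "--profile", "openclaw", validate_url(url)]


def open_in_browser(url: str) -> subprocess.CompletedProcess:
    cmd = build_open_command(url, require_openclaw())
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(cmd, check=True, capture_output=True, text=True, timeout=60)
```

The list form alone already blocks shell injection; the host allow-list is an extra layer that keeps snapshot-derived links from sending the browser to arbitrary sites.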
If you want higher assurance, ask the author to declare the 'openclaw' dependency, to replace shell invocations with safe APIs or subprocess argument lists, and to add input sanitization. Without those changes, treat the skill as potentially risky on sensitive hosts.
Current version: v1.0.0
Runtime requirements
📱 Clawdis
SKILL.md
Weibo Hot Search - 微博热搜数据采集 | Weibo Hot Search Data Collection
多频道微博热搜数据采集工具,支持数据持久化存储和可视化展示。 Multi-channel Weibo hot search data collection tool with persistence and visualization.
功能特性 | Features
- 多频道采集 / Multi-Channel Collection - 同时抓取热搜总榜、社会榜、文娱榜、生活榜 | Fetch the hot search, social, entertainment, and life channels in a single run
- 数据持久化 / Data Persistence - 自动保存到SQLite数据库,支持历史查询 | Auto-save to SQLite database with historical query support
- HTML可视化 / HTML Visualization - 生成交互式报告,支持日期/频道/关键词筛选 | Generate interactive reports with date/channel/keyword filters
- 频道标签 / Channel Tags - 热/新/商/官宣等标签识别 | Hot/New/Commercial/Official tag recognition
快速开始 | Quick Start
1. 初始化数据库 | Initialize Database
cd scripts
python3 init_db.py
2. 采集数据 | Collect Data
# 采集所有频道(每频道30条)/ Fetch all channels (30 per channel)
python3 save_to_db.py
# 指定数量 / Specify per-channel count
python3 save_to_db.py 50
3. 查询数据 | Query Data
# 查看今天的热搜 / View today's hot search
python3 query.py today
# 查看指定频道 / View specific channel
python3 query.py today hot
# 查看指定日期 / View specific date
python3 query.py date 2026-03-15
# 查看统计 / View statistics
python3 query.py stats 7
4. 生成HTML报告 | Generate HTML Report
python3 generate_html.py
open ../data/index.html
文件结构 | File Structure
weibo-fresh-posts-0/
├── SKILL.md # 本文档 | This document
├── data/
│ ├── weibo.db # SQLite数据库 | SQLite database
│ └── index.html # HTML可视化报告 | HTML visualization report
└── scripts/
├── init_db.py # 数据库初始化 | DB initialization
├── db.py # 数据库操作模块 | DB operations module
├── fetch-hot-search.py # 核心采集脚本 | Core collection script
├── save_to_db.py # 采集并保存到数据库 | Collection & save
├── query.py # 数据查询工具 | Data query tool
└── generate_html.py # HTML报告生成 | HTML report generation
数据库结构 | Database Schema
hot_items 表 | hot_items Table
CREATE TABLE hot_items (
id TEXT PRIMARY KEY, -- URL+日期+频道的哈希 | Hash of URL+date+channel
platform TEXT DEFAULT 'weibo', -- 平台标识 | Platform identifier
channel_id TEXT, -- hot/social/entertainment/life
channel_name TEXT, -- 频道名称 | Channel name
rank INTEGER, -- 排名 | Ranking
title TEXT NOT NULL, -- 标题 | Title
url TEXT NOT NULL, -- 链接 | Link
heat INTEGER, -- 热度值 | Heat value
tag TEXT, -- 热/新/商/官宣等 | Hot/New/Commercial/Official
fetched_at INTEGER, -- 抓取时间 | Fetch time
fetch_date TEXT -- 抓取日期 YYYY-MM-DD | Fetch date
);
topic_posts 表 | topic_posts Table
CREATE TABLE topic_posts (
id INTEGER PRIMARY KEY AUTOINCREMENT,
hot_item_id TEXT, -- 关联的热搜条目 | Related hot item
author TEXT, -- 作者 | Author
author_type TEXT, -- media/user | Media or user
content TEXT, -- 内容 | Content
url TEXT, -- 链接 | Link
is_media BOOLEAN -- 是否媒体账号 | Is media account
);
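The two tables above can be created with Python's standard sqlite3 module. This is a sketch assuming exactly the schema shown, not the contents of init_db.py; the IF NOT EXISTS guards (so re-running is harmless) and the init_db function name are illustrative choices.

```python
import sqlite3

# Schema copied from the tables above; IF NOT EXISTS makes re-runs harmless.
SCHEMA = """
CREATE TABLE IF NOT EXISTS hot_items (
    id           TEXT PRIMARY KEY,      -- hash of URL + date + channel
    platform     TEXT DEFAULT 'weibo',
    channel_id   TEXT,                  -- hot/social/entertainment/life
    channel_name TEXT,
    rank         INTEGER,
    title        TEXT NOT NULL,
    url          TEXT NOT NULL,
    heat         INTEGER,
    tag          TEXT,
    fetched_at   INTEGER,
    fetch_date   TEXT                   -- YYYY-MM-DD
);
CREATE TABLE IF NOT EXISTS topic_posts (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    hot_item_id TEXT,                   -- refers to hot_items.id
    author      TEXT,
    author_type TEXT,                   -- media/user
    content     TEXT,
    url         TEXT,
    is_media    BOOLEAN
);
"""


def init_db(path: str) -> sqlite3.Connection:
    """Open (or create) the database file and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)  # runs both CREATE statements in one call
    return conn
```

For example, call init_db("data/weibo.db") from the skill root. The data/ directory must already exist, because sqlite3.connect does not create parent directories.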
使用示例 | Usage Examples
采集数据 | Collect Data
# 基础采集 / Basic collection
python3 scripts/save_to_db.py
# 显示输出 / Sample output:
# ============================================================
# 📱 微博热搜采集 → 数据库 / Weibo Hot Search → Database
# 每频道采集数量 / Per channel count: 30
# ============================================================
# 🔥 开始采集微博热搜 / Starting Weibo hot search collection...
#
# 📡 [热搜总榜 / Hot Search]
# ✅ 30 条热搜 / 30 hot items
# 新增 / New: 30 条
#
# 📡 [社会榜 / Social]
# ✅ 30 条热搜 / 30 hot items
# 新增 / New: 30 条
# ...
查询数据 | Query Data
# 今天的热搜 / Today's hot search
$ python3 query.py today
📱 微博热搜 - 2026-03-16 / Weibo Hot Search
================================================================================
【热搜总榜 / Hot Search】
1. 315晚会曝光... [热] / 315 Gala exposure... [Hot]
🔥 5,000,000
2. 明星离婚... [爆] / Celebrity divorce... [Viral]
🔥 3,200,000
...
【社会榜 / Social】
1. 交通事故... / Traffic accident...
2. 天气预报... / Weather forecast...
生成报告 | Generate Report
$ python3 generate_html.py
✅ HTML报告已生成 / HTML report generated: data/index.html
共 / Total: 120 条记录
日期范围 / Date range: 2026-03-15 ~ 2026-03-16
频道 / Channels: 热搜总榜, 社会榜, 文娱榜, 生活榜
打开方式 / Open methods:
- Mac: open data/index.html
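The output above lists only the macOS open command. On Linux or Windows, Python's standard webbrowser module is one portable alternative. In the sketch below, report_uri and open_report are hypothetical helpers, and data/index.html is assumed to be relative to the skill root, as in the output above.

```python
import webbrowser
from pathlib import Path


def report_uri(path: str = "data/index.html") -> str:
    """Absolute file:// URI for the generated report."""
    return Path(path).resolve().as_uri()


def open_report(path: str = "data/index.html") -> bool:
    """Open the report in the default browser; return False if it does not exist yet."""
    if not Path(path).is_file():
        return False
    return webbrowser.open(report_uri(path))
```

Converting the path to a file:// URI first avoids relying on each platform's handling of bare relative paths.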
HTML报告功能 | HTML Report Features
- 📅 日期筛选 / Date Filter - 选择具体日期 | Select specific date
- 📺 频道筛选 / Channel Filter - 点击频道标签过滤 | Click channel tags to filter
- 🔍 关键词搜索 / Keyword Search - 实时搜索标题 | Real-time title search
- 🔥 热度显示 / Heat Display - 显示热度值 | Show heat values
- 🏷️ 标签展示 / Tag Display - 热/新/商/官宣等标签 | Hot/New/Commercial/Official tags
- 🏆 排名标识 / Ranking Display - Top 3 特殊颜色标识 | Top 3 special color marking
支持频道 | Supported Channels
| 频道ID / Channel ID | 频道名称 / Channel Name | 说明 / Description |
|---|---|---|
| hot | 热搜总榜 / Hot Search | 综合热搜 / Comprehensive hot |
| social | 社会榜 / Social | 社会新闻 / Social news |
| entertainment | 文娱榜 / Entertainment | 娱乐文化 / Entertainment & culture |
| life | 生活榜 / Life | 生活方式 / Lifestyle |
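In code, the table maps naturally onto a dictionary. The sketch below is hypothetical (the scripts may store channels differently) and shows how a channel argument, such as the hot in python3 query.py today hot, could be checked before it reaches a query.

```python
from typing import Optional

# Channel IDs and display names, taken from the table above.
CHANNELS: dict[str, str] = {
    "hot": "热搜总榜",
    "social": "社会榜",
    "entertainment": "文娱榜",
    "life": "生活榜",
}


def resolve_channel(arg: Optional[str] = None) -> list[str]:
    """Return the channel IDs to query: all of them if arg is None, else just arg."""
    if arg is None:
        return list(CHANNELS)
    key = arg.strip().lower()
    if key not in CHANNELS:
        valid = ", ".join(CHANNELS)
        raise ValueError(f"unknown channel {arg!r}; expected one of: {valid}")
    return [key]
```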
原始采集脚本 | Original Collection Script
如需直接获取JSON数据: For direct JSON output:
# 输出到文件 / Output to file
python3 fetch-hot-search.py -o weibo-hot.json
# 输出到stdout(静默模式)/ Output to stdout (quiet mode)
python3 fetch-hot-search.py -q
# 抓取详细内容(前10个话题的帖子)/ Fetch detailed content (posts for top 10 topics)
python3 fetch-hot-search.py -c --content-limit 2 -o weibo-hot.json
数据查询SQL示例 | SQL Query Examples
# 进入数据库 / Enter database
sqlite3 data/weibo.db
-- 今天的热搜总榜 / Today's hot search
SELECT rank, title, heat, tag FROM hot_items
WHERE fetch_date = date('now', 'localtime') AND channel_id = 'hot'
ORDER BY rank LIMIT 10;
-- 最近7天每天各频道数量 / Daily channel counts for last 7 days
SELECT fetch_date, channel_name, COUNT(*)
FROM hot_items
WHERE fetch_date >= date('now', 'localtime', '-6 days')
GROUP BY fetch_date, channel_id, channel_name;
-- 包含"315"的热搜 / Hot search containing "315"
SELECT * FROM hot_items
WHERE title LIKE '%315%'
ORDER BY fetch_date DESC, heat DESC;
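The same lookups can be run from Python with the standard sqlite3 module. This is a sketch, not query.py itself: top_items is a hypothetical helper that assumes the hot_items schema above, passes the date, channel and limit as ? parameters instead of formatting them into the SQL text, and takes "today" from the local clock, assuming fetch_date is recorded in local time too.

```python
import sqlite3
from datetime import date
from typing import Optional


def top_items(conn: sqlite3.Connection, channel_id: str = "hot",
              day: Optional[str] = None, limit: int = 10) -> list[tuple]:
    """Top-ranked items for one channel on one day (default: today, local time)."""
    day = day or date.today().isoformat()  # YYYY-MM-DD, same format as fetch_date
    sql = (
        "SELECT rank, title, heat, tag FROM hot_items "
        "WHERE fetch_date = ? AND channel_id = ? "
        "ORDER BY rank LIMIT ?"
    )
    # Parameters are bound by sqlite3, so titles or dates can never alter the SQL.
    return conn.execute(sql, (day, channel_id, limit)).fetchall()
```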
注意事项 | Notes
- 需要登录 / Login Required - 使用 openclaw browser open https://weibo.com 登录 | Log in first by running openclaw browser open https://weibo.com
- 频率限制 / Rate Limiting - 每次抓取有短暂延迟,避免触发反爬 | A brief delay between fetches avoids triggering anti-crawler measures
- 数据去重 / Deduplication - 同一天同一URL同一频道只保存一次 | An item with the same URL, date, and channel is saved only once
- 热度更新 / Heat Update - 重新抓取会更新热度值 | Refetching updates heat values
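The deduplication and heat-update notes can be implemented as an upsert keyed on the id column, which the schema describes as a hash of URL, date and channel. A minimal sketch follows. The hash function (SHA-256) and the "|" separator are assumptions, since the source does not name them; make_id and save_item are hypothetical names; and ON CONFLICT ... DO UPDATE requires SQLite 3.24 or later.

```python
import hashlib
import sqlite3
import time


def make_id(url: str, fetch_date: str, channel_id: str) -> str:
    """Stable item ID from URL + date + channel (assumption: SHA-256 with '|' separator)."""
    key = f"{url}|{fetch_date}|{channel_id}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


# Insert a new row; if the same id already exists, only refresh the heat value.
UPSERT = """
INSERT INTO hot_items (id, channel_id, channel_name, rank, title, url, heat, tag,
                       fetched_at, fetch_date)
VALUES (:id, :channel_id, :channel_name, :rank, :title, :url, :heat, :tag,
        :fetched_at, :fetch_date)
ON CONFLICT(id) DO UPDATE SET heat = excluded.heat
"""


def save_item(conn: sqlite3.Connection, item: dict) -> None:
    """item must provide channel_id, channel_name, rank, title, url, heat, fetch_date."""
    row = dict(item)  # copy so the caller's dict is not modified
    row["id"] = make_id(row["url"], row["fetch_date"], row["channel_id"])
    row.setdefault("tag", None)
    row.setdefault("fetched_at", int(time.time()))
    conn.execute(UPSERT, row)
```

Because the date is part of the key, the same topic fetched on a new day gets a new row, while repeated fetches on the same day only update its heat.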
更新记录 | Changelog
- 2026-03-16: 添加数据库持久化和HTML可视化功能 / Added database persistence and HTML visualization
Files: 9 total
