Xiaohongshu Public Opinion Crawler

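Crawls Xiaohongshu notes and comments, with keyword search and basic public-opinion analysis. Requires a locally logged-in Xiaohongshu account to run.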

MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description (XHS crawler + sentiment analysis) match the assets: SKILL.md explains browser automation, local login, or using TikHub; included xhs_spider.py calls a Xiaohongshu search endpoint. The required capabilities (browser, login) described are consistent with the stated purpose.
Instruction Scope
SKILL.md tells the user to clone upstream projects, create a venv, install Python deps and run the crawler locally, and to log in via browser. The instructions do not ask the agent to read unrelated system files or to exfiltrate data to unknown endpoints. Note: browser automation implies cookies/session data are used locally — review upstream code before running.
Install Mechanism
No formal install spec in the registry (instruction-only). SKILL.md instructs cloning public GitHub repos and pip installing requirements.txt — a common pattern. Because the runtime relies on external GitHub projects, the user should review those upstream repositories before running to ensure no unexpected code is pulled.
Credentials
The skill declares no environment variables or credentials. It does require a local Xiaohongshu account login (handled interactively via browser), which is appropriate for scraping. TikHub is mentioned as an optional paid alternative that would require a token; that is documented and optional.
Persistence & Privilege
Skill does not request persistent 'always' inclusion and does not modify other skills or system-wide agent settings. It runs only when invoked by the user/agent.
Assessment
This skill appears to do what it claims, but take these precautions before running: (1) Review the upstream GitHub projects (redbooks, jiang-xiaohongshu-crawler, TikHub SDK) because SKILL.md instructs cloning them — external repos can contain unexpected code. (2) Run the crawler only on a local machine you control and consider using a throwaway/testing account, since automated login and scraping can expose session cookies and may lead to account suspension under the platform's terms. (3) Be mindful of rate limits and set delays (SKILL.md suggests 3–5s). (4) If you consider using TikHub or other third-party APIs, only provide tokens to trusted services and verify their privacy/usage policies. (5) If you need higher assurance, inspect crawler_ultimate.py and the requirements for any native binaries or complex browser automation code before running.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0
latest: vk975wybb2c9evvy1d5x2rcjhj983142t

SKILL.md

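Xiaohongshu Public Opinion Crawler Skill

Crawl Xiaohongshu note content and analyze public opinion.

Overview

This skill crawls note content from Xiaohongshu (XHS) and supports keyword search, comment collection, and simple public-opinion analysis.

Project Sources

Based on open-source GitHub projects, including redbooks (https://github.com/xiaofuqing13/redbooks).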

Requirements

Local run (Windows)

# Python 3.8+
pip install -r requirements.txt
python crawler_ultimate.py

Dependencies

  • DrissionPage (browser automation)
  • pandas (data processing)
  • openpyxl (Excel export)
  • requests (HTTP requests)
  • Pillow (image processing)
  • customtkinter (GUI)

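Current Status

❌ Environment limitations

Testing showed:

  1. The Xiaohongshu API requires login credentials: calling the endpoint directly returns -101 (no login information)
  2. No browser environment: the current sandbox has no Chromium/Chrome browser
  3. No GUI environment: redbooks requires Windows + customtkinter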
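
The first two limitations can be probed before launching a crawl. The following is a minimal sketch, not code from the upstream projects: the executable names in `BROWSER_CANDIDATES` and the assumption that the -101 code sits under a top-level `"code"` key of the decoded JSON response are both illustrative guesses.

```python
import shutil
from typing import Any, Mapping, Optional, Sequence

# Executable names to probe on PATH; an assumed, non-exhaustive list.
BROWSER_CANDIDATES = (
    "chromium",
    "chromium-browser",
    "google-chrome",
    "chrome",
    "msedge",
)


def find_browser(candidates: Sequence[str] = BROWSER_CANDIDATES) -> Optional[str]:
    """Return the path of the first candidate executable found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def is_login_required(payload: Mapping[str, Any]) -> bool:
    """Return True when a decoded response carries the -101 'no login information' code.

    Assumes the code is stored under a top-level "code" key (hypothetical layout).
    """
    return payload.get("code") == -101
```

A `None` result from `find_browser` only means no candidate was found on `PATH`. On Windows, Chrome is often installed without being added to `PATH`, so a negative result there is not conclusive.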

✅ Workable options

  1. Run locally: on a local computer that has a browser and an account
  2. TikHub API: a paid API service (requires registering to obtain a token)

Quick Start (local environment)

1. Install dependencies

# Clone the project
git clone https://github.com/xiaofuqing13/redbooks.git
cd redbooks

# Create a virtual environment (recommended)
python -m venv venv
# Windows:
venv\Scripts\activate
# Linux/Mac:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

2. Run the program

python crawler_ultimate.py

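3. Log in to Xiaohongshu

On first run a browser window opens; log in with your Xiaohongshu account.

4. Start crawling

  • Enter keywords: 摩比爱识字 摩比点读笔
  • Choose the crawl mode: standard mode
  • Set the number of notes to crawl: 50-100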
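
If these settings are driven from a script rather than typed into the GUI, they could be captured in a small config object. This is a hypothetical sketch: `CrawlConfig`, `parse_keywords`, and the `"standard"` mode string are illustrative names and are not part of redbooks.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CrawlConfig:
    """Hypothetical container for the three settings listed above."""
    keywords: List[str]
    mode: str = "standard"
    max_notes: int = 50

    def __post_init__(self) -> None:
        if not self.keywords:
            raise ValueError("at least one keyword is required")
        if self.max_notes <= 0:
            raise ValueError("max_notes must be positive")


def parse_keywords(raw: str) -> List[str]:
    """Split on whitespace and ASCII or full-width commas, de-duplicating in order."""
    normalized = raw.replace(",", " ").replace("\uff0c", " ")
    return list(dict.fromkeys(normalized.split()))


config = CrawlConfig(keywords=parse_keywords("摩比爱识字 摩比点读笔"), max_notes=100)
```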

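Technical Limitations

| Method | Requires login | Requires browser | Cost |
| --- | --- | --- | --- |
| Web API | | | Free, but limited |
| Browser automation | | | Free |
| TikHub API | | | Paid |

Last updated: 2026-03-15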

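Features

| Feature | redbooks | jiang-xiaohongshu-crawler | TikHub API |
| --- | --- | --- | --- |
| Keyword search | | | |
| Note content | | | |
| Comment collection | | | |
| Image download | | | |
| Video download | | | |
| AI public-opinion analysis | | | |
| GUI | | | |
| Free | | | Paid |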

Crawled Fields

  • title (title)
  • author (author)
  • content (body text)
  • tags (tags)
  • publish_time (publish time)
  • ip_region (IP region)
  • like_count (number of likes)
  • collect_count (number of saves)
  • comment_count (number of comments)
  • comments (list of comments)
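
One way to carry these fields through a pipeline is a typed record. The sketch below is an assumption about shape, not the upstream schema: the field names come from the list above, but the types (counts as integers, `publish_time` as a string, `comments` as a list of strings) and the helper `note_from_dict` are illustrative.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Any, List, Mapping

# Fields coerced to int when present; the types are assumptions.
COUNT_FIELDS = ("like_count", "collect_count", "comment_count")


@dataclass
class Note:
    title: str = ""
    author: str = ""
    content: str = ""
    tags: List[str] = field(default_factory=list)
    publish_time: str = ""
    ip_region: str = ""
    like_count: int = 0
    collect_count: int = 0
    comment_count: int = 0
    comments: List[str] = field(default_factory=list)


def note_from_dict(raw: Mapping[str, Any]) -> Note:
    """Build a Note from a raw mapping, ignoring unknown keys and coercing counts to int."""
    known = {f.name for f in fields(Note)}
    data = {key: value for key, value in raw.items() if key in known}
    for name in COUNT_FIELDS:
        if name in data:
            data[name] = int(data[name])
    return Note(**data)
```

`asdict(note)` then gives a plain dictionary that can be handed to a tabular export step.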

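Notes

  1. Follow platform rules: for learning and research use only
  2. Login requirement: first use requires logging in to a Xiaohongshu account
  3. Rate limiting: set a delay between requests (e.g. 3-5 seconds)
  4. Anti-crawling risk: frequent crawling may get the account banned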
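
The suggested 3-5 second interval could be applied with a small generator that pauses between items. This is a sketch under assumptions: `throttled` is a hypothetical helper, and randomizing the delay within the range is a design choice made here, not something the source prescribes.

```python
import random
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttled(
    items: Iterable[T],
    min_delay: float = 3.0,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield items, pausing a random min_delay..max_delay seconds between consecutive ones."""
    if min_delay < 0 or max_delay < min_delay:
        raise ValueError("require 0 <= min_delay <= max_delay")
    for index, item in enumerate(items):
        if index > 0:
            # No pause before the first item; one pause before each later item.
            sleep(random.uniform(min_delay, max_delay))
        yield item
```

Wrapping a keyword or URL list as `for kw in throttled(keywords): ...` keeps the delay logic out of the crawl loop. The `sleep` parameter is injectable so the helper can be exercised without actually waiting.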

Search keywords for 摩比 products

  • 摩比爱识字
  • 摩比点读笔
  • 摩比思维机
  • mobby
  • 摩比英语
  • 摩比中文

Output Example

Crawl results are saved as:

  • Excel file: data/<keyword>_<time>.xlsx
  • SQLite database: data/xiaohongshu.db
  • Images/videos: images/<keyword>_<time>/
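
To illustrate the keyword-plus-time naming and the SQLite output, here is a minimal sketch using only the standard library. The timestamp format, the `notes` table, and its four columns are assumptions made for illustration; the actual layout used by redbooks is not documented here.

```python
import sqlite3
from datetime import datetime
from pathlib import Path
from typing import Any, Iterable, Mapping, Optional


def excel_path(keyword: str, when: Optional[datetime] = None, root: str = "data") -> Path:
    """Build data/<keyword>_<time>.xlsx; the timestamp format is an assumption."""
    when = when or datetime.now()
    return Path(root) / f"{keyword}_{when:%Y%m%d_%H%M%S}.xlsx"


def save_notes(conn: sqlite3.Connection, notes: Iterable[Mapping[str, Any]]) -> int:
    """Insert notes into a hypothetical `notes` table and return the number of rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes "
        "(title TEXT, author TEXT, content TEXT, like_count INTEGER)"
    )
    rows = [(n["title"], n["author"], n["content"], n["like_count"]) for n in notes]
    conn.executemany("INSERT INTO notes VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

In a real run the connection would point at `data/xiaohongshu.db`; passing `sqlite3.connect(":memory:")` is enough to try the function without touching disk.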


Files

2 total