Xiaohongshu Crawler
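A Xiaohongshu (小红书) content crawler that supports searching notes (login required) and fetching note details, user profiles, trending notes, and other public content.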
MIT-0 · Free to use, modify, and redistribute. No attribution required.
by @Djttt
Security Scan
OpenClaw: Benign (medium confidence)
Purpose & Capability
Name/description (Xiaohongshu crawler) align with the included scripts and libraries: Playwright-based browser automation, search/deep-crawl/get-note/get-user/hot-notes scripts, anti-crawl logic and caching. Dependencies (playwright, axios, cheerio) and the presence of stealth/anti-crawl code are expected for this purpose.
Instruction Scope
SKILL.md and the scripts instruct the agent/user to open a browser, capture session cookies, and write them into config.json; scripts read config.json, use cookies to access logged-in-only content, and write output files and caches. Collecting and persisting session cookies is sensitive and outside 'purely read-only' behavior—it's necessary for logged-in scraping but should be highlighted to users.
Install Mechanism
There is no explicit install spec in the registry entry (instruction-only), but package.json lists Playwright, which pulls in browser binaries and increases disk and network activity when dependencies are installed. The provided files contain no downloads from unknown ad hoc URLs and no obfuscated installers.
Credentials
The skill requests no environment variables or external credentials, which is coherent. However, it asks the user to export and store their Xiaohongshu session cookies (and config may contain proxy server credentials) into a local config.json in plaintext; that is sensitive and should be treated as such. The number/nature of requested items is proportionate to a scraper but still security-relevant.
Persistence & Privilege
The skill is not marked always:true and does not modify other skills or global agent settings. It writes/reads its own config.json and cache files in its workspace, which is normal for a CLI scraper.
Assessment
This skill appears to be what it says: a Playwright-based Xiaohongshu scraper. Important cautions before installing or running it:
1) get-cookie.js extracts your account session cookies and saves them to config.json in plaintext. Do not use your primary/personal account if you are concerned about compromise or violating platform rules; prefer a throwaway/test account.
2) Review config.json before running; it can contain proxy credentials and stored cookies.
3) Installing dependencies (playwright) downloads browser binaries; run npm install in a sandboxed environment if you want to limit exposure.
4) The tool includes anti-detection and proxy-rotation features. Aggressive or large-scale crawling can violate the site's terms and may lead to account suspension, so follow the usage limits documented in SKILL.md.
5) If you need higher assurance, request the files not shown here (4 files were truncated) and confirm there is no hidden network exfiltration before running on sensitive accounts.
Current version: v1.0.1
SKILL.md
Xiaohongshu Crawler
A content crawler for Xiaohongshu (小红书) that retrieves public content such as search results, note details, and user profiles.
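📋 Description
A Xiaohongshu content crawler that supports searching notes (login required) and fetching note details, user profiles, trending notes, and other public content.
Use cases:
- Search for notes matching a keyword
- Fetch the full content of a single note
- Fetch a user's public profile
- Fetch the list of trending notes
- Deep-crawl notes in bulk and generate an analysis report
Note: This tool is for learning and research only. You must comply with the Xiaohongshu user agreement and all applicable laws and regulations.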
🚀 Installation
clawhub install xiaohongshu-crawler
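⚙️ Quick Setup
1. Get a cookie (required for search)
node scripts/get-cookie.js
Follow the prompts to scan the QR code and log in, then type save.
2. Test the cookie
node scripts/test-cookie.js
Once it prints "✅ Cookie 有效" ("cookie is valid"), the tool is ready to use.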
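get-cookie.js stores the session cookies in config.json in plaintext, but the files shown here do not document the exact layout or how the other scripts reuse them. The sketch below assumes, hypothetically, that config.json holds an array of Playwright-style cookie objects (`{ name, value, domain, expires, ... }`, the shape returned by Playwright's `browserContext.cookies()`) under a `cookies` key, and turns it into a `Cookie` request header while dropping expired or off-site entries. The function names are illustrative, not the skill's API.

```javascript
// Sketch only: build a Cookie request header from cookies saved by get-cookie.js.
// ASSUMPTION: config.json holds Playwright-style cookie objects under a
// hypothetical "cookies" key; the real file layout may differ.
const fs = require("node:fs");

// True if the cookie belongs to `site` or one of its subdomains.
function matchesSite(cookie, site) {
  const domain = (cookie.domain || "").replace(/^\./, "");
  return domain === site || domain.endsWith(`.${site}`);
}

// Playwright reports `expires` in Unix seconds, with -1 for session cookies.
function isExpired(cookie, nowSec) {
  return typeof cookie.expires === "number" && cookie.expires > 0 && cookie.expires <= nowSec;
}

// Join the usable cookies as "name=value; name2=value2".
function cookieHeader(cookies, site = "xiaohongshu.com", nowSec = Date.now() / 1000) {
  return cookies
    .filter((c) => matchesSite(c, site) && !isExpired(c, nowSec))
    .map((c) => `${c.name}=${c.value}`)
    .join("; ");
}

function loadCookieHeader(configPath = "config.json") {
  const config = JSON.parse(fs.readFileSync(configPath, "utf8"));
  const header = cookieHeader(Array.isArray(config.cookies) ? config.cookies : []);
  if (header === "") {
    // Nothing usable: no cookies were saved, or all of them have expired.
    throw new Error(`No valid cookies in ${configPath}; run node scripts/get-cookie.js again.`);
  }
  return header;
}
```

An empty header is a cheap local signal that the login has to be redone; it does not prove the remaining cookies are still accepted by the server, which is what test-cookie.js checks. For browser-based requests, Playwright's `browserContext.addCookies()` can load the same objects directly instead of building a header.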
📝 Core Usage
Quick search
node scripts/quick-search.js "keyword" [count]
Deep crawl
node scripts/deep-crawl.js "keyword" [count]
Generates detailed note content and a Markdown analysis report.
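The optional count argument is where the "fewer than 50 items per run" guideline from the usage rules can be enforced. Below is a hypothetical sketch of argument handling for a command like quick-search.js; the default of 20, the cap of 49, and the name `parseSearchArgs` are assumptions, not values taken from the scripts.

```javascript
// Hypothetical argument handling for: node scripts/quick-search.js "keyword" [count]
// ASSUMPTIONS: the default count (20) is illustrative; the cap (49) mirrors the
// "< 50 items per run" guideline rather than a limit confirmed in the scripts.
const DEFAULT_COUNT = 20;
const MAX_COUNT = 49;

function parseSearchArgs(argv) {
  const [keyword, rawCount] = argv;
  if (!keyword || !keyword.trim()) {
    throw new Error('usage: node scripts/quick-search.js "keyword" [count]');
  }

  // Fall back to the default when no count is given.
  const count = rawCount === undefined ? DEFAULT_COUNT : Number.parseInt(rawCount, 10);
  if (!Number.isInteger(count) || count <= 0) {
    throw new Error(`count must be a positive integer, got "${rawCount}"`);
  }

  // Clamp rather than fail, so an oversized request still stays within the guideline.
  return { keyword: keyword.trim(), count: Math.min(count, MAX_COUNT) };
}
```

A script would call it as `parseSearchArgs(process.argv.slice(2))`. Clamping silently is a design choice; printing a warning when the requested count is reduced would make the limit more visible to the user.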
Other features
node scripts/get-note.js "note ID" # get note details
node scripts/get-user.js "user ID" # get user profile
node scripts/hot-notes.js # get trending notes
📚 Further Documentation
- Full usage guide → references/USAGE.md
- Usage examples → references/examples.md
- Troubleshooting → references/TROUBLESHOOTING.md
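🛠️ Scripts
| Script | Purpose | Login required |
|---|---|---|
| get-cookie.js | Interactively obtain a cookie | - |
| test-cookie.js | Check whether the cookie is valid | - |
| quick-search.js | Quick note search | ✅ |
| deep-crawl.js | Deep-crawl note details | ✅ |
| get-note.js | Get details of a single note | ✅ |
| get-user.js | Get user profile information | ✅ |
| hot-notes.js | Get trending notes | Optional |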
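⚠️ Usage Rules
Compliant use
- ✅ Allowed: personal study and research, crawling public content, small batches (fewer than 50 items per run)
- ❌ Prohibited: commercial use, large-scale or high-frequency crawling, private content, bypassing paywalls, redistributing data
Anti-crawl protection
- Random delay of 2-8 seconds between requests by default
- At most 10 requests per minute
- Simulates human browsing behavior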
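The first two defaults (a random 2-8 second delay and at most 10 requests per minute) can be approximated with a jittered sleep plus a sliding-window limiter. This is a minimal sketch under that assumption; the names `randomDelayMs`, `SlidingWindowLimiter`, and `throttled` are hypothetical, and the skill's own implementation is not shown in the files listed here.

```javascript
// Sketch of the documented defaults: random 2-8 s pause, max 10 requests per 60 s.
// All names here are hypothetical, not taken from the skill's scripts.
const MIN_DELAY_MS = 2000;
const MAX_DELAY_MS = 8000;
const MAX_REQUESTS_PER_MINUTE = 10;
const WINDOW_MS = 60_000;

// Uniform random integer delay in [minMs, maxMs]; rng is injectable for testing.
function randomDelayMs(minMs = MIN_DELAY_MS, maxMs = MAX_DELAY_MS, rng = Math.random) {
  return minMs + Math.floor(rng() * (maxMs - minMs + 1));
}

// Remembers recent request timestamps and says how long to wait before the next one.
class SlidingWindowLimiter {
  constructor(maxRequests = MAX_REQUESTS_PER_MINUTE, windowMs = WINDOW_MS) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.timestamps = [];
  }

  // Returns 0 if a request may be sent at `now`, otherwise the milliseconds to wait.
  waitTime(now = Date.now()) {
    // Forget requests that have left the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length < this.maxRequests) return 0;
    // The oldest request in the window is the next to expire.
    return this.windowMs - (now - this.timestamps[0]);
  }

  record(now = Date.now()) {
    this.timestamps.push(now);
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs an async task (for example a page navigation) under both throttles.
async function throttled(limiter, task) {
  let wait = limiter.waitTime();
  while (wait > 0) {
    await sleep(wait);
    wait = limiter.waitTime();
  }
  limiter.record();
  const result = await task();
  await sleep(randomDelayMs()); // jittered pause before the caller's next request
  return result;
}
```

With a shared `const limiter = new SlidingWindowLimiter();`, each request becomes `await throttled(limiter, () => page.goto(url))`, where `page` is a Playwright Page. With 2-8 s pauses the per-minute cap rarely triggers on its own; it matters mainly if the delay range is lowered.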
For detailed configuration and troubleshooting, see the documents in the references/ directory.
Files: 19 total
