---
name: douyin-scraper
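description: Scrapes Douyin video and caption data, with keyword search and hot-list retrieval. An agent can trigger a search with a natural-language request, and the skill maps it to the matching command.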
---

# Douyin Viral Video Scraper Skill

## Natural-Language Triggers (Agent Usage)

When a user asks for something on Douyin in natural language, map the request to a command using these rules:

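| User intent | Example requests | Command to run |
|---|---|---|
| Search videos/content | "搜索一下海鲜视频" (search for seafood videos), "找找小龙虾的视频" (find crayfish videos), "帮我搜抖音上的美食" (search Douyin for food) | `python scripts/scraper.py search --keyword "海鲜" --limit 10` |
| View the hot list | "看看抖音热榜" (show the Douyin hot list), "抖音有什么热门" (what's trending on Douyin) | `python scripts/scraper.py hot --limit 20` |
| Hot list by category | "美食热榜" (food hot list), "看看抖音热门音乐" (show trending music on Douyin) | `python scripts/scraper.py hot --category "美食" --limit 20` (take the category from the request, e.g. "音乐" for music) |
| Search and save | "搜海鲜视频保存到文件" (search seafood videos and save them to a file) | `python scripts/scraper.py search --keyword "海鲜" --limit 10 --output seafood.json` |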
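
A minimal sketch of the table as code, assuming the agent has already classified the intent and extracted the keyword or category. `build_command` is a hypothetical helper, not part of `scripts/scraper.py`, and the default limits (10 for search, 20 for the hot list) are simply the values used in the table.

```python
from __future__ import annotations


# Hypothetical helper (not part of scripts/scraper.py): turn an already
# classified intent into the argv list shown in the mapping table.
def build_command(
    intent: str,
    keyword: str | None = None,
    category: str | None = None,
    limit: int | None = None,
    output: str | None = None,
) -> list[str]:
    cmd = ["python", "scripts/scraper.py"]
    if intent == "search":
        if not keyword:
            raise ValueError("a search needs a keyword")
        # Default of 10 mirrors the table; assumed, not confirmed.
        n = 10 if limit is None else limit
        cmd += ["search", "--keyword", keyword, "--limit", str(n)]
    elif intent == "hot":
        # Default of 20 mirrors the table; assumed, not confirmed.
        n = 20 if limit is None else limit
        cmd.append("hot")
        if category:
            cmd += ["--category", category]
        cmd += ["--limit", str(n)]
    else:
        raise ValueError(f"unknown intent: {intent!r}")
    if output:
        cmd += ["--output", output]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`, and the returned `stdout` parsed with `json.loads`.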

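**Keyword extraction rules:**
- Extract the core search term from the request. Drop request phrases (搜索一下, 找找, 帮我搜), the platform name (抖音), and filler words such as "视频" (video), "内容" (content), and "相关的" (related).
- "搜索一下海鲜视频" (search for seafood videos) → keyword="海鲜"
- "找找海鲜售卖相关的" (find anything about selling seafood) → keyword="海鲜售卖"
- "帮我搜抖音上做小龙虾的" (search Douyin for videos of cooking crayfish) → keyword="小龙虾"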
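
The rules above can be approximated with plain string stripping. This is a rough heuristic sketch, not how the skill actually extracts keywords (an agent would normally do this itself); the word lists `REQUEST_PREFIXES`, `PLATFORM_WORDS`, `FILLER_WORDS`, and `LEADING_VERBS` are hypothetical and cover only the examples shown.

```python
# Hypothetical word lists covering only the examples above; a real agent
# would rely on its own language understanding instead.
REQUEST_PREFIXES = ("帮我搜", "搜索一下", "搜一下", "找一下", "找找", "搜索", "搜")
PLATFORM_WORDS = ("抖音上", "抖音")  # longer form first
FILLER_WORDS = ("相关的", "相关", "视频", "内容")
LEADING_VERBS = ("做",)  # "做小龙虾" = "making crayfish"


def extract_keyword(text: str) -> str:
    kw = text.strip()
    # 1. Drop one leading request phrase ("search for", "find", ...).
    for prefix in REQUEST_PREFIXES:
        if kw.startswith(prefix):
            kw = kw[len(prefix):]
            break
    # 2. Remove the platform name and filler nouns wherever they appear.
    for word in PLATFORM_WORDS + FILLER_WORDS:
        kw = kw.replace(word, "")
    # 3. Trim the particle 的 left dangling at either end.
    kw = kw.strip().strip("的")
    # 4. Drop a leading verb such as 做, but never empty the keyword.
    for verb in LEADING_VERBS:
        if kw.startswith(verb) and len(kw) > len(verb):
            kw = kw[len(verb):]
            break
    return kw
```

Stripping a leading 做 is fragile (it would turn 做饭 into 饭), and save instructions such as 保存到文件 are not removed here because they map to `--output` instead; a production version would need word segmentation or the agent's own judgment.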

## Command-Line Usage

```bash
# Search
python scripts/scraper.py search --keyword "海鲜" --limit 10

# Hot list
python scripts/scraper.py hot --limit 20

# Search and save as JSON or CSV
python scripts/scraper.py search --keyword "海鲜" --limit 10 --output result.json
python scripts/scraper.py search --keyword "海鲜" --limit 10 --output result.csv --format csv
```
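
The documented interface can be sketched with `argparse` subcommands. This is a reconstruction from the commands above, not the actual source of `scripts/scraper.py`: the defaults, the `json`/`csv` choices for `--format`, and accepting `--output` on `hot` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Interface sketch reconstructed from the documented commands.
    parser = argparse.ArgumentParser(prog="scraper.py")
    sub = parser.add_subparsers(dest="command", required=True)

    search = sub.add_parser("search", help="search videos by keyword")
    search.add_argument("--keyword", required=True)
    search.add_argument("--limit", type=int, default=10)  # assumed default

    hot = sub.add_parser("hot", help="fetch the hot list")
    hot.add_argument("--category", default=None)
    hot.add_argument("--limit", type=int, default=20)  # assumed default

    # Save options, assumed to be shared by both subcommands.
    for p in (search, hot):
        p.add_argument("--output", default=None, help="file to save results to")
        p.add_argument("--format", choices=("json", "csv"), default="json")
    return parser
```

Using subparsers keeps `--keyword` required only for `search` and `--category` available only for `hot`, which matches how the two commands are used above.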

## Output Format

Commands print a JSON array to stdout. Each record looks like this:

```json
{
  "title": "Video title",
  "description": "Video description",
  "author": "Author",
  "play_count": 100000,
  "like_count": 5000,
  "comment_count": 200,
  "share_count": 100,
  "url": "https://www.douyin.com/video/xxx",
  "tags": ["海鲜", "热门"],
  "publish_time": "2026-05-20"
}
```
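
A minimal sketch for consuming this output, assuming exactly the ten fields shown: `parse_records` checks that every field is present, and `records_to_csv` flattens the records to CSV. The `|` separator for `tags` is an assumption; the real `--format csv` output may lay the columns out differently.

```python
import csv
import io
import json
from typing import TypedDict


class VideoRecord(TypedDict):
    title: str
    description: str
    author: str
    play_count: int
    like_count: int
    comment_count: int
    share_count: int
    url: str
    tags: list[str]
    publish_time: str  # "YYYY-MM-DD"


FIELDS = list(VideoRecord.__annotations__)


def parse_records(stdout: str) -> list[VideoRecord]:
    """Parse the JSON array printed by scraper.py and check required keys."""
    records = json.loads(stdout)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array")
    for i, rec in enumerate(records):
        missing = [k for k in FIELDS if k not in rec]
        if missing:
            raise ValueError(f"record {i} is missing fields: {missing}")
    return records


def records_to_csv(records: list[VideoRecord]) -> str:
    """Flatten records to CSV, joining tags with '|' (an assumed convention)."""
    buf = io.StringIO()
    # extrasaction="ignore" tolerates extra keys a newer scraper might add.
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for rec in records:
        row = dict(rec)
        row["tags"] = "|".join(rec["tags"])
        writer.writerow(row)
    return buf.getvalue()
```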

## Dependencies

- Python 3.10+
- playwright (`pip install playwright`)
- Chromium browser (`playwright install chromium`)
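
A quick preflight check for the first two requirements, written as a hypothetical helper. It does not verify that the Chromium build is installed; the simplest check for that is to launch it once with Playwright, which fails if `playwright install chromium` has not been run.

```python
import importlib.util
import sys


def missing_requirements() -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ required, found {sys.version.split()[0]}")
    # find_spec returns None when the package cannot be imported.
    if importlib.util.find_spec("playwright") is None:
        problems.append("playwright is not installed: pip install playwright")
    return problems
```

Before a scrape, print any problems it returns and exit with a non-zero status if the list is not empty.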

## Notes

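- Follow Douyin's platform rules and avoid sending requests too frequently.
- The default interval between requests is 2 seconds.
- For learning and research use only.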
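
The 2-second default interval can be enforced with a small throttle called before every request. `Throttle` is a hypothetical sketch of that behavior, not code taken from `scripts/scraper.py`; the clock and sleep functions are injectable so the timing can be tested without actually waiting.

```python
from __future__ import annotations

import time
from typing import Callable


class Throttle:
    """Enforce a minimum interval between consecutive requests (sketch)."""

    def __init__(
        self,
        interval: float = 2.0,  # the documented 2-second default
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last: float | None = None

    def wait(self) -> None:
        """Block until `interval` seconds have passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call `throttle.wait()` immediately before each navigation, for example before each `page.goto(url)` in a Playwright session. A monotonic clock is used so that system clock adjustments cannot shorten the interval.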