Install
openclaw skills install douyin-report-search
This skill automates end-to-end Douyin topic research and report generation. Given a search keyword and a target video count, it handles QR-code login, batch...
Automate the full pipeline: keyword → data collection → CAPTCHA bypass → enrichment → analysis → HTML report, replicating a workflow that collected and analyzed 100 videos on the topic "女性成长" (women's growth).
| Parameter | Default | Notes |
|---|---|---|
| KEYWORD | 女性成长 | Search keyword (URL-encoded automatically) |
| TOTAL | 100 | Total number of videos to collect |
| DETAIL_LIMIT | 50 | Maximum number of videos whose detail pages are visited |
| COMMENTS_TOP | 5 | Number of top comments kept per video |
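
As a rough illustration of how the parameters above might be carried through a run, the sketch below groups them in a dataclass and builds the search URL with the keyword percent-encoded. The `RunConfig` class and its `search_url` method are hypothetical names introduced here, not part of the skill's scripts; the URL pattern is the one the collection step uses.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for the four documented parameters."""

    keyword: str = "女性成长"  # KEYWORD: search keyword
    total: int = 100           # TOTAL: total videos to collect
    detail_limit: int = 50     # DETAIL_LIMIT: max detail pages to visit
    comments_top: int = 5      # COMMENTS_TOP: top comments kept per video

    def search_url(self) -> str:
        # Percent-encode the keyword as UTF-8 so it is safe inside the URL path.
        encoded = quote(self.keyword, safe="")
        return f"https://www.douyin.com/search/{encoded}?type=video"


config = RunConfig()
print(config.search_url())
```

Keeping the defaults in one frozen object makes it harder for individual scripts to drift apart on values such as `DETAIL_LIMIT`.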
cd <work_dir>
python3 -m venv venv && source venv/bin/activate
pip install playwright pillow numpy scipy scikit-image openpyxl
playwright install chromium
Run scripts/douyin_login.py (or adapt inline). The script:
- Opens https://www.douyin.com
- Waits for the QR-code login (polls document.cookie until login detected)
- Saves the session to douyin_session.json

Key anti-detection settings (always apply):
args=["--disable-blink-features=AutomationControlled", "--no-sandbox",
"--window-size=1440,900"]
# Init script:
"Object.defineProperty(navigator,'webdriver',{get:()=>undefined});"
See scripts/collect_videos.py. Core logic:
- Intercept the search/item API response (aweme_list field contains video data)
- Navigate to https://www.douyin.com/search/{keyword}?type=video
- Scroll with window.scrollBy(0, 600), then wait 4s
- Extract aweme_id, desc (title), statistics (likes/shares/collects/comments), author.uid, author.nickname, author.follower_count, video.duration, text_extra (tags)

See scripts/parse_videos.py and scripts/captcha_solver.py.
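
The field list in the core collection logic can be flattened into one record per video along the following lines. This is a minimal sketch, not the contents of scripts/parse_videos.py: the function name `flatten_aweme` is hypothetical, and the sub-keys inside `statistics` (`digg_count`, `share_count`, `collect_count`, `comment_count`) and `text_extra` (`hashtag_name`) are assumptions about the response shape that should be checked against a real `aweme_list` entry.

```python
from typing import Any


def flatten_aweme(item: dict[str, Any]) -> dict[str, Any]:
    """Flatten one aweme_list entry into a single-level record."""
    stats = item.get("statistics") or {}
    author = item.get("author") or {}
    video = item.get("video") or {}
    # Assumed: each text_extra entry may carry a "hashtag_name" key.
    tags = [
        entry["hashtag_name"]
        for entry in item.get("text_extra") or []
        if entry.get("hashtag_name")
    ]
    return {
        "aweme_id": item.get("aweme_id"),
        "title": item.get("desc", ""),
        # Assumed statistic key names; missing counters default to 0.
        "likes": stats.get("digg_count", 0),
        "shares": stats.get("share_count", 0),
        "collects": stats.get("collect_count", 0),
        "comments": stats.get("comment_count", 0),
        "author_uid": author.get("uid"),
        "author_nickname": author.get("nickname"),
        "author_followers": author.get("follower_count", 0),
        "duration": video.get("duration"),
        "tags": tags,
    }


# Illustrative input only; the values are placeholders, not collected data.
sample = {
    "aweme_id": "1",
    "desc": "demo",
    "statistics": {"digg_count": 10, "share_count": 2},
    "author": {"uid": "u1", "nickname": "n", "follower_count": 5},
    "video": {"duration": 1000},
    "text_extra": [{"hashtag_name": "tag"}, {}],
}
record = flatten_aweme(sample)
```

Using `.get` with defaults keeps a single malformed entry from aborting a batch of collected videos.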
The algorithm is embedded in scripts/captcha_solver.py. Key findings from empirical testing:
# Element selectors (Douyin captcha iframe)
# captcha frame: the frame whose frame.url contains "verifycenter" or "captcha"
bg_el = frame.locator(".captcha-verify-image").first
sl_el = frame.locator(".captcha-verify-image-slide").first
btn_el = frame.locator(".captcha-slider-btn").first
# Slide distance formula
gap_center_abs = bg_bb["x"] + gap_x + sl_bb["width"] / 2
btn_center_abs = btn_bb["x"] + btn_bb["width"] / 2
slide_distance = gap_center_abs - btn_center_abs
def ease_out_cubic(t): return 1 - (1 - t) ** 3
# overshoot 3-7px, then pull back in final 15% of path
# Y-axis jitter ±2px, X-axis jitter ±1px during 5%-80%
# Timing: fast phase (frac<0.5) 5-8ms, mid 10-18ms, slow 25-45ms
rb = frame.locator(".vc-captcha-refresh,.captcha-refresh,[class*='refresh']").first
See scripts/analyze_factors.py and scripts/generate_report.py.
Analysis dimensions (each showed a measurable effect in the 100-video sample):
| Dimension | Key Finding |
|---|---|
| Duration | 2-3 min is sweet spot (15× better than >5 min) |
| Tag count | 1-2 tags >> 5+ tags (up to 6× difference) |
| Best tags | #自我成长 #个人成长 #认知 #女生必看 |
| Follower (log-corr) | r=0.617, moderate positive |
| Title with ! | +2× likes vs no exclamation |
| Title length | 11-20 chars optimal |
| Emotion keywords | Love/marriage/mood words → higher shares |
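
The "Follower (log-corr)" row reports a correlation computed on log-scaled values. One plausible way to compute such a figure is a Pearson correlation between log10(followers + 1) and log10(likes + 1), sketched below with the standard library only. Whether scripts/analyze_factors.py uses this exact transform, or likes as the second variable, is an assumption; the function names are hypothetical.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def log_pearson(followers: Sequence[float], likes: Sequence[float]) -> float:
    """Pearson r after a log10(v + 1) transform (the +1 keeps zeros finite)."""
    log_followers = [math.log10(f + 1) for f in followers]
    log_likes = [math.log10(v + 1) for v in likes]
    return pearson(log_followers, log_likes)


# Synthetic values chosen so the log-transformed series are exactly linear.
r = log_pearson([9, 99, 999, 9999], [99, 999, 9999, 99999])
```

The log transform matters because follower and like counts are heavily skewed; on raw values a handful of very large accounts would dominate the coefficient.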
Report output: douyin_analysis_report.html with 10 interactive Chart.js charts.
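
A single chart in such a report could be produced roughly as follows: Python serializes a Chart.js configuration to JSON and embeds it in a standalone HTML page. This is a sketch under stated assumptions, not the code in scripts/generate_report.py; the function name `bar_chart_html` is hypothetical, the jsDelivr CDN URL is one common way to load Chart.js, and the labels and values in the usage line are placeholders rather than measured results.

```python
import html
import json
from typing import Sequence


def bar_chart_html(title: str, labels: Sequence[str], values: Sequence[float]) -> str:
    """Return a self-contained HTML page rendering one Chart.js bar chart."""
    config = {
        "type": "bar",
        "data": {
            "labels": list(labels),
            "datasets": [{"label": title, "data": list(values)}],
        },
        "options": {"responsive": True},
    }
    safe_title = html.escape(title)  # the title is also placed in HTML markup
    return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>{safe_title}</title>
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
</head>
<body>
<canvas id="chart"></canvas>
<script>
new Chart(document.getElementById("chart"), {json.dumps(config)});
</script>
</body>
</html>"""


# Placeholder buckets and values, for illustration only.
page = bar_chart_html("Median likes by duration", ["<1 min", "2-3 min"], [1, 2])
```

Writing `page` to a file and opening it in a browser renders the chart; a full report would repeat this pattern with one canvas per analysis dimension.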
work_dir/
├── douyin_session.json # saved login cookies
├── douyin_raw_data.json # raw collected videos
├── douyin_parsed.json # enriched with detail data
├── analysis_result.json # computed analysis metrics
├── douyin_report.xlsx # Excel version
└── douyin_analysis_report.html # final interactive HTML report
- bg_arr[:mask_h, :mask_w] = column_mean_fill
- search_start = sl_w + 12 to skip the initial slider position area
- douyin_session.json expires; log in again on a 401 or a redirect to the login page
- Dependencies: playwright, pillow, numpy, scipy, scikit-image, openpyxl