Douyin Scraper V2
Warn
Audited by ClawScan on May 10, 2026.
Overview
This is a disclosed Douyin scraping skill, but it intentionally uses anti-detection/browser-session techniques to bypass platform protections and should be reviewed carefully before use.
Install only if you deliberately want a logged-in Douyin scraping tool with anti-detection behavior. Use an isolated environment and preferably a separate Douyin account, review the full script and dependencies, verify the OCR endpoint/token, consider `--no-ocr` for privacy, and clear the saved browser profile when finished.
Findings (4)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Using this skill may put the user's Douyin account or environment at risk of platform enforcement, and it directs the agent to perform stealth scraping rather than ordinary browsing.
The skill explicitly instructs browser automation designed to bypass anti-scraping controls and avoid CAPTCHA/bot detection.
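Playwright screenshots (bypassing anti-scraping measures)... Anti-detection: headed browser (headless=False) + human-like interaction pacing, to avoid triggering CAPTCHAs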
Only use it if you intentionally want this behavior; prefer an isolated environment and account, and require explicit user approval before each scraping run.
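One way to enforce the per-run approval recommended above is a small confirmation gate in front of whatever function starts the scrape. This is a minimal sketch, not part of the reviewed skill: `approved`, `run_if_approved`, and the `run` callback are hypothetical names introduced for illustration.

```python
from typing import Callable


def approved(target: str, ask: Callable[[str], str] = input) -> bool:
    """Return True only if the user explicitly answers 'yes' for this run."""
    answer = ask(f"Run the Douyin scraper against {target!r}? Type 'yes' to continue: ")
    return answer.strip().lower() == "yes"


def run_if_approved(
    target: str,
    run: Callable[[str], None],
    ask: Callable[[str], str] = input,
) -> bool:
    """Call run(target) only after explicit approval; report whether it ran."""
    if not approved(target, ask):
        return False
    run(target)  # hypothetical entry point for a single scraping run
    return True
```

Because approval is requested inside `run_if_approved`, each call asks again, so one "yes" does not carry over to later runs.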
The skill can browse Douyin as the logged-in user, and the local profile directory contains account session material.
The login script stores a persistent Douyin browser profile, meaning cookies/session state remain on disk and are reused by later automation.
ctx = p.chromium.launch_persistent_context(user_data_dir=str(PROFILE_DIR), headless=False, ...); page.goto("https://www.douyin.com"
Use a separate account if possible, protect the profile directory, and clear the saved profile when no longer needed.
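Protecting and clearing the profile could look roughly like the sketch below. It assumes you know where the skill's `PROFILE_DIR` points, which the excerpt above does not show; `protect_profile` and `clear_profile` are hypothetical helper names, and the permission change only has its full effect on POSIX systems.

```python
import shutil
from pathlib import Path


def protect_profile(profile_dir: Path) -> None:
    """Restrict the profile directory to the current user (POSIX mode 0700)."""
    profile_dir.mkdir(parents=True, exist_ok=True)
    profile_dir.chmod(0o700)


def clear_profile(profile_dir: Path) -> bool:
    """Delete the saved profile, including cookies and session state.

    Returns True if a directory was removed, False if none existed.
    """
    if not profile_dir.is_dir():
        return False
    shutil.rmtree(profile_dir)
    return True
```

Close the browser before calling `clear_profile`, since an open persistent context may still hold files in the directory.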
Screenshots collected from Douyin are sent to the OCR provider for processing; this is expected for OCR but crosses a data boundary.
Captured images are base64-encoded and uploaded to an external OCR endpoint with the configured OCR token.
OCR_API_URL = os.getenv("BAIDU_PADDLEOCR_API_URL", "https://r41cd0p9x7dfp1s7.aistudio-app.com/layout-parsing") ... requests.post(OCR_API_URL, json=payload, headers=headers, timeout=30)
Use `--no-ocr` if you do not want image contents sent to the OCR service, and verify the OCR endpoint and token before use.
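Verifying the endpoint can be partly automated by checking the configured URL against a list of hosts you have confirmed yourself before any screenshot is uploaded. The sketch below is an assumption about how such a check might be written, not code from the skill: `ocr_endpoint_ok`, `should_upload`, and the `trusted_hosts` allowlist are hypothetical.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def ocr_endpoint_ok(url: str, trusted_hosts: Iterable[str]) -> bool:
    """Accept only HTTPS URLs whose host is on the caller's allowlist."""
    try:
        parts = urlsplit(url)
        host = parts.hostname  # lowercased by urlsplit; None if missing
    except ValueError:
        return False  # malformed URL
    return parts.scheme == "https" and host in set(trusted_hosts)


def should_upload(no_ocr: bool, url: Optional[str], trusted_hosts: Iterable[str]) -> bool:
    """Upload screenshots only if OCR is enabled and the endpoint is trusted."""
    if no_ocr or not url:
        return False
    return ocr_endpoint_ok(url, trusted_hosts)
```

In practice `url` would come from the `BAIDU_PADDLEOCR_API_URL` environment variable quoted above, and `no_ocr` from the `--no-ocr` flag. A host check does not validate the token or the provider's data handling, so it complements manual review and does not replace it.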
Installing dependencies may download and run third-party code outside the registry's install specification.
The skill relies on manual installation of unpinned Python packages and a browser runtime, which is normal for Playwright but should be treated as supply-chain exposure.
pip install playwright requests python-dotenv
python -m playwright install chromium
Install from trusted package indexes, consider pinning versions, and review dependencies before running the workflow.
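If you move the dependencies into a requirements file, a quick check can flag entries that are not pinned to an exact version. The sketch below assumes a pip-style requirements file, which the skill itself does not ship; `unpinned_requirements` is a hypothetical helper and its pattern covers only simple `name==version` lines.

```python
import re
from typing import Iterable, List

# Matches "name==version", with optional extras such as "name[extra]==version".
_PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*==\s*[^\s;#]+")


def unpinned_requirements(lines: Iterable[str]) -> List[str]:
    """Return requirement lines that do not pin an exact version with '=='."""
    loose = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("-"):
            continue  # skip blanks, comments, and pip options such as -r
        if not _PINNED.match(line):
            loose.append(line)
    return loose
```

Applied to the three packages named in the install command above, the function reports all of them, since none carries a version. Pinning does not cover the Chromium build fetched by `python -m playwright install chromium`, which still needs separate review.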
