Douyin Scraper V2

Warn. Audited by ClawScan on May 10, 2026.

Overview

This is a disclosed Douyin scraping skill, but it intentionally uses anti-detection/browser-session techniques to bypass platform protections and should be reviewed carefully before use.

Install only if you deliberately want a logged-in Douyin scraping tool with anti-detection behavior. Use an isolated environment and preferably a separate Douyin account, review the full script and dependencies, verify the OCR endpoint/token, consider `--no-ocr` for privacy, and clear the saved browser profile when finished.

Findings (4)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

Using this skill may put the user's Douyin account or environment at risk of platform enforcement, and it directs the agent to perform stealth scraping rather than ordinary browsing.

Why it was flagged

The skill explicitly instructs browser automation designed to bypass anti-scraping controls and avoid CAPTCHA/bot detection.

Skill content

Playwright screenshots (bypassing anti-scraping)... Anti-detection: headed browser (headless=False) + human-like interaction pacing, to avoid triggering CAPTCHAs

Recommendation

Only use it if you intentionally want this behavior; prefer an isolated environment and account, and require explicit user approval before each scraping run.
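The per-run approval requirement can be enforced with a small gate that refuses to start anything unless the user types an explicit confirmation. This is a minimal sketch, not part of the audited skill: `require_approval` and `run_with_approval` are hypothetical helpers, and `ask` defaults to Python's built-in `input` so another prompt mechanism can be swapped in.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def require_approval(description: str, ask: Callable[[str], str] = input) -> bool:
    """Ask the user to approve one run; only an explicit 'yes' counts as approval."""
    answer = ask(f"About to run: {description}\nType 'yes' to approve: ")
    return answer.strip().lower() == "yes"


def run_with_approval(
    description: str,
    action: Callable[[], T],
    ask: Callable[[str], str] = input,
) -> Optional[T]:
    """Run `action` only after explicit approval; return None if the user declines."""
    if not require_approval(description, ask):
        print("Declined; nothing was run.")
        return None
    return action()
```

Each scraping invocation would be wrapped, e.g. `run_with_approval("scrape one Douyin profile", start_run)`, where `start_run` is a hypothetical stand-in for whatever entry point the skill actually exposes. Anything other than "yes" (including "y") is treated as a refusal, so approval cannot be given by accident.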

What this means

The skill can browse Douyin as the logged-in user, and the local profile directory contains account session material.

Why it was flagged

The login script stores a persistent Douyin browser profile, meaning cookies/session state remain on disk and are reused by later automation.

Skill content

ctx = p.chromium.launch_persistent_context(user_data_dir=str(PROFILE_DIR), headless=False, ...); page.goto("https://www.douyin.com"

Recommendation

Use a separate account if possible, protect the profile directory, and clear the saved profile when no longer needed.
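Protecting and clearing the profile directory needs only the standard library. This sketch assumes you pass in the same path the login script uses as `PROFILE_DIR` (its actual value is not shown in the excerpt); the helper names are hypothetical. Permissions are applied with `os.chmod` after creation because the `mode` argument to `mkdir` is filtered by the process umask.

```python
import os
import shutil
import stat
from pathlib import Path


def protect_profile_dir(profile_dir: Path) -> None:
    """Create the profile directory if needed and restrict it to the current user."""
    profile_dir.mkdir(parents=True, exist_ok=True)
    # 0o700 on POSIX: owner read/write/execute, no group or other access.
    # On Windows, os.chmod only toggles the read-only flag, so use ACLs there instead.
    os.chmod(profile_dir, stat.S_IRWXU)


def clear_profile_dir(profile_dir: Path) -> bool:
    """Delete the saved browser profile (cookies and session state).

    Returns True if a directory was removed, False if there was nothing to delete.
    """
    if not profile_dir.exists():
        return False
    shutil.rmtree(profile_dir)
    return True
```

Run `clear_profile_dir` only when no browser is using the profile; a running persistent context may hold files open or recreate them.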

What this means

Screenshots collected from Douyin are sent to the OCR provider for processing; this is expected for OCR but crosses a data boundary.

Why it was flagged

Captured images are base64-encoded and uploaded to an external OCR endpoint with the configured OCR token.

Skill content

OCR_API_URL = os.getenv("BAIDU_PADDLEOCR_API_URL", "https://r41cd0p9x7dfp1s7.aistudio-app.com/layout-parsing") ... requests.post(OCR_API_URL, json=payload, headers=headers, timeout=30)

Recommendation

Use `--no-ocr` if you do not want image contents sent to the OCR service, and verify the OCR endpoint and token before use.
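Verifying the endpoint can be made mechanical by checking the configured URL against hosts you have confirmed belong to your OCR provider before anything is uploaded. A minimal sketch, assuming you read the URL from `BAIDU_PADDLEOCR_API_URL` as the skill does and pass the token in yourself (the excerpt does not show which variable holds it); `check_ocr_endpoint` and the allowlist are hypothetical. It checks only scheme, host, and token presence; it cannot tell whether an allowlisted host is trustworthy.

```python
from typing import List, Optional, Set
from urllib.parse import urlparse


def check_ocr_endpoint(url: str, token: Optional[str], allowed_hosts: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means these basic checks passed.

    `allowed_hosts` should contain lowercase hostnames you have verified yourself.
    """
    problems: List[str] = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append(f"endpoint is not HTTPS (scheme={parsed.scheme!r})")
    host = parsed.hostname or ""  # urlparse returns the hostname lowercased
    if host not in allowed_hosts:  # exact match, so look-alike suffixes are rejected
        problems.append(f"host {host!r} is not in the allowlist")
    if not token:
        problems.append("OCR token is empty or unset")
    return problems
```

For example, pass `os.getenv("BAIDU_PADDLEOCR_API_URL", ...)` with the same default the skill uses and an allowlist containing only the host you confirmed; if the returned list is non-empty, run with `--no-ocr` instead of uploading screenshots.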

What this means

Installing dependencies may download and run third-party code outside the registry's install specification.

Why it was flagged

The skill relies on manual installation of unpinned Python packages and a browser runtime, which is normal for Playwright but should be treated as supply-chain exposure.

Skill content

pip install playwright requests python-dotenv
python -m playwright install chromium

Recommendation

Install from trusted package indexes, consider pinning versions, and review dependencies before running the workflow.
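Pinning can be enforced by recording exact versions once you have reviewed them (for example with `pip freeze > requirements.lock`, then `pip install -r requirements.lock`) and checking the environment before each run. The sketch below is hypothetical tooling, not part of the skill: the file name `requirements.lock` is an assumption, and the helpers parse `name==version` lines and compare them with installed distributions via `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple


def parse_pins(text: str) -> Dict[str, str]:
    """Parse exact 'name==version' pins, ignoring blank lines and # comments."""
    pins: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, wanted = line.partition("==")
        if not sep:
            # Ranges such as '>=' are not pins; refuse them rather than guess.
            raise ValueError(f"not an exact '==' pin: {raw!r}")
        pins[name.strip()] = wanted.strip()
    return pins


def find_mismatches(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs from its pin to (pinned, installed)."""
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, wanted in pins.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None  # pinned but not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

A typical call is `find_mismatches(parse_pins(Path("requirements.lock").read_text()))`, aborting if the result is non-empty. The Chromium build fetched by `python -m playwright install chromium` is not a Python distribution, so this check does not cover it.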