Douyin Scraper (NL Search)
Suspicious
Audited by ClawScan on May 10, 2026.
Overview
This skill is a disclosed Douyin scraping tool, but it explicitly uses logged-in persistent browser automation and anti-bot/CAPTCHA-avoidance techniques.
Review carefully before installing. This is a scraper that relies on a logged-in Douyin browser session and explicit anti-bot evasion. Use it only with permission, protect or clear the saved profile, and confirm you are comfortable sending captured images to the configured OCR endpoint.
Findings (4)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Using this skill could cause account, IP, or platform-policy consequences and may bypass safeguards that the site uses to limit automated scraping.
The skill explicitly says it bypasses anti-scraping protections and uses anti-detection behavior to avoid CAPTCHA triggers while scraping Douyin.
Playwright screenshots (bypasses anti-scraping) ... Anti-detection: headed browser (headless=False) + human-like interaction pacing, to avoid triggering CAPTCHAs
Only use it where you have permission to automate collection, keep collection volumes low, and do not use it to evade platform restrictions or bot protections.
Anyone with access to the skill directory may be able to reuse the saved Douyin session, and automated scraping will run under the user's logged-in account context.
The login flow stores a persistent browser profile for Douyin in the skill directory so later runs can use the logged-in session.
PROFILE_DIR = Path(__file__).parent.parent / "profile" ... launch_persistent_context(user_data_dir=str(PROFILE_DIR), headless=False, ...)
Use a dedicated account if possible, restrict filesystem access to the skill directory, and clear the profile when finished using the provided clear option or by deleting the profile directory.
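The two filesystem steps in this recommendation can be sketched in Python. This is a minimal illustration, not part of the skill: the function names `lock_down_profile` and `clear_profile` are hypothetical, the permission change only has its full effect on POSIX systems, and the skill's own clear option should be preferred where it is available.

```python
import shutil
import stat
from pathlib import Path


def lock_down_profile(profile_dir: Path) -> None:
    """Restrict the profile directory to its owner (hypothetical helper).

    Sets mode 0o700 on the directory itself, so other local users cannot
    list or traverse it. On Windows, chmod only toggles the read-only flag.
    """
    if profile_dir.is_dir():
        profile_dir.chmod(stat.S_IRWXU)


def clear_profile(profile_dir: Path) -> bool:
    """Delete the saved browser profile (hypothetical helper).

    Returns True if a profile was removed, False if none existed.
    """
    if not profile_dir.exists():
        return False
    shutil.rmtree(profile_dir)
    return True
```

Calling `lock_down_profile` right after the login flow and `clear_profile` once collection is finished keeps the logged-in session on disk for as short a time as possible.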
Images captured from Douyin notes and the OCR token are transmitted to the OCR provider endpoint for processing.
The OCR step sends base64-encoded screenshot data and the configured OCR token to an external OCR API endpoint.
OCR_API_URL = os.getenv("BAIDU_PADDLEOCR_API_URL", "https://r41cd0p9x7dfp1s7.aistudio-app.com/layout-parsing") ... headers = {"Authorization": f"token {OCR_TOKEN}" ...} ... requests.post(OCR_API_URL, json=payload, headers=headers, timeout=30)
Confirm the OCR endpoint is the provider you intend to use, avoid scraping private or sensitive content, and use a token with the minimum needed scope.
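One way to confirm the endpoint before any screenshot or token leaves the machine is an explicit host allowlist. The sketch below is an assumption about how such a check could look, not code from the skill: `ALLOWED_OCR_HOSTS`, its placeholder host `ocr.example.com`, and `is_approved_ocr_endpoint` are all hypothetical names.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: replace the placeholder with the OCR provider
# host you have actually reviewed and approved.
ALLOWED_OCR_HOSTS = frozenset({"ocr.example.com"})


def is_approved_ocr_endpoint(url: str, allowed_hosts=ALLOWED_OCR_HOSTS) -> bool:
    """Return True only for HTTPS URLs whose host is exactly on the allowlist."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in allowed_hosts
```

Running this check on the resolved value of `OCR_API_URL` before the `requests.post` call would catch both an unexpected default endpoint and a mistyped or tampered environment variable. Matching on the exact hostname, rather than a substring, avoids accepting look-alike domains.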
Dependency changes or compromised packages could affect the local environment where the skill runs.
The setup instructions ask the user to install unpinned Python packages and a browser runtime; this is expected for Playwright automation but still depends on external package provenance.
pip install playwright requests python-dotenv
python -m playwright install chromium
Install in an isolated virtual environment and pin or review package versions if using this skill in a sensitive environment.
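To review whether a requirements file is fully pinned, a small check like the following could be run before installation. It is a rough sketch under stated assumptions: the helper name `unpinned_requirements` is hypothetical, only exact `==` pins count as pinned, and pip option lines (those starting with `-`) are skipped rather than analyzed.

```python
import re

# A requirement counts as pinned only if it has the form "name==version",
# optionally with extras such as "name[extra]==version".
_PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*==\s*[^\s;]+")


def unpinned_requirements(requirements_text: str) -> list[str]:
    """Return the requirement lines that lack an exact '==' version pin."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not _PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

An empty result means every listed package has an exact version. It does not verify package provenance or hashes, so it complements rather than replaces a review of the packages themselves.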
