Douyin Scraper (NL Search)

Suspicious

Audited by ClawScan on May 10, 2026.

Overview

This skill is openly presented as a Douyin scraping tool, but it explicitly relies on persistent, logged-in browser automation and on anti-bot and CAPTCHA-avoidance techniques.

Review carefully before installing. This is a scraper that relies on a logged-in Douyin browser session and explicit anti-bot evasion. Use it only with permission, protect or clear the saved profile, and confirm you are comfortable sending captured images to the configured OCR endpoint.

Findings (4)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Concern (High Confidence)

ASI02: Tool Misuse and Exploitation

What this means

Using this skill could cause account, IP, or platform-policy consequences and may bypass safeguards that the site uses to limit automated scraping.

Why it was flagged

The skill explicitly states that it bypasses anti-scraping protections and uses anti-detection behavior to avoid triggering CAPTCHAs while scraping Douyin.

Skill content

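Playwright screenshots (bypassing anti-scraping measures) ... Anti-detection: headed browser (headless=False) + human-like interaction pacing, to avoid triggering CAPTCHAs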

Recommendation

Only use it where you have permission to automate collection, keep collection volumes low, and do not use it to evade platform restrictions or bot protections.
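A hard per-session cap is one simple way to act on "keep collection volumes low". The sketch below is hypothetical: the skill does not ship a `CollectionBudget` class, and the default limit of 20 is an arbitrary assumption, not a threshold Douyin publishes. It only limits how much is collected; it adds no delays and no behavior aimed at avoiding detection.

```python
class CollectionBudget:
    """Hypothetical guard that stops a session after a fixed number of items.

    The class name and the default limit are illustrative assumptions;
    the audited skill does not provide this helper.
    """

    def __init__(self, max_items: int = 20) -> None:
        if max_items < 0:
            raise ValueError("max_items must be non-negative")
        self.max_items = max_items
        self.used = 0

    def allow(self) -> bool:
        """Return True and count one item if the budget is not yet exhausted."""
        if self.used >= self.max_items:
            return False
        self.used += 1
        return True


if __name__ == "__main__":
    budget = CollectionBudget(max_items=3)
    results = [budget.allow() for _ in range(5)]
    print(results)  # [True, True, True, False, False]
```

A caller would check `budget.allow()` before each capture and stop the run as soon as it returns False.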

Note (High Confidence)

ASI03: Identity and Privilege Abuse

What this means

Anyone with access to the skill directory may be able to reuse the saved Douyin session, and automated scraping will run under the user's logged-in account context.

Why it was flagged

The login flow stores a persistent browser profile for Douyin in the skill directory so later runs can use the logged-in session.

Skill content

PROFILE_DIR = Path(__file__).parent.parent / "profile" ... launch_persistent_context(user_data_dir=str(PROFILE_DIR), headless=False, ...)

Recommendation

Use a dedicated account if possible, restrict filesystem access to the skill directory, and clear the profile when finished using the provided clear option or by deleting the profile directory.
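A minimal sketch of this profile hygiene, assuming a POSIX system and the `profile` directory layout shown in the excerpt (`Path(__file__).parent.parent / "profile"`). The helper names `lock_down_profile` and `clear_profile` are hypothetical; they are not the skill's own clear option, which is not shown in the excerpt. Deleting the directory removes the stored cookies and session data, so the next run should require logging in again. On Windows, `os.chmod` only toggles the read-only flag, so access would need to be restricted with the platform's ACL tools instead.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def lock_down_profile(profile_dir: Path) -> None:
    """Restrict the saved browser profile directory to the current user.

    Sets the directory mode to 0o700 (POSIX) so other local users cannot
    list or enter it; files inside keep their own modes.
    """
    profile_dir.mkdir(parents=True, exist_ok=True)
    os.chmod(profile_dir, stat.S_IRWXU)


def clear_profile(profile_dir: Path) -> bool:
    """Delete the saved profile (cookies, local storage, session data).

    Returns True if a directory was removed, False if none existed.
    """
    if profile_dir.is_dir():
        shutil.rmtree(profile_dir)
        return True
    return False


if __name__ == "__main__":
    # Demonstrate on a throwaway directory instead of a real profile.
    with tempfile.TemporaryDirectory() as tmp:
        profile = Path(tmp) / "profile"
        lock_down_profile(profile)
        print(clear_profile(profile))  # True
        print(clear_profile(profile))  # False
```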

Note (High Confidence)

ASI07: Insecure Inter-Agent Communication

What this means

Images captured from Douyin notes and the OCR token are transmitted to the OCR provider endpoint for processing.

Why it was flagged

The OCR step sends base64-encoded screenshot data and the configured OCR token to an external OCR API endpoint.

Skill content

OCR_API_URL = os.getenv("BAIDU_PADDLEOCR_API_URL", "https://r41cd0p9x7dfp1s7.aistudio-app.com/layout-parsing") ... headers = {"Authorization": f"token {OCR_TOKEN}" ...} ... requests.post(OCR_API_URL, json=payload, headers=headers, timeout=30)

Recommendation

Confirm the OCR endpoint is the provider you intend to use, avoid scraping private or sensitive content, and use a token with the minimum needed scope.
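One way to act on this recommendation is to refuse to send anything unless the endpoint uses HTTPS and its host is on a list you have explicitly reviewed. The sketch assumes the same lookup order as the excerpt (`BAIDU_PADDLEOCR_API_URL`, falling back to the hard-coded URL). The function names and the allowlist you pass in are hypothetical, and the check would have to run before the skill's `requests.post(OCR_API_URL, ...)` call, which this sketch does not modify.

```python
from __future__ import annotations

import os
from urllib.parse import urlparse

# Default copied from the skill's excerpt; used only when the
# BAIDU_PADDLEOCR_API_URL environment variable is unset.
DEFAULT_OCR_URL = "https://r41cd0p9x7dfp1s7.aistudio-app.com/layout-parsing"


def configured_ocr_url() -> str:
    """Mirror the skill's lookup: environment variable first, then the default."""
    return os.getenv("BAIDU_PADDLEOCR_API_URL", DEFAULT_OCR_URL)


def check_ocr_endpoint(url: str, allowed_hosts: set[str]) -> str:
    """Return url unchanged if it is HTTPS and its host is explicitly allowed.

    Raises ValueError otherwise, so no screenshot or token is sent to an
    endpoint that has not been reviewed.
    """
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError(f"OCR endpoint must use HTTPS: {url!r}")
    if parsed.hostname not in allowed_hosts:
        raise ValueError(f"OCR host not in allowlist: {parsed.hostname!r}")
    return url
```

For example, `check_ocr_endpoint(configured_ocr_url(), approved_hosts)` passes only if the configured host is in `approved_hosts`, a set you fill in after confirming which hosts belong to your OCR provider; any other host, or a plain `http://` URL, raises `ValueError`.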

Note (High Confidence)

ASI04: Agentic Supply Chain Vulnerabilities

What this means

Dependency changes or compromised packages could affect the local environment where the skill runs.

Why it was flagged

The setup instructions ask the user to install unpinned Python packages and a browser runtime; this is expected for Playwright automation but still depends on external package provenance.

Skill content

pip install playwright requests python-dotenv
python -m playwright install chromium

Recommendation

Install in an isolated virtual environment and pin or review package versions if using this skill in a sensitive environment.
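The setup commands above install whatever the latest releases of `playwright`, `requests`, and `python-dotenv` happen to be. Once a reviewed set of versions is installed in a virtual environment (for example one created with `python -m venv .venv`), `pip freeze` can write exact pins to a requirements file. The hypothetical helper below is a rough heuristic, not a full PEP 508 parser: it flags requirement lines that are not pinned to an exact version with `==`, which could serve as a quick pre-install check.

```python
import re

# Matches "name==1.2.3", optionally with extras such as "name[extra]==1.2.3".
_PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^=\s;]+")


def unpinned_requirements(lines):
    """Return requirement lines that are not pinned to an exact version with ==."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blanks, comment-only lines, and pip options like -r
        if not _PINNED.match(line):
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    reqs = ["playwright", "requests", "python-dotenv"]
    print(unpinned_requirements(reqs))  # ['playwright', 'requests', 'python-dotenv']
```

Run against the three packages from the setup instructions, it reports all of them as unpinned.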