Douyin Scraper (NL Search)

Pass. Audited by VirusTotal on May 11, 2026.

Overview

Type: OpenClaw Skill
Name: douyin-scraper-nl
Version: 3.0.0

The skill is a Douyin scraper that uses Playwright for automation and a third-party Baidu AI Studio endpoint (aistudio-app.com) for OCR. It is classified as suspicious due to a path traversal vulnerability in `scripts/full_workflow.py`, where the user-provided `keyword` is used to construct output file paths without sanitization. Additionally, `SKILL.md` instructs the agent to read and summarize OCR-extracted text from external sources, which creates a risk of indirect prompt injection: malicious content on a scraped page could influence the agent's behavior.
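One way to close the path traversal issue described above is to sanitize `keyword` before it reaches any path construction and then confirm that the resolved path stays inside the output directory. The following is a minimal sketch, not the skill's actual code: the function name `safe_output_path`, the `base_dir` and `suffix` parameters, and the 80-character cap are all hypothetical, since the report does not show how `scripts/full_workflow.py` builds its paths.

```python
import re
from pathlib import Path


def safe_output_path(base_dir: Path, keyword: str, suffix: str = ".json") -> Path:
    """Build an output path from a user keyword without leaving base_dir.

    Hypothetical helper: names and the length cap are illustrative only.
    """
    # Collapse every run of characters outside [word chars, '-'] into '_'.
    # In Python 3, \w matches Unicode letters, so Chinese keywords survive,
    # while '/', '\\' and '.' (and therefore '..') are removed.
    cleaned = re.sub(r"[^\w\-]+", "_", keyword).strip("_")
    if not cleaned:
        raise ValueError("keyword contains no usable characters")

    base = base_dir.resolve()
    candidate = (base / f"{cleaned[:80]}{suffix}").resolve()

    # Defense in depth: reject anything that still resolves outside base.
    # Path.is_relative_to requires Python 3.9 or later.
    if not candidate.is_relative_to(base):
        raise ValueError("resolved path escapes the output directory")
    return candidate
```

With this sketch, a keyword such as `../../etc/passwd` is reduced to the file name `etc_passwd.json` inside `base_dir` instead of pointing outside it.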

Findings (0)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

Using this skill could cause account, IP, or platform-policy consequences and may bypass safeguards that the site uses to limit automated scraping.

Why it was flagged

The skill explicitly says it bypasses anti-scraping protections and uses anti-detection behavior to avoid CAPTCHA triggers while scraping Douyin.

Skill content
Playwright screenshots (bypass anti-scraping) ... Anti-detection: headed browser (headless=False) + human-like operation pacing, to avoid triggering CAPTCHAs
Recommendation

Only use it where you have permission to automate collection, keep collection volumes low, and do not use it to evade platform restrictions or bot protections.

What this means

Anyone with access to the skill directory may be able to reuse the saved Douyin session, and automated scraping will run under the user's logged-in account context.

Why it was flagged

The login flow stores a persistent browser profile for Douyin in the skill directory so later runs can use the logged-in session.

Skill content
PROFILE_DIR = Path(__file__).parent.parent / "profile" ... launch_persistent_context(user_data_dir=str(PROFILE_DIR), headless=False, ...)
Recommendation

Use a dedicated account if possible, restrict filesystem access to the skill directory, and clear the profile when finished using the provided clear option or by deleting the profile directory.
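The filesystem part of this recommendation could be sketched as follows. This is an illustration only: `lock_down_profile` and `clear_profile` are hypothetical helpers, not the skill's provided clear option, and the permission change assumes a POSIX system (on Windows, `chmod` only toggles the read-only flag).

```python
import shutil
import stat
from pathlib import Path


def lock_down_profile(profile_dir: Path) -> None:
    """Restrict the profile directory to its owner (mode 0o700, POSIX only)."""
    profile_dir.mkdir(parents=True, exist_ok=True)
    # Without execute permission on the directory, other users cannot
    # traverse into it to read the saved session files.
    profile_dir.chmod(stat.S_IRWXU)


def clear_profile(profile_dir: Path) -> bool:
    """Delete the saved browser profile; return True if something was removed."""
    if profile_dir.is_dir():
        shutil.rmtree(profile_dir)
        return True
    return False
```

Calling `lock_down_profile` before the first login and `clear_profile` after the last run keeps the logged-in session on disk for as short a time as possible.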

What this means

Images captured from Douyin notes and the OCR token are transmitted to the OCR provider endpoint for processing.

Why it was flagged

The OCR step sends base64-encoded screenshot data and the configured OCR token to an external OCR API endpoint.

Skill content
OCR_API_URL = os.getenv("BAIDU_PADDLEOCR_API_URL", "https://r41cd0p9x7dfp1s7.aistudio-app.com/layout-parsing") ... headers = {"Authorization": f"token {OCR_TOKEN}" ...} ... requests.post(OCR_API_URL, json=payload, headers=headers, timeout=30)
Recommendation

Confirm the OCR endpoint is the provider you intend to use, avoid scraping private or sensitive content, and use a token with the minimum needed scope.
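Because `OCR_API_URL` can be overridden through the `BAIDU_PADDLEOCR_API_URL` environment variable, one way to confirm the endpoint before any screenshot or token is sent is an explicit host allowlist check. The sketch below is an assumption-laden example: `ALLOWED_OCR_HOSTS` and `check_ocr_endpoint` are hypothetical names, and the only allowed host is the default one quoted in the skill content above.

```python
from urllib.parse import urlparse

# Assumption: only the default host quoted in the skill content is trusted.
ALLOWED_OCR_HOSTS = frozenset({"r41cd0p9x7dfp1s7.aistudio-app.com"})


def check_ocr_endpoint(url: str, allowed_hosts=ALLOWED_OCR_HOSTS) -> str:
    """Return url unchanged if it is HTTPS and its host is allowlisted."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError(f"OCR endpoint must use https, got {parsed.scheme!r}")
    host = (parsed.hostname or "").lower()
    if host not in allowed_hosts:
        raise ValueError(f"OCR host {host!r} is not in the allowlist")
    return url
```

Running the check once at startup, before `requests.post` is ever called, means a misconfigured or tampered environment variable fails loudly instead of silently redirecting images and the token.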

What this means

Dependency changes or compromised packages could affect the local environment where the skill runs.

Why it was flagged

The setup instructions ask the user to install unpinned Python packages and a browser runtime; this is expected for Playwright automation but still depends on external package provenance.

Skill content
pip install playwright requests python-dotenv
python -m playwright install chromium
Recommendation

Install in an isolated virtual environment and pin or review package versions if using this skill in a sensitive environment.
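To make the pinning advice checkable, a small script can flag requirement lines that lack an exact `==` version. This is a rough sketch under stated assumptions: `unpinned_requirements` is a hypothetical helper, it treats only `==` as a pin, and the version number in the usage example is an illustrative value rather than a recommendation.

```python
def unpinned_requirements(lines):
    """Return requirement lines that do not pin an exact version with '=='."""
    flagged = []
    for raw in lines:
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Skip blanks and pip options such as '-r other.txt' or '--hash=...'.
        if not line or line.startswith("-"):
            continue
        if "==" not in line:
            flagged.append(line)
    return flagged


# Example with the three packages named in the setup instructions;
# the pinned version is illustrative only.
example = ["playwright", "requests==2.32.3", "python-dotenv  # unpinned"]
print(unpinned_requirements(example))
```

The check is deliberately strict: range specifiers such as `>=` are also flagged, since they still allow a newer, unreviewed release to be installed.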