Skill flagged — suspicious patterns detected
ClawHub Security flagged this skill as suspicious. Review the scan results before using.
Douyin Content Tracker Skill
v1.0.0

This skill should be used when the user wants to scrape Douyin (TikTok China) creator content, download audio, and transcribe it with Whisper. Covers first-t...
by yibo@gpttang
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan (OpenClaw)
Verdict: Suspicious · medium confidence

Purpose & Capability
The skill's name/description (scrape Douyin, extract audio, transcribe with Whisper) matches the included scripts and pipeline. It legitimately needs Playwright, ffmpeg access, MediaCrawler, and Whisper model downloads. However, the registry metadata declares no required environment variables while the SKILL.md and code require/expect a MEDIACRAWLER_DIR value in .env — an inconsistency between claimed requirements and actual needs.
Instruction Scope
SKILL.md instructs the agent/user to clone and run MediaCrawler, run Playwright browser installs, create or modify a .env, run scripts that open a browser for QR login (producing a .douyin_cookies.json), and run local pipeline scripts that read and write many files. The scripts also temporarily modify an external project's config file (MediaCrawler config/base_config.py) to set the fetch count, and the pipeline passes the user's Douyin cookies into MediaCrawler via a command-line argument. These actions go beyond simple read-only scraping guidance and introduce potential exposure (see environment_proportionality).
Install Mechanism
There is no automated install spec in the registry (instruction-only), which lowers automatic-install risk. The instructions ask the user to git clone a public GitHub repo (NanmiCoder/MediaCrawler), pip-install dependencies, and install Playwright's browser binaries. Using an official GitHub repo is typical, and because the user runs the clone themselves, no arbitrary binary downloads are silently executed by the platform. Still, installing Playwright browsers and downloading Whisper model weights are large actions the user should expect.
Credentials
The skill's registry metadata lists no required credentials/env vars, yet the code expects and uses values from .env (MEDIACRAWLER_DIR, optional OUTPUT_BASE_DIR, WHISPER_MODEL). The pipeline relies on a local cookie file (.douyin_cookies.json) and passes the cookie string into MediaCrawler on the subprocess command line (--cookies <cookie_str>), which can expose session cookies via process listings to other local users. The code also writes into the external MediaCrawler repo (overwriting base_config.py and restoring it), which requires write access to user-specified paths and could have side effects if the path points to non-isolated locations.
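The argv-exposure issue described above can be illustrated with a short sketch. The helpers below (`redact_argv`, `run_with_cookie_in_env`) are hypothetical names, not functions from the skill, and stock MediaCrawler does not read cookies from an environment variable, so the env-based variant is a mitigation idea rather than a drop-in fix:

```python
import os
import subprocess

SECRET_FLAGS = {"--cookies"}  # flags whose values should never be logged

def redact_argv(argv):
    """Return a copy of argv with values that follow secret flags masked.

    Anything placed in argv is visible to other local users via `ps`,
    which is why passing a session cookie as --cookies <value> is risky.
    """
    out, mask_next = [], False
    for arg in argv:
        if mask_next:
            out.append("***REDACTED***")
            mask_next = False
        elif arg in SECRET_FLAGS:
            out.append(arg)
            mask_next = True
        else:
            out.append(arg)
    return out

def run_with_cookie_in_env(cmd, cookie_str):
    """Safer variant: pass the cookie via the child's environment, which is
    not shown in process listings. Assumes the target tool could read
    DOUYIN_COOKIES from the environment -- an illustrative assumption."""
    env = dict(os.environ, DOUYIN_COOKIES=cookie_str)
    return subprocess.run(cmd, env=env)
```

Redacting argv before logging at least keeps the cookie out of pipeline logs, even though it cannot hide it from `ps` while the subprocess runs.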
Persistence & Privilege
The skill is not marked 'always: true' and does not auto-enable itself across agents. It writes run-state and output files under OUTPUT_BASE_DIR and will create/copy ffmpeg executables in library cache folders on Windows. The main privileged behavior is modifying the MediaCrawler base_config.py (temporary patch/restore) in a user-specified directory — this requires filesystem write permission to that installation and is outside the skill's own directory.
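The temporary patch/restore of base_config.py is the kind of operation best wrapped so restoration happens even if the run crashes midway. A minimal sketch, assuming nothing about the skill's actual helpers (`patched_config` and the config text are illustrative):

```python
import shutil
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def patched_config(config_path: Path, new_text: str):
    """Back up a config file, overwrite it, and guarantee restoration.

    Mirrors the patch/restore behavior the scan describes for
    MediaCrawler/config/base_config.py; the `finally` block ensures the
    original file comes back even if the pipeline raises mid-run.
    """
    backup = config_path.with_suffix(config_path.suffix + ".bak")
    shutil.copy2(config_path, backup)
    try:
        config_path.write_text(new_text, encoding="utf-8")
        yield config_path
    finally:
        shutil.copy2(backup, config_path)  # restore the original contents
        backup.unlink()                    # remove the backup copy
```

If the skill's own scripts restore the file only on the happy path, an interrupted run could leave a modified base_config.py behind, which is one more reason to point MEDIACRAWLER_DIR at an isolated copy.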
What to consider before installing
This package largely does what it says (scrapes Douyin, extracts audio, runs Whisper), but take these precautions before installing or running it:
- Confirm and set MEDIACRAWLER_DIR in the skill's .env (the registry metadata omits this required env var). The skill will not work without it.
- Back up your MediaCrawler installation before running: the scripts temporarily overwrite MediaCrawler/config/base_config.py to change the fetch count and then restore it. Ensure MEDIACRAWLER_DIR points to an isolated copy you control.
- Handle cookies with care: the pipeline stores Douyin session cookies in a local .douyin_cookies.json and passes them as a command-line argument to MediaCrawler. Secrets passed on a command line can be exposed via process listings to other users on the same machine, so avoid running this on multi-user or shared systems. After use, consider deleting the cookie file or rotating the session.
- Expect big downloads and disk usage: Playwright browser binaries and Whisper model weights (medium model ~GBs) will be downloaded; make sure you have bandwidth and storage.
- Review accounts.txt and .env for any unintended targets or output directories; set OUTPUT_BASE_DIR to an isolated folder you control.
- Confirm legal/ToS considerations: automated scraping may violate Douyin/TikTok terms of service or local law — ensure you have the right to scrape the targeted accounts.
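A minimal .env along the lines of the checklist above might look like this (paths and the model name are placeholders; per the scan, MEDIACRAWLER_DIR is required and the other two are optional):

```ini
# Required: path to your (ideally isolated) MediaCrawler checkout
MEDIACRAWLER_DIR=/home/you/sandbox/MediaCrawler
# Optional: keep outputs in a dedicated folder you control
OUTPUT_BASE_DIR=/home/you/douyin_runs
# Optional: a smaller Whisper model trades accuracy for disk and compute
WHISPER_MODEL=small
```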
If you want to proceed more safely: run the pipeline in an isolated VM/container or under a dedicated user account, inspect the code (especially run_mediacrawler and set_mediacrawler_max_count), and avoid running the workflow on shared systems where process listings or filesystem writes could leak credentials or affect other software.
latest · vk972sn0gngfbbf3n9c01pnhmf98395h8
