crawling

v1.0.0

Use for TikTok crawling, content retrieval, and analysis

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for modestyrichards/modesty-crawling.

Install the skill "crawling" (modestyrichards/modesty-crawling) from ClawHub.
Skill page: https://clawhub.ai/modestyrichards/modesty-crawling
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install modesty-crawling

ClawHub CLI

npx clawhub@latest install modesty-crawling

Security Scan

VirusTotal: Benign
OpenClaw: Benign (medium confidence)

Purpose & Capability
Name/description match the instructions: the SKILL.md exclusively documents using yt-dlp/ffmpeg to crawl TikTok (single videos, profiles, searches, filters, exports). No unrelated capabilities or unrelated credentials are requested.
Instruction Scope
Instructions stay within the scraping/analysis domain. They explicitly recommend using --cookies-from-browser or a cookies file for authenticated/private content, saving downloads and JSON metadata to local directories, and scheduling via cron. These actions are sensitive but directly relevant to the stated purpose. The doc also suggests posting data to an external API (api.skillbossai.com) for further analysis — this is optional but worth reviewing before sending scraped data off-host.
Install Mechanism
This is an instruction-only skill with no install spec. The README recommends installing yt-dlp and ffmpeg via brew or pip, which is typical and expected; there are no obscure downloads or extract/install steps embedded in the skill.
Credentials
The skill declares no required env vars or credentials. However, runtime instructions instruct using browser cookies (accessing local browser cookie stores) or exported cookie files for authentication; these are sensitive but proportionate for retrieving private/restricted content. No unrelated secrets or credentials are requested.
Persistence & Privilege
always is false and there is no code that would persist or auto-enable the skill. The doc suggests creating cron jobs and local files (download archives, logs), which are user-driven and expected for scheduled scraping.
Assessment
This SKILL.md is a coherent, instruction-only guide for TikTok scraping using yt-dlp. Before using it, consider:
- Browser cookies are sensitive: using --cookies-from-browser or a cookies file grants the downloader access to your logged-in session, so only do this on machines you control.
- Review the external API (https://api.skillbossai.com) before sending any scraped data off your system; the guide's recommendation is optional, not required.
- Scraped content and metadata can contain personal data and may violate TikTok's ToS or local law. Confirm you have the right to collect and store the content.
- Rate limits and IP blocking are real: use polite scrape intervals and respect robots/terms.
- Keep yt-dlp/ffmpeg up to date and audit any scripts you run (cron jobs, scraping scripts) before scheduling.
If you want further assurance, ask the skill author for provenance (source/homepage) or request an explicit statement about where external analysis data is sent and how it's protected.

Like a lobster shell, security has layers — review code before you run it.

92 downloads
0 stars
1 version
Updated 1w ago
v1.0.0
MIT-0

TikTok Scraping with yt-dlp

yt-dlp is a CLI for downloading video/audio from TikTok and many other sites.

Setup

# macOS
brew install yt-dlp ffmpeg

# pip (any platform)
pip install yt-dlp
# Also install ffmpeg separately for merging/post-processing
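
Before running the commands in the following sections, it can help to confirm that both tools are actually on the PATH. The snippet below is a minimal sketch of such a check; `missing_tools` is a hypothetical helper name and not part of yt-dlp or ffmpeg.

```shell
# Print the names of any tools that are not found on PATH (empty line if none).
# missing_tools is a hypothetical helper, not part of yt-dlp.
missing_tools() (
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="${missing:+$missing }$tool"
  done
  printf '%s\n' "$missing"
)

# Example: warn if yt-dlp or ffmpeg still needs to be installed.
need=$(missing_tools yt-dlp ffmpeg)
if [ -n "$need" ]; then
  echo "Not installed: $need" >&2
fi
```

The function only reports what is missing; installing the tools is still done with brew or pip as shown above.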

Download Patterns

Single Video

yt-dlp "https://www.tiktok.com/@handle/video/1234567890"

Entire Profile

yt-dlp "https://www.tiktok.com/@handle" \
  -P "./tiktok/data" \
  -o "%(uploader)s/%(upload_date)s-%(id)s/video.%(ext)s" \
  --write-info-json

Creates:

tiktok/data/
  handle/
    20260220-7331234567890/
      video.mp4
      video.info.json

Multiple Profiles

for handle in handle1 handle2 handle3; do
  yt-dlp "https://www.tiktok.com/@$handle" \
    -P "./tiktok/data" \
    -o "%(uploader)s/%(upload_date)s-%(id)s/video.%(ext)s" \
    --write-info-json \
    --download-archive "./tiktok/downloaded.txt"
done
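
When the list of profiles grows, keeping the handles in a text file may be easier than editing the loop. The following is a sketch under assumptions: `handle_urls` and `handles.txt` are hypothetical names, and the file is assumed to hold one handle per line with optional `#` comments. yt-dlp's `--batch-file -` option reads URLs from standard input.

```shell
# Turn a handles file (one handle per line, optional leading @, # comments)
# into TikTok profile URLs. handle_urls and handles.txt are hypothetical names.
handle_urls() {
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%%#*}                                  # strip comments
    line=$(printf '%s' "$line" | tr -d '[:space:]')   # strip whitespace
    line=${line#@}                                    # strip a leading @
    if [ -n "$line" ]; then
      printf 'https://www.tiktok.com/@%s\n' "$line"
    fi
  done < "$1"
}

# Usage (not run here): feed the URLs to yt-dlp's batch mode on stdin.
# handle_urls handles.txt | yt-dlp --batch-file - \
#   -P "./tiktok/data" \
#   -o "%(uploader)s/%(upload_date)s-%(id)s/video.%(ext)s" \
#   --write-info-json \
#   --download-archive "./tiktok/downloaded.txt"
```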

Search, Hashtags & Sounds

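# Search by keyword (the tiktoksearch: prefix may not be supported by your yt-dlp version; check with: yt-dlp --list-extractors)
yt-dlp "tiktoksearch:cooking recipes" --playlist-end 20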

# Hashtag page
yt-dlp "https://www.tiktok.com/tag/booktok" --playlist-end 50

# Videos using a specific sound
yt-dlp "https://www.tiktok.com/music/original-sound-1234567890" --playlist-end 30

Format Selection

# List available formats
yt-dlp -F "https://www.tiktok.com/@handle/video/1234567890"

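# Download a specific format. "best" picks the best single-file format;
# pass an ID from the -F listing instead to choose another (e.g., one without a watermark, if available)
yt-dlp -f "best" "https://www.tiktok.com/@handle/video/1234567890"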

Filtering

By Date

# On or after a date
--dateafter 20260215

# On or before a date
--datebefore 20260220

# Exact date
--date 20260215

# Date range
--dateafter 20260210 --datebefore 20260220

# Relative dates
--dateafter today-7days                          # any platform: yt-dlp's built-in relative date syntax
--dateafter "$(date -u -v-7d +%Y%m%d)"           # macOS: last 7 days
--dateafter "$(date -u -d '7 days ago' +%Y%m%d)" # Linux: last 7 days
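
The macOS and Linux `date` invocations differ, which is awkward in a script that has to run on both. One option is a small wrapper that tries the GNU form first and falls back to the BSD form. This is a sketch, and `days_ago` is a hypothetical helper name.

```shell
# Print the UTC date N days ago as YYYYMMDD.
# days_ago is a hypothetical helper; it tries GNU date first, then BSD/macOS date.
days_ago() {
  date -u -d "$1 days ago" +%Y%m%d 2>/dev/null ||
    date -u -v-"$1"d +%Y%m%d
}

# Usage (not run here):
# yt-dlp "https://www.tiktok.com/@handle" --dateafter "$(days_ago 7)"
```

The same value can also be passed to `--datebefore` or `--date`.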

By Metrics & Content

# 100k+ views
--match-filters "view_count >= 100000"

# Duration between 30-60 seconds
--match-filters "duration >= 30 & duration <= 60"

# Title contains "recipe" (case-insensitive)
--match-filters "title ~= (?i)recipe"

# Combine: 50k+ views, uploaded on or after 1 Feb 2026
yt-dlp "https://www.tiktok.com/@handle" \
  --match-filters "view_count >= 50000" \
  --dateafter 20260201
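
Within a single `--match-filters` expression, `&` joins conditions that must all hold. According to the yt-dlp documentation, repeating the option instead matches a video if at least one of the expressions is met. The sketch below assembles an AND expression from separate conditions; `and_filter` is a hypothetical helper, not a yt-dlp feature.

```shell
# Join any number of conditions with " & " so that all of them must match.
# and_filter is a hypothetical helper, not a yt-dlp feature.
and_filter() (
  expr=""
  for cond in "$@"; do
    expr="${expr:+$expr & }$cond"
  done
  printf '%s\n' "$expr"
)

# Usage (not run here):
# yt-dlp "https://www.tiktok.com/@handle" \
#   --match-filters "$(and_filter 'view_count >= 50000' 'duration <= 60')" \
#   --dateafter 20260201
```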

Metadata Only (No Download)

Preview What Would Download

yt-dlp "https://www.tiktok.com/@handle" \
  --simulate \
  --print "%(upload_date)s | %(view_count)s views | %(title)s"

Export to JSON

# JSONL (one JSON object per line, better for large datasets); -j is short for --dump-json
yt-dlp "https://www.tiktok.com/@handle" --simulate --dump-json > handle_videos.jsonl

# Single JSON array: collect the per-video objects with jq
yt-dlp "https://www.tiktok.com/@handle" --simulate -j | jq -s '.' > handle_videos.json

Export to CSV

yt-dlp "https://www.tiktok.com/@handle" \
  --simulate \
  --print-to-file "%(uploader)s,%(id)s,%(upload_date)s,%(view_count)s,%(like_count)s,%(webpage_url)s" \
  "./tiktok/analysis/metadata.csv"

Analyze with jq

# The globs below match the layout created above: tiktok/data/<uploader>/<date>-<id>/video.info.json

# Top 10 videos by views from downloaded .info.json files
jq -s 'sort_by(.view_count) | reverse | .[:10] | .[] | {title, view_count, url: .webpage_url}' \
  tiktok/data/*/*/*.info.json

# Total views across all videos
jq -s 'map(.view_count) | add' tiktok/data/*/*/*.info.json

# Videos grouped by upload date
jq -s 'group_by(.upload_date) | map({date: .[0].upload_date, count: length})' \
  tiktok/data/*/*/*.info.json

Tip: For deeper AI-powered analysis of scraped content (summarization, sentiment analysis, trend detection), use SkillBoss API Hub via POST https://api.skillbossai.com/v1/pilot with type: "chat". Load JSONL/CSV exports into Python with pandas for engagement scatter plots, posting frequency charts, or comparing metrics across creators.
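
For a quick comparison across creators without leaving the shell, the CSV export can be aggregated with awk. This is a sketch under assumptions: it expects the column order of the `--print-to-file` template shown earlier (uploader, id, upload_date, view_count, like_count, webpage_url), assumes no field contains a comma, and `views_per_uploader` is a hypothetical helper name.

```shell
# Sum view_count (column 4) per uploader (column 1) from the CSV export.
# Rows whose view_count is not a plain number (a header row, "NA") are skipped.
# views_per_uploader is a hypothetical helper.
views_per_uploader() {
  awk -F',' '$4 ~ /^[0-9]+$/ { total[$1] += $4 }
             END { for (u in total) printf "%s,%d\n", u, total[u] }' "$1" | sort
}

# Usage (not run here):
# views_per_uploader ./tiktok/analysis/metadata.csv
```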


Ongoing Scraping

Archive (Skip Already Downloaded)

The --download-archive flag tracks downloaded videos, enabling incremental updates:

yt-dlp "https://www.tiktok.com/@handle" \
  -P "./tiktok/data" \
  -o "%(uploader)s/%(upload_date)s-%(id)s/video.%(ext)s" \
  --write-info-json \
  --download-archive "./tiktok/downloaded.txt"

Run the same command later—it skips videos already in downloaded.txt.
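
The archive is a plain text file, so it can be inspected with ordinary tools. The sketch below assumes each line has the form `tiktok <id>` (extractor name, space, video ID), which is how yt-dlp archives are generally laid out; check your own downloaded.txt before relying on it. `in_archive` is a hypothetical helper name.

```shell
# Succeed if the given video ID is recorded in the archive file.
# Assumes archive lines look like "tiktok <id>"; in_archive is a hypothetical helper.
in_archive() {
  grep -Fqx "tiktok $2" "$1"
}

# Usage (not run here):
# if in_archive ./tiktok/downloaded.txt 7331234567890; then
#   echo "already downloaded"
# fi
```

Removing a line from the file makes yt-dlp treat that video as new on the next run.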

Authentication (Private/Restricted Content)

# Use cookies from browser (recommended)
yt-dlp --cookies-from-browser chrome "https://www.tiktok.com/@handle"

# Or export cookies to a file first
yt-dlp --cookies tiktok_cookies.txt "https://www.tiktok.com/@handle"
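
An exported cookies file carries a logged-in session, so it is worth limiting who can read it. The following is a minimal sketch; `restrict_cookies` and `cookies_are_private` are hypothetical helper names.

```shell
# Make a cookies file readable and writable by its owner only.
# restrict_cookies is a hypothetical helper.
restrict_cookies() {
  chmod 600 "$1"
}

# Succeed only if group and others have no access to the file.
# Characters 5-10 of the ls mode string are the group and other permission bits.
cookies_are_private() {
  perms=$(ls -ld "$1" | cut -c5-10)
  [ "$perms" = "------" ]
}

# Usage (not run here):
# restrict_cookies tiktok_cookies.txt
# cookies_are_private tiktok_cookies.txt || echo "cookies file is too open" >&2
```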

Scheduled Scraping (Cron)

# crontab -e
# Run daily at 2 AM, log output
0 2 * * * cd /path/to/project && ./scripts/scrape-tiktok.sh >> ./tiktok/logs/cron.log 2>&1

Example scripts/scrape-tiktok.sh:

#!/bin/bash
set -e

HANDLES=(handle1 handle2 handle3)
DATA_DIR="./tiktok/data"
ARCHIVE="./tiktok/downloaded.txt"

for handle in "${HANDLES[@]}"; do
  echo "[$(date)] Scraping @$handle"
  # today-7days is yt-dlp's built-in relative date, so this runs on macOS and Linux alike.
  # A failure for one handle is logged and the loop moves on to the next.
  yt-dlp "https://www.tiktok.com/@$handle" \
    -P "$DATA_DIR" \
    -o "%(uploader)s/%(upload_date)s-%(id)s/video.%(ext)s" \
    --write-info-json \
    --download-archive "$ARCHIVE" \
    --cookies-from-browser chrome \
    --dateafter today-7days \
    --sleep-interval 2 \
    --max-sleep-interval 5 \
    || echo "[$(date)] yt-dlp failed for @$handle" >&2
done
echo "[$(date)] Done"
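
If a run takes longer than the cron interval, two copies of the script could end up writing to the same archive at once. One common way to guard against that is a lock directory, since `mkdir` either creates the directory or fails. This is a sketch; `acquire_lock`, `release_lock`, and the lock path are hypothetical.

```shell
# Try to take a lock by creating a directory; mkdir fails if it already exists,
# so only one caller can succeed. acquire_lock/release_lock are hypothetical helpers.
acquire_lock() {
  mkdir "$1" 2>/dev/null
}

release_lock() {
  rmdir "$1"
}

# Usage near the top of scripts/scrape-tiktok.sh (not run here):
# LOCK="./tiktok/scrape.lock"
# acquire_lock "$LOCK" || { echo "Previous run still active, exiting"; exit 0; }
# trap 'release_lock "$LOCK"' EXIT
```

If a run is killed before the trap fires, the lock directory has to be removed by hand.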

Troubleshooting

| Problem | Solution |
| --- | --- |
| Empty results / no videos found | Add --cookies-from-browser chrome; TikTok rate-limits anonymous requests |
| 403 Forbidden errors | Rate limited. Wait 10-15 min, or use cookies/different IP |
| "Video unavailable" | Region-locked. Try --geo-bypass or a VPN |
| Watermarked videos | Check -F for alternative formats; some may lack watermark |
| Slow downloads | Add --concurrent-fragments 4 for faster downloads |
| Profile shows fewer videos than expected | TikTok API limits. Use --playlist-end N explicitly, try with cookies |

Debug Mode

# Verbose output to diagnose issues
yt-dlp -v "https://www.tiktok.com/@handle" 2>&1 | tee debug.log

Reference

Key Options

| Option | Description |
| --- | --- |
| -o TEMPLATE | Output filename template |
| -P PATH | Base download directory |
| --dateafter DATE | Videos on/after date (YYYYMMDD) |
| --datebefore DATE | Videos on/before date (YYYYMMDD) |
| --playlist-end N | Stop after N videos |
| --match-filters EXPR | Filter by metadata (views, duration, title) |
| --write-info-json | Save metadata JSON per video |
| --download-archive FILE | Track downloads, skip duplicates |
| --simulate / -s | Dry run, no download |
| -j / --dump-json | Output metadata as JSON, one object per line |
| --cookies-from-browser NAME | Use cookies from browser |
| --sleep-interval SEC | Wait between downloads (avoid rate limits) |

Output Template Variables

| Variable | Example Output |
| --- | --- |
| %(id)s | 7331234567890 |
| %(uploader)s | handle |
| %(upload_date)s | 20260215 |
| %(title).50s | First 50 chars of title |
| %(view_count)s | 1500000 |
| %(like_count)s | 250000 |
| %(ext)s | mp4 |
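
The `.50` in `%(title).50s` is a printf-style precision that caps the field at 50 characters. Shell `printf` uses the same notation, which makes it easy to preview what a truncated title would look like. This is a sketch for illustration only: `truncate_title` is a hypothetical helper, and some shells count bytes rather than characters for non-ASCII text.

```shell
# Keep at most the first 50 characters of a string, mirroring %(title).50s.
# truncate_title is a hypothetical helper for previewing truncated titles.
truncate_title() {
  printf '%.50s\n' "$1"
}

# Usage (not run here):
# truncate_title "A very long video title that keeps going well past the fifty character mark"
```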

Full template reference: see the OUTPUT TEMPLATE section of the yt-dlp README.
