Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Youtube Scrapper

A skill for discovering and scraping YouTube channels based on categories and locations without requiring API keys or login.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
2 · 676 · 0 current installs · 0 all-time installs
Security Scan
VirusTotal
Suspicious
View report →
OpenClaw
Suspicious
high confidence
Purpose & Capability
The name/description (YouTube scraping) matches the behavior described in SKILL.md, but the runtime instructions refer to Python scripts (scripts/*.py), Playwright/Chromium, regional config files, and residential proxy providers. The registry metadata lists no required binaries, no env vars, and no code files — meaning the skill as published cannot perform the claimed actions. This mismatch is a substantive incoherence.
Instruction Scope
SKILL.md instructs running discovery and scraper Python scripts, reading/writing queue and output files, downloading thumbnails, and using anti-detection techniques (fingerprint rotation, stealth JS, mouse simulation, request interception). Those instructions reference filesystem paths and external services (proxy providers, Google Search) but the bundle provides none of the scripts/resources or any declared credentials. The instructions also describe active evasion of detection, which broadens the operational scope well beyond a simple read-only integration.
Install Mechanism
There is no install spec and there are no code files. The embedded YAML in SKILL.md lists required bins (python3, chromium), but the registry metadata shows none, an inconsistency. Because this is instruction-only, with references to scripts and resources that are missing, the skill cannot be installed or run as-is. The absence of an install mechanism also means there is no declared, reviewable source for the code that would actually execute.
Credentials
The skill declares no required environment variables or credentials, yet SKILL.md references residential proxy support across four providers (e.g., brightdata) and regional config files. Residential proxies and most provider integrations normally require credentials or API keys, so SKILL.md's claim of 'no API keys required' conflicts with the listed provider integrations. This gap is disproportionate and unexplained.
Persistence & Privilege
The skill does not request persistent presence (always: false) and does not declare any special system-wide privileges. Model invocation is allowed (the platform default). There is no indication the skill would modify other skills or system configs. However, autonomous invocation combined with the other incoherences increases risk if the missing pieces are later supplied.
What to consider before installing
Do not install or run this skill as-is. The SKILL.md describes Python scripts, Playwright/Chromium, proxy providers, and config files that are not included in the published bundle and are not declared in the registry metadata, so the package cannot function and may be incomplete or intentionally stripped. Before proceeding, ask the publisher for:

  1. The full source repository or release package containing the referenced scripts and resources
  2. A clear install spec (how Python, Playwright, and Chromium are installed)
  3. An explicit list of required environment variables or credentials, and why each is needed

If you plan to provide proxy or API credentials, only do so after auditing the actual code. Also be aware the tool claims anti-detection/evasion techniques; those increase legal and policy risk (YouTube/Google terms of service) and broaden potential harm. If the publisher cannot produce verifiable source and installation steps, treat the skill as untrusted.

Like a lobster shell, security has layers — review code before you run it.

Current version: v0.1.1

SKILL.md

YouTube Channel Scraper

A browser-based YouTube channel discovery and scraping tool.

Part of ScrapeClaw — a suite of production-ready, agentic social media scrapers for Instagram, YouTube, X/Twitter, and Facebook built with Python & Playwright, no API keys required.

---
name: youtube-scrapper
description: Discover and scrape YouTube channels from your browser.
emoji: 📺
version: 1.0.2
author: influenza
tags:
  - youtube
  - scraping
  - social-media
  - channel-discovery
  - influencer-discovery
metadata:
  clawdbot:
    requires:
      bins:
        - python3
        - chromium

    config:
      stateDirs:
        - data/output
        - data/queue
        - thumbnails
      outputFormats:
        - json
        - csv
---

Overview

This skill provides a two-phase YouTube scraping system:

  1. Channel Discovery — Find YouTube channels via Google Search (browser-based, no API key required)
  2. Browser Scraping — Scrape public channel data using Playwright with anti-detection (no login required)

Features

  • 🔍 Discover YouTube channels by location and category
  • 🌐 Full browser simulation for accurate scraping
  • 🛡️ Browser fingerprinting, human behavior simulation, and stealth scripts
  • 📊 Channel info, subscribers, views, videos, engagement data, and media
  • 💾 JSON export with downloaded thumbnails
  • 🔄 Resume interrupted scraping sessions
  • ⚡ Auto-skip unavailable channels and low-subscriber profiles
  • 🌍 Built-in residential proxy support with 4 providers
  • 🗺️ Regional configs for US, UK, Europe, India, Gulf, and East Asia

Usage

Agent Tool Interface

For OpenClaw agent integration, the skill provides JSON output:

# Discover YouTube channels (returns JSON queue)
python scripts/youtube_channel_discovery.py --categories tech --locations India

# Scrape from a queue file
python scripts/youtube_channel_scraper.py --queue data/queue/your_queue_file.json

# Full orchestration — discover + scrape in one go
python scripts/youtube_orchestrator.py --config resources/scraper_config_ind.json

Output Data

Channel Data Structure

{
  "channel_name": "Marques Brownlee",
  "channel_url": "https://www.youtube.com/@mkbhd",
  "subscribers": 19200000,
  "total_views": 4500000000,
  "video_count": 1800,
  "description": "MKBHD: Quality Tech Videos...",
  "joined_date": "Mar 21, 2008",
  "country": "United States",
  "profile_pic_url": "https://...",
  "profile_pic_local": "thumbnails/mkbhd/profile_abc123.jpg",
  "banner_url": "https://...",
  "banner_local": "thumbnails/mkbhd/banner_def456.jpg",
  "influencer_tier": "mega",
  "category": "tech",
  "scrape_location": "New York",
  "scraped_at": "2026-02-17T12:00:00",
  "recent_videos": [
    {
      "title": "Galaxy S26 Ultra Review",
      "url": "https://www.youtube.com/watch?v=...",
      "views": 5200000,
      "published": "2 days ago",
      "duration": "14:32",
      "thumbnail_url": "https://...",
      "thumbnail_local": "thumbnails/mkbhd/video_0_ghi789.jpg"
    }
  ]
}

Queue File Structure

{
  "location": "India",
  "category": "tech",
  "total": 20,
  "channels": ["@channel1", "@channel2", "..."],
  "completed": ["@channel1"],
  "failed": {"@channel3": "not_found"},
  "current_index": 2,
  "created_at": "2026-02-17T12:00:00",
  "source": "google_search"
}
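
The `completed`, `failed`, and `channels` fields above imply a simple resume model: anything not yet completed or failed is still pending. A minimal sketch, assuming that schema (the published bundle ships no code, so the helper and its name are illustrative only):

```python
import json

def remaining_channels(queue_path: str) -> list[str]:
    """Return the channels still to scrape, given the queue schema above.

    Assumes `completed` is a list of handles and `failed` maps
    handle -> failure reason, as in the example queue file.
    """
    with open(queue_path) as f:
        queue = json.load(f)
    done = set(queue.get("completed", [])) | set(queue.get("failed", {}))
    return [c for c in queue.get("channels", []) if c not in done]
```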

Influencer Tiers

| Tier  | Subscribers Range |
|-------|-------------------|
| nano  | < 1,000           |
| micro | 1,000 – 10,000    |
| mid   | 10,000 – 100,000  |
| macro | 100,000 – 1M      |
| mega  | > 1,000,000       |
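
The tier boundaries above translate directly into a threshold check. A sketch of how the `influencer_tier` field in the channel data might be derived (function name and exact boundary handling are assumptions):

```python
def influencer_tier(subscribers: int) -> str:
    """Map a subscriber count to the tier names in the table above."""
    if subscribers < 1_000:
        return "nano"
    if subscribers < 10_000:
        return "micro"
    if subscribers < 100_000:
        return "mid"
    if subscribers <= 1_000_000:
        return "macro"
    return "mega"
```

For example, the sample channel above with 19,200,000 subscribers lands in the "mega" tier.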

File Outputs

  • Queue files: data/queue/{region}/{location}_{category}_{timestamp}.json
  • Scraped data: data/output_{region}/{channel_name}.json
  • Thumbnails: thumbnails_{region}/{channel}/profile_*.jpg, thumbnails_{region}/{channel}/video_*.jpg
  • Progress: data/progress/discovery_progress_{region}.json

Configuration

Regional config files live in resources/:

resources/scraper_config_us.json
resources/scraper_config_uk.json
resources/scraper_config_eur.json
resources/scraper_config_ind.json
resources/scraper_config_gulf.json
resources/scraper_config_east.json

Example config (resources/scraper_config_ind.json):

{
  "proxy": {
    "enabled": false,
    "provider": "brightdata",
    "country": "",
    "sticky": true,
    "sticky_ttl_minutes": 10
  },
  "categories": [
    "gaming", "tech", "beauty", "fashion", "fitness",
    "food", "travel", "music", "education", "comedy",
    "lifestyle", "cooking", "diy", "art", "finance",
    "health", "entertainment"
  ],
  "locations": [
    "India", "Mumbai", "Delhi", "Bangalore", "Hyderabad",
    "Chennai", "Kolkata", "Pune", "Ahmedabad", "Jaipur"
  ],
  "max_videos_to_scrape": 6,
  "headless": false,
  "results_per_search": 20,
  "search_delay": [3, 7],
  "scrape_delay": [2, 5],
  "rate_limit_wait": 60,
  "max_retries": 3
}
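
The `search_delay` and `scrape_delay` fields are two-element ranges, suggesting each pause is drawn at random from within the range rather than fixed. A sketch under that assumption (helper names are hypothetical):

```python
import json
import random

def load_config(path: str) -> dict:
    """Read a regional config like the example above."""
    with open(path) as f:
        return json.load(f)

def pick_delay(delay_range: list[float]) -> float:
    """Draw a random pause (seconds) from [lo, hi], e.g. search_delay = [3, 7]."""
    lo, hi = delay_range
    return random.uniform(lo, hi)
```

A caller would then `time.sleep(pick_delay(config["search_delay"]))` between searches, so no two requests are spaced identically.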

Filters Applied

The scraper automatically filters out:

  • ❌ Unavailable or terminated channels
  • ❌ Channels with < 500 subscribers (configurable)
  • ❌ Non-existent channel URLs
  • ❌ Already scraped entries (deduplication)
  • ❌ Rate-limited requests (auto-retry with backoff)
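
The first four filters reduce to a single predicate over a discovered channel. A minimal sketch, assuming hypothetical field names (the bundle does not include the actual filter code or schema):

```python
def should_scrape(channel: dict, seen: set, min_subscribers: int = 500) -> bool:
    """Decide whether a discovered channel passes the filters listed above."""
    if channel.get("unavailable") or channel.get("terminated"):
        return False  # unavailable or terminated channels
    if channel.get("subscribers", 0) < min_subscribers:
        return False  # below the (configurable) subscriber floor
    if channel.get("channel_url") in seen:
        return False  # deduplicate already-scraped entries
    return True
```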

Anti-Detection

The scraper uses multiple anti-detection techniques:

  • Browser fingerprinting — Rotating fingerprint profiles (viewport, user agent, timezone, WebGL, etc.)
  • Stealth JavaScript — Hides navigator.webdriver, spoofs plugins/languages/hardware, canvas noise, fake chrome object
  • Human behavior simulation — Random delays, mouse movements, scrolling patterns
  • Network randomization — Variable timing between requests
  • Request interception — Blocks known fingerprinting and tracking scripts

Troubleshooting

No Channels Discovered

  • Try different location/category combinations
  • Check if Google Search is returning CAPTCHA pages
  • Run with --headless false to debug visually

Rate Limiting

  • Reduce scraping speed (increase delays in config)
  • Run during off-peak hours
  • Use a residential proxy (see below)

Browser Crashes

  • The orchestrator auto-restarts the browser every 50 channels
  • Interrupted scrapes can be resumed — queue files track progress automatically

🌐 Residential Proxy Support

Why Use a Residential Proxy?

Running a scraper at scale without a residential proxy will get your IP blocked fast. Here's why proxies are essential for long-running scrapes:

| Advantage | Description |
|---|---|
| Avoid IP Bans | Residential IPs look like real household users, not data-center bots. YouTube is far less likely to flag them. |
| Automatic IP Rotation | Each request (or session) gets a fresh IP, so rate limits never stack up on one address. |
| Geo-Targeting | Route traffic through a specific country/city so scraped content matches the target audience's locale. |
| Sticky Sessions | Keep the same IP for a configurable window (e.g. 10 min), critical for maintaining a consistent browsing session. |
| Higher Success Rate | Rotating residential IPs deliver 95%+ success rates compared to ~30% with data-center proxies on YouTube. |
| Long-Running Scrapes | Scrape thousands of channels over hours or days without interruption. |
| Concurrent Scraping | Run multiple browser instances across different IPs simultaneously. |

Recommended Proxy Providers

We have affiliate partnerships with top residential proxy providers. Using these links supports continued development of this skill:

| Provider | Best For | Sign Up |
|---|---|---|
| Bright Data | World's largest network, 72M+ IPs, enterprise-grade | 👉 Get Bright Data |
| IProyal | Pay-as-you-go, 195+ countries, no traffic expiry | 👉 Get IProyal |
| Storm Proxies | Fast & reliable, developer-friendly API, competitive pricing | 👉 Get Storm Proxies |
| NetNut | ISP-grade network, 52M+ IPs, direct connectivity | 👉 Get NetNut |

Setup Steps

1. Get Your Proxy Credentials

Sign up with any provider above, then grab:

  • Username (from your provider dashboard)
  • Password (from your provider dashboard)
  • Host and Port are pre-configured per provider (or use custom)

2. Configure via Environment Variables

export PROXY_ENABLED=true
export PROXY_PROVIDER=brightdata    # brightdata | iproyal | stormproxies | netnut | custom
export PROXY_USERNAME=your_user
export PROXY_PASSWORD=your_pass
export PROXY_COUNTRY=us             # optional: two-letter country code
export PROXY_STICKY=true            # optional: keep same IP per session

3. Provider-Specific Host/Port Defaults

These are auto-configured when you set the provider name:

| Provider | Host | Port |
|---|---|---|
| Bright Data | brd.superproxy.io | 22225 |
| IProyal | proxy.iproyal.com | 12321 |
| Storm Proxies | rotating.stormproxies.com | 9999 |
| NetNut | gw-resi.netnut.io | 5959 |

Override with PROXY_HOST / PROXY_PORT env vars if your plan uses a different gateway.
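
The defaults-plus-override rule above can be sketched as a small lookup. This is an assumption about how the (missing) proxy code resolves its gateway, using the hosts and ports from the table:

```python
import os

# Gateway defaults from the table above, keyed by provider name.
PROVIDER_DEFAULTS = {
    "brightdata": ("brd.superproxy.io", 22225),
    "iproyal": ("proxy.iproyal.com", 12321),
    "stormproxies": ("rotating.stormproxies.com", 9999),
    "netnut": ("gw-resi.netnut.io", 5959),
}

def proxy_server(provider: str) -> str:
    """Resolve the proxy server URL, letting PROXY_HOST/PROXY_PORT override defaults."""
    host, port = PROVIDER_DEFAULTS[provider]
    host = os.environ.get("PROXY_HOST", host)
    port = int(os.environ.get("PROXY_PORT", port))
    return f"http://{host}:{port}"
```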

4. Custom Proxy Provider

For any other proxy service, set provider to custom and supply host/port manually:

{
  "proxy": {
    "enabled": true,
    "provider": "custom",
    "host": "your.proxy.host",
    "port": 8080,
    "username": "user",
    "password": "pass"
  }
}

Running the Scraper with Proxy

Once configured, the scraper picks up the proxy automatically — no extra flags needed:

# Discover and scrape as usual — proxy is applied automatically
python scripts/youtube_orchestrator.py --config resources/scraper_config_ind.json

# The log will confirm proxy is active:
# INFO - Proxy enabled: <ProxyManager provider=brightdata enabled host=brd.superproxy.io:22225>
# INFO - Browser using proxy: brightdata → brd.superproxy.io:22225

Using the Proxy Manager Programmatically

from proxy_manager import ProxyManager

# From config (auto-reads config from resources/)
pm = ProxyManager.from_config()

# From environment variables
pm = ProxyManager.from_env()

# Manual construction
pm = ProxyManager(
    provider="brightdata",
    username="your_user",
    password="your_pass",
    country="us",
    sticky=True
)

# For Playwright browser context
proxy = pm.get_playwright_proxy()
# → {"server": "http://brd.superproxy.io:22225", "username": "user-country-us-session-abc123", "password": "pass"}

# For requests / aiohttp
proxies = pm.get_requests_proxy()
# → {"http": "http://user:pass@host:port", "https": "http://user:pass@host:port"}

# Force new IP (rotates session ID)
pm.rotate_session()

# Debug info
print(pm.info())

Best Practices for Long-Running Scrapes

  1. Use sticky sessions — YouTube requires consistent IPs during a browsing session. Set "sticky": true.
  2. Target the right country — Set "country": "us" (or your target region) so YouTube serves content in the expected locale.
  3. Combine with existing anti-detection — This scraper already has fingerprinting, stealth scripts, and human behavior simulation. The proxy is the final layer.
  4. Rotate sessions between batches — Call pm.rotate_session() between large batches of channels to get a fresh IP.
  5. Use delays — Even with proxies, respect scrape_delay in config (default 2-5s) to avoid aggressive patterns.
  6. Monitor your proxy dashboard — All providers have dashboards showing bandwidth usage and success rates.

Notes

  • No login required — Only scrapes publicly visible content
  • Checkpoint/resume — Queue files track progress; interrupted scrapes can be resumed automatically
  • Rate limiting — Waits 60s on rate limit, exponential backoff on consecutive failures
  • Resilient orchestration — Auto-restarts browser, retries failed channels, graceful shutdown on SIGINT/SIGTERM
  • Regional configs — Pre-built configs for 6 regions covering 200+ cities worldwide
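
The notes mention a 60s rate-limit wait with exponential backoff on consecutive failures, matching the `rate_limit_wait: 60` config value. A sketch of one plausible schedule (doubling per failure, with a cap; the exact schedule and cap are assumptions, not documented behavior):

```python
def backoff_seconds(consecutive_failures: int, base: int = 60, cap: int = 960) -> int:
    """Exponential backoff after rate limits: 60s, 120s, 240s, ... capped at `cap`."""
    return min(base * (2 ** consecutive_failures), cap)
```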

Files

1 total
