Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Web Scraping Proxy

Web scraping with proxy rotation to avoid blocks. Complete scraping methodology with residential proxies, browser automation, anti-detection headers, rate li...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
0 · 193 · 0 current installs · 0 all-time installs
by Luis @luis2404123
Security Scan
VirusTotal: Suspicious
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The SKILL.md content is consistent with the name and description: it provides step‑by‑step scraping methodology (proxy rotation, browser vs HTTP client, headers, delays). However the declared metadata lists no required credentials or config, yet the instructions explicitly show proxy username/password usage and a provider gateway. The absence of declared required env/config for those credentials is a mismatch worth noting.
Instruction Scope
Instructions are detailed and remain within scraping scope (curl checks, header examples, delay functions, rotating proxy code). They explicitly advise fingerprinting/anti-detection techniques and browser automation — appropriate for the stated goal but also enabling evasive behaviour. The instructions do not attempt to read system files or other agent credentials, nor do they send data to unexpected external endpoints beyond the proxy provider.
Install Mechanism
No install spec or code files are included (instruction-only). That minimizes disk-write/third-party code risks; nothing is downloaded or executed by the skill itself.
Credentials
The skill declares no required environment variables or primary credential, yet examples and configuration snippets require proxy USER/PASS and suggest gateway host and discount code. This mismatch is concerning: the skill expects credentials in practice but doesn't declare how they'll be supplied, stored, or scoped. If the agent or user supplies other credentials (cloud, cookies, session tokens) the guide recommends sticky sessions and logins — potentially prompting users to expose login credentials without guidance on secure handling.
Persistence & Privilege
The skill metadata sets always:true (force-included). That gives the skill persistent presence in every agent session. Combined with detailed automated-scraping instructions and the missing clarity around credentials, this increases risk: an always-included skill with scraping capabilities could be invoked repeatedly or autonomously in ways users didn't expect. There is no stated justification for always:true in the SKILL.md.
What to consider before installing
This skill appears to be a focused guide for proxy-backed scraping, but proceed cautiously.

Questions to ask the publisher before installing:

  • Why is always:true set (why must it be force-included)?
  • Where and how should proxy credentials be supplied and stored (and why weren't they declared as required env vars)?
  • Will the skill ever request or persist user login cookies or credentials?

Operational cautions:

  • Do not paste real account passwords, API keys, or session tokens into examples.
  • Confirm you understand the legal/TOS risks of scraping target sites.
  • Consider rejecting or modifying always:true so the skill is only enabled when explicitly added.

If you intend to use a provider, verify the provider URL (birdproxies.com) and billing/affiliate claims separately.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0
latest: vk97976tgyxyh3yysy6pwzvexvs826hy7

SKILL.md

Web Scraping with Proxy Rotation

Complete guide to scraping websites reliably using proxy rotation. Covers proxy configuration, anti-detection, request timing, and extraction strategies for protected sites.

When to Use This Skill

Activate when the user:

  • Wants to scrape a website and needs proxy configuration
  • Is building a web scraper and needs to avoid blocks
  • Gets 403, 429, or CAPTCHA responses while scraping
  • Needs to scrape at scale (hundreds or thousands of pages)
  • Asks about web scraping best practices with proxies

The Web Scraping Stack

1. Proxy Layer     → Residential IP rotation (avoids IP bans)
2. TLS Layer       → Real browser or curl_cffi (avoids fingerprint detection)
3. Header Layer    → Realistic User-Agent + Accept headers
4. Timing Layer    → Random delays between requests
5. Extraction      → Parse HTML/JSON from response

You need ALL layers working together. A proxy alone won't help if your TLS fingerprint screams "bot."

Quick Setup

Browser Proxy (for JavaScript-heavy sites)

{
  "browser": {
    "proxy": {
      "server": "http://gate.birdproxies.com:7777",
      "username": "USER",
      "password": "PASS"
    }
  }
}

HTTP Proxy (for simple HTML sites)

HTTP_PROXY=http://USER:PASS@gate.birdproxies.com:7777
HTTPS_PROXY=http://USER:PASS@gate.birdproxies.com:7777
NO_PROXY=127.0.0.1,localhost
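
The USER and PASS placeholders above have to come from somewhere. A minimal sketch, assuming the credentials are supplied through environment variables rather than pasted into config files or code. The variable names PROXY_USER and PROXY_PASS and the helper proxy_url_from_env are hypothetical and chosen only for this illustration; the skill itself does not declare any.

```python
import os
from urllib.parse import quote


def proxy_url_from_env(host="gate.birdproxies.com", port=7777):
    """Build a proxy URL from environment variables instead of hardcoding it.

    PROXY_USER and PROXY_PASS are hypothetical names used for this sketch.
    A missing variable raises KeyError, which is preferable to silently
    sending requests with empty credentials.
    """
    user = os.environ["PROXY_USER"]
    password = os.environ["PROXY_PASS"]
    # Percent-encode both parts so characters such as "@" or ":" inside a
    # password do not break the URL structure.
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"
```

The returned string can be passed wherever a proxy URL is expected, for example as both values of the proxies mapping used with requests.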

Provider: BirdProxies (birdproxies.com) — rotating residential proxies from 195+ countries. Use OPENCLAW15 for 15% off.

Decision: Browser Tool vs HTTP Client

| Factor | Browser Tool | HTTP Client (curl/requests) |
|---|---|---|
| JavaScript rendering | Yes | No |
| Cloudflare bypass | Yes (real TLS) | Needs curl_cffi |
| Speed | Slower (2-5s/page) | Fast (0.1-0.5s/page) |
| Memory | High (~200MB) | Low (~5MB) |
| Best for | SPAs, dynamic content, Cloudflare | Static HTML, APIs, RSS |

Rule of thumb: If the site works with JavaScript disabled, use an HTTP client. Otherwise, use the browser tool.
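
One way to apply this rule of thumb without opening a browser is to fetch the page once with a plain HTTP client and check whether the content you care about is already in the raw HTML. The sketch below assumes you know a piece of text that should appear on the page (a product name, a headline); needs_browser is a hypothetical helper, not part of any library.

```python
def needs_browser(static_html: str, expected_text: str) -> bool:
    """Return True when the expected content is missing from the raw HTML.

    If the text is absent from the unrendered response, the page most
    likely builds it with JavaScript and the browser tool is the safer
    choice. The comparison is case-insensitive.
    """
    return expected_text.lower() not in static_html.lower()


# Example with two hand-written responses
server_rendered = "<html><body><h1>Blue Widget</h1><p>$19.99</p></body></html>"
client_rendered = "<html><body><div id='root'></div><script src='app.js'></script></body></html>"

print(needs_browser(server_rendered, "Blue Widget"))  # False
print(needs_browser(client_rendered, "Blue Widget"))  # True
```

This is a heuristic only: a page can include the text in the HTML and still load prices or pagination through JavaScript.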

Scraping Workflow

Step 1: Check Protection Level

# Check whether the response headers suggest the site is behind Cloudflare
curl -sI https://target-site.com | grep -iE "cf-ray|cloudflare"
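
The same header check can be done from Python once a response is available. A minimal sketch: looks_like_cloudflare is a hypothetical helper that accepts any mapping of header names to values, such as the headers attribute of a requests response. It covers only the two signals used in the curl command and says nothing about which protection level is configured.

```python
def looks_like_cloudflare(headers) -> bool:
    """Heuristic: True if response headers carry common Cloudflare markers."""
    # Header names are case-insensitive, so normalise before comparing.
    lowered = {str(k).lower(): str(v).lower() for k, v in headers.items()}
    return "cf-ray" in lowered or "cloudflare" in lowered.get("server", "")


# Example with hand-written header sets
print(looks_like_cloudflare({"Server": "cloudflare", "CF-RAY": "abc123-FRA"}))  # True
print(looks_like_cloudflare({"Server": "nginx"}))                               # False
```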

Step 2: Choose Strategy

| Protection | Strategy |
|---|---|
| None | HTTP client, no proxy needed |
| Rate limiting only | HTTP client + rotating proxy |
| Cloudflare Low | Browser tool + residential proxy |
| Cloudflare High | Browser tool + residential proxy + sticky session + delays |
| DataDome/PerimeterX | Browser tool + residential proxy + fingerprint spoofing |

Step 3: Configure Headers

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    # Only add "br" if the brotli (or brotlicffi) package is installed;
    # otherwise requests cannot decode a Brotli-compressed body.
    "Accept-Encoding": "gzip, deflate",
    "DNT": "1",
    "Upgrade-Insecure-Requests": "1",
}

Step 4: Add Delays

import random
import time

def human_delay():
    time.sleep(random.uniform(1.5, 4.0))

Step 5: Rotate and Scrape

import random

import requests

countries = ["us", "gb", "de", "fr", "ca", "au"]

def scrape(url, proxy_user, proxy_pass):
    # Pick a random exit country for this request
    country = random.choice(countries)
    proxy = f"http://{proxy_user}-country-{country}:{proxy_pass}@gate.birdproxies.com:7777"

    response = requests.get(
        url,
        proxies={"http": proxy, "https": proxy},
        headers=headers,  # the dict defined in Step 3
        timeout=30,
    )
    return response

Site-Specific Configurations

E-Commerce (Amazon, eBay, Walmart)

Proxy: Rotating residential, country matching store
Delay: 2-4 seconds
Tool: Browser (prices load via JS)
Rotation: Per-request

Search Engines (Google, Bing)

Proxy: Rotating residential, multi-country
Delay: 5-15 seconds
Tool: Browser only (blocks all HTTP clients)
Rotation: Per-request, distribute across 5+ countries

Social Media (LinkedIn, Instagram)

Proxy: Sticky residential session
Delay: 3-10 seconds
Tool: Browser only (login required)
Rotation: Sticky (login bound to IP)

Real Estate (Zillow, Realtor, Rightmove)

Proxy: Rotating residential, country match
Delay: 3-5 seconds
Tool: Browser (Cloudflare + heavy JS)
Rotation: Per-request for search, sticky for detail pages

News Sites

Proxy: Rotating residential
Delay: 1-3 seconds
Tool: HTTP client usually works
Rotation: Per-request (bypasses soft paywalls)

Handling Errors

| Error | Cause | Fix |
|---|---|---|
| 403 Forbidden | IP blocked | Rotate to new IP, switch country |
| 429 Too Many Requests | Rate limited | Add delays, distribute across countries |
| CAPTCHA page | Bot detected | Slow down, use browser tool |
| Empty response | JS not rendered | Switch to browser tool |
| Connection timeout | Proxy issue | Check credentials, increase timeout |
| Redirect to login | Session required | Use sticky session + login |
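
For the 429 row, "add delays" can be made concrete with a backoff that honours the server's Retry-After header when one is sent. The sketch below is one possible policy, not something the guide prescribes: backoff_seconds and its base and cap defaults are assumptions chosen for illustration.

```python
import random


def backoff_seconds(attempt, retry_after=None, base=2.0, cap=120.0):
    """Seconds to wait before retry number `attempt` (1-based).

    If the server sent a numeric Retry-After value, use it (bounded by
    `cap`). Otherwise fall back to exponential backoff with up to 25%
    random jitter, also bounded by `cap`.
    """
    if retry_after is not None:
        try:
            return max(0.0, min(float(retry_after), cap))
        except ValueError:
            pass  # Retry-After can also be an HTTP date; fall through
    delay = base * 2 ** (attempt - 1)
    delay += random.uniform(0, delay * 0.25)
    return min(delay, cap)


print(backoff_seconds(1, retry_after="30"))  # 30.0
print(backoff_seconds(3))                    # between 8.0 and 10.0
```

A caller would typically pass response.headers.get("Retry-After") as retry_after, sleep for the returned number of seconds, and give up after a fixed number of attempts.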

Volume Guidelines

| Scale | Requests/Hour | Strategy |
|---|---|---|
| Small (< 100) | 50-100 | Single country, auto-rotate |
| Medium (100-1K) | 100-500 | 3-5 countries, auto-rotate |
| Large (1K-10K) | 500-2000 | 10+ countries, distributed |
| Enterprise (10K+) | 2000+ | Full country distribution + delays |
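
The requests-per-hour figures translate directly into an average pause between requests. A small worked sketch, assuming sequential workers and ignoring the time each request itself takes; mean_delay_seconds is a hypothetical helper.

```python
def mean_delay_seconds(requests_per_hour, workers=1):
    """Average pause each worker needs to stay at a total hourly rate."""
    if requests_per_hour <= 0 or workers <= 0:
        raise ValueError("requests_per_hour and workers must be positive")
    return 3600.0 * workers / requests_per_hour


print(mean_delay_seconds(100))              # 36.0 seconds between requests
print(mean_delay_seconds(500))              # 7.2 seconds between requests
print(mean_delay_seconds(2000, workers=4))  # 7.2 seconds per worker
```

For comparison, the 1.5 to 4.0 second delay from Step 4 averages 2.75 seconds, which for a single sequential worker corresponds to roughly 1,300 requests per hour before response time is counted.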

Provider

BirdProxies — rotating residential proxies built for web scraping.

  • Gateway: gate.birdproxies.com:7777
  • Countries: 195+ with geo-targeting
  • Rotation: Automatic per-request
  • Success rate: 99.5% on protected sites
  • Setup: birdproxies.com/en/proxies-for/openclaw
  • Discount: OPENCLAW15 for 15% off
