Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

OpenClaw Ultra Scraping

Powerful web scraping, crawling, and data extraction with stealth anti-bot bypass (Cloudflare Turnstile, CAPTCHAs). Use when: (1) scraping websites that bloc...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
0 · 408 · 3 current installs · 3 all-time installs
by Leo Ye (@LeoYeAI)
Security Scan
VirusTotal
Suspicious
OpenClaw
Suspicious
medium confidence
Purpose & Capability
Name/description (anti-bot scraping/Cloudflare bypass) aligns with the included code: scrape.py and setup.sh implement fetchers, stealth mode, dynamic rendering, crawling, and proxy rotation. However, the SKILL.md declares a root-requiring install into /opt/scrapling-venv (apt-get + pip), which is not reflected in the registry-level 'No install spec' claim; users should be aware of this mismatch. The advanced anti-bot claims (CAPTCHA solving) may require external solver services, but no credentials are requested or documented.
Instruction Scope
Runtime instructions tell the user to run scripts/setup.sh (which runs apt-get, pip install, and 'scrapling install') and then use the bundled CLI. The instructions do not direct the agent to read unrelated host files or exfiltrate data. They do, however, direct the download and installation of system libraries and browser binaries from the network, a broader scope than a simple 'instruction-only' skill would suggest.
Install Mechanism
The included setup.sh performs apt-get and pip install (scrapling[all]) and runs 'scrapling install' to fetch browsers. These are standard package sources (apt, PyPI), but they download and execute code/binaries at install time. The install requires root and writes to /opt. There are no explicit third-party URLs in the script, but pip and 'scrapling install' may pull many dependencies and browser binaries from external hosts; this increases risk, so run the setup in an isolated environment after verifying package provenance.
Credentials
The skill declares no required environment variables or credentials, which is consistent with the files included. However, practical use of anti-CAPTCHA/anti-bot features often needs external solver APIs or paid proxy services (API keys, tokens) — none are declared or explained. That gap is operationally important and may lead users to supply credentials ad hoc.
Persistence & Privilege
The skill does not request 'always: true' and is user-invocable, which is normal. But setup.sh requires root (apt-get, venv creation in /opt) and installs system-level libraries and binaries. This elevated privilege and system-wide installation increases blast radius; the SKILL.md itself recommends using an isolated container/VM.
What to consider before installing
This package appears to implement what it claims (a heavy scraping tool with anti-bot features), but it performs system-level installs (apt-get, pip) and places a virtualenv under /opt, which requires root. Before installing:

1. Run the setup in an isolated VM or container.
2. Inspect the pip package 'scrapling' (and its dependencies) and confirm sources (PyPI project, maintainer); pip can install arbitrary code.
3. Be aware that 'scrapling install' will download browser binaries from the network.
4. Consider the legal/ToS implications of bypassing anti-bot protections and solving CAPTCHAs.
5. Expect that you may need to supply third-party solver or proxy credentials (not declared by the skill).
6. If you cannot review the upstream package or do not want root installs, do not install this skill on a shared host.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.2
Download zip
latest · vk97fd1fmqssvp0pgpkw0jtmaex82baqy

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

Runtime requirements

Bins: python3

SKILL.md

OpenClaw Ultra Scraping

Powered by MyClaw.ai — the AI personal assistant platform that gives every user a full server with complete code control. Part of the MyClaw.ai open skills ecosystem.

Handles everything from single-page extraction to full-scale concurrent crawls with anti-bot bypass.

Setup

Run once before first use:

bash scripts/setup.sh

This installs Scrapling + all browser dependencies into /opt/scrapling-venv.

Quick Start — CLI Script

The bundled scripts/scrape.py provides a unified CLI:

PYTHON=/opt/scrapling-venv/bin/python3

# Simple fetch (JSON output)
$PYTHON scripts/scrape.py fetch "https://example.com" --css ".content"

# Extract text
$PYTHON scripts/scrape.py extract "https://example.com" --css "h1"

# Stealth mode (bypass Cloudflare)
$PYTHON scripts/scrape.py fetch "https://protected-site.com" --stealth --solve-cloudflare --css ".data"

# Dynamic (full browser rendering)
$PYTHON scripts/scrape.py fetch "https://spa-site.com" --dynamic --css ".product"

# Extract links
$PYTHON scripts/scrape.py links "https://example.com" --filter "\.pdf$"

# Multi-page crawl
$PYTHON scripts/scrape.py crawl "https://example.com" --depth 2 --concurrency 10 --css ".item" -o results.json

# Output formats: json, jsonl, csv, text, markdown, html
$PYTHON scripts/scrape.py fetch "https://example.com" -f markdown -o page.md
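The `--depth` flag bounds how far the crawl expands from the start URL. Conceptually this is a breadth-first traversal of the link graph; a minimal stdlib sketch of that idea (the `LINKS` map and `crawl` function are illustrative stand-ins, not part of scrape.py, with an in-memory link map replacing real HTTP fetches):

```python
from collections import deque

# Hypothetical in-memory link graph standing in for fetched pages.
LINKS = {
    "https://example.com": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/c"],
    "https://example.com/c": ["https://example.com/d"],
}

def crawl(start, depth):
    """Breadth-first crawl: visit every URL reachable within `depth` hops."""
    seen = {start}
    queue = deque([(start, 0)])
    order = []
    while queue:
        url, d = queue.popleft()
        order.append(url)
        if d == depth:
            continue  # depth limit reached; do not expand this page's links
        for nxt in LINKS.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return order

# depth 2 reaches /a and /b (one hop) and /c (two hops), but never /d
print(crawl("https://example.com", 2))
```

In the real CLI, `--concurrency` additionally controls how many of these fetches run in parallel at each level.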

Quick Start — Python

For complex tasks, write Python directly using the venv:

#!/opt/scrapling-venv/bin/python3
from scrapling.fetchers import Fetcher, StealthyFetcher

# Simple HTTP
page = Fetcher.get('https://example.com', impersonate='chrome')
titles = page.css('h1::text').getall()

# Bypass Cloudflare
page = StealthyFetcher.fetch('https://protected.com', headless=True, solve_cloudflare=True)
data = page.css('.product').getall()

Fetcher Selection Guide

| Scenario | Fetcher | Flag |
|---|---|---|
| Normal sites, fast scraping | Fetcher | (default) |
| JS-rendered SPAs | DynamicFetcher | --dynamic |
| Cloudflare/anti-bot protected | StealthyFetcher | --stealth |
| Cloudflare Turnstile challenge | StealthyFetcher | --stealth --solve-cloudflare |

Selector Cheat Sheet

page.css('.class')                    # CSS
page.css('.class::text').getall()     # Text extraction
page.xpath('//div[@id="main"]')      # XPath
page.find_all('div', class_='item')  # BS4-style
page.find_by_text('keyword')         # Text search
page.css('.item', adaptive=True)     # Adaptive (survives redesigns)
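The `::text` pseudo-selector returns the text inside the matched elements rather than the elements themselves. A stdlib-only illustration of that behavior (using `html.parser`; `H1TextCollector` is a conceptual stand-in for what `page.css('h1::text').getall()` returns, not Scrapling's implementation):

```python
from html.parser import HTMLParser

class H1TextCollector(HTMLParser):
    """Collect the text inside <h1> tags, skipping everything else."""
    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1 and data.strip():
            self.texts.append(data.strip())

parser = H1TextCollector()
parser.feed("<html><h1>Pricing</h1><p>ignored</p><h1>Docs</h1></html>")
print(parser.texts)  # ['Pricing', 'Docs']
```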

Advanced Features

  • Adaptive tracking: auto_save=True on first run, adaptive=True later — elements are found even after site redesign
  • Proxy rotation: Pass proxy="http://host:port" or use ProxyRotator
  • Sessions: FetcherSession, StealthySession, DynamicSession for cookie/state persistence
  • Spider framework: Scrapy-like concurrent crawling with pause/resume
  • Async support: All fetchers have async variants
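Proxy rotation simply cycles outbound requests through a pool so that no single IP carries all the traffic. As a concept sketch (pure stdlib; `RoundRobinProxies` is an illustrative stand-in, not Scrapling's actual `ProxyRotator` class, and the proxy URLs are placeholders):

```python
from itertools import cycle

class RoundRobinProxies:
    """Hand out proxies from a fixed pool in round-robin order."""
    def __init__(self, proxies):
        self._pool = cycle(proxies)

    def next(self):
        return next(self._pool)

rotator = RoundRobinProxies([
    "http://proxy1:8080",
    "http://proxy2:8080",
    "http://proxy3:8080",
])

# Each fetch would receive a different proxy, e.g. proxy=rotator.next()
assigned = [rotator.next() for _ in range(4)]
print(assigned)  # wraps back to proxy1 on the fourth request
```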

For full API details: read references/api-reference.md

Files

4 total
