Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

openclaw-ultra-scraping

v2.0.0

Powerful web scraping, crawling, and data extraction with stealth anti-bot bypass (Cloudflare Turnstile, CAPTCHAs). Use when: (1) scraping websites that bloc...

by Leo Ye (@leoyeai)
MIT-0
Security Scan
VirusTotal
Benign
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The name/description (stealth scraping, Cloudflare bypass, screenshots, crawls) matches the included code and API docs. Requiring python3 and installing browser/system libraries is reasonable for full-browser scraping. However, the skill requires a root setup (apt-get, system libs, /opt installation) that is a high-impact action for a 'skill' and is not reflected in the declared required binaries (no mention of apt or root). The manifest metadata also shows a version mismatch and no homepage/source which reduces trust.
Instruction Scope
SKILL.md instructs the agent to run the provided setup script and then call the bundled CLI or use the venv. The runtime instructions focus on scraping tasks only and do not direct the agent to read unrelated local files or credentials. However, the setup step runs system package installs and a pip install, both of which fetch code and browser binaries from the network; SKILL.md does not explain which remote endpoints are contacted during 'scrapling install' or which external services (e.g., CAPTCHA solvers) may be used when --solve-cloudflare is requested.
Install Mechanism
Installation is a shell script that requires root (apt-get, python venv, pip install 'scrapling[all]', then 'scrapling install' to fetch browsers). This is a moderate-to-high risk install pattern: it fetches packages from PyPI and likely downloads browser binaries from external sources at install/runtime. The apt package names include odd suffixes (e.g., packages ending in 't64'), which may be typos or indicate the script was not tested across systems. No verifiable release host or checksum verification is present.
Credentials
The skill does not request environment variables, secrets, or external credentials in the manifest. The code similarly does not reference secret env vars. This aligns with the stated purpose: scraping typically doesn't require the host's credentials. (That said, some anti-CAPTCHA flows may require third-party solver API keys at runtime — none are declared.)
Persistence & Privilege
The setup script writes to /opt/scrapling-venv and runs apt-get/pip as root. While 'always' is false and the skill is not forced into every agent run, the installer requires elevated privileges and makes system-wide changes; that is a significant privilege increase for a user-installed skill and should be performed only in a controlled environment (VM/container).
What to consider before installing
This skill appears to implement the scraping features it advertises, but it requires running a root setup script that installs system packages, pip packages (scrapling[all]), and downloads browsers — actions that change the host and fetch code from the network. The source and homepage are missing and the package provenance is unknown. Before installing: (1) inspect the upstream 'scrapling' project (PyPI/GitHub) to verify authorship and what 'scrapling install' downloads, (2) run the setup in an isolated VM or container (do not run as root on a production host), (3) be aware that bypassing CAPTCHAs/Turnstile may involve third-party solver services or legal issues in some jurisdictions, and (4) confirm the odd apt package names and test the install on a disposable environment. If you need lower risk, prefer a skill that uses preapproved packages or that runs entirely without requiring root or network installs.
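The scan notes that no checksum verification is present in the installer. If the upstream project publishes artifact digests, downloaded files can be verified before installing. A minimal sketch using only the standard library (the expected digest would come from the upstream release page; nothing here is a real Scrapling release hash):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 hex digest of a file, streaming in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: str, expected_digest: str) -> bool:
    """Return True only if the file's digest matches the published one."""
    return sha256_of(path) == expected_digest.lower()
```

Running this against a downloaded archive before executing any setup script is a cheap way to catch tampering, provided the published digest comes from a trusted channel.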


latest: vk97cn2s9gmttkew9vdagbpt7vx83sky9

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

Runtime requirements

Bins: python3

SKILL.md

OpenClaw Ultra Scraping

Powered by MyClaw.ai — the AI personal assistant platform that gives every user a full server with complete code control. Part of the MyClaw open skills ecosystem.

Handles everything from single-page extraction to full-scale concurrent crawls with anti-bot bypass.

Setup

Run once before first use:

bash scripts/setup.sh

This installs Scrapling + all browser dependencies into /opt/scrapling-venv.

Quick Start — CLI Script

The bundled scripts/scrape.py provides a unified CLI:

PYTHON=/opt/scrapling-venv/bin/python3

# Simple fetch (JSON output)
$PYTHON scripts/scrape.py fetch "https://example.com" --css ".content"

# Extract text
$PYTHON scripts/scrape.py extract "https://example.com" --css "h1"

# Stealth mode (bypass Cloudflare)
$PYTHON scripts/scrape.py fetch "https://protected-site.com" --stealth --solve-cloudflare --css ".data"

# Dynamic (full browser rendering)
$PYTHON scripts/scrape.py fetch "https://spa-site.com" --dynamic --css ".product"

# Extract links
$PYTHON scripts/scrape.py links "https://example.com" --filter "\.pdf$"

# Multi-page crawl
$PYTHON scripts/scrape.py crawl "https://example.com" --depth 2 --concurrency 10 --css ".item" -o results.json

# Output formats: json, jsonl, csv, text, markdown, html
$PYTHON scripts/scrape.py fetch "https://example.com" -f markdown -o page.md
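Since the CLI emits JSON, a calling program can shell out and parse the result. A minimal sketch, assuming the JSON output is an array of extracted items (`build_command`, `parse_output`, and `run` are illustrative helpers, not part of the bundled script):

```python
import json
import subprocess

PYTHON = "/opt/scrapling-venv/bin/python3"

def build_command(url: str, css: str, stealth: bool = False) -> list:
    """Assemble the scrape.py fetch invocation shown above."""
    cmd = [PYTHON, "scripts/scrape.py", "fetch", url, "--css", css]
    if stealth:
        cmd += ["--stealth", "--solve-cloudflare"]
    return cmd

def parse_output(raw: str) -> list:
    """Decode the JSON the CLI prints; assumes a JSON array of items."""
    return json.loads(raw)

def run(url: str, css: str, stealth: bool = False) -> list:
    out = subprocess.run(build_command(url, css, stealth),
                         capture_output=True, text=True, check=True)
    return parse_output(out.stdout)
```

Using `check=True` surfaces a non-zero exit (e.g. a failed Cloudflare solve) as an exception instead of silently parsing empty output.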

Quick Start — Python

For complex tasks, write Python directly using the venv:

#!/opt/scrapling-venv/bin/python3
from scrapling.fetchers import Fetcher, StealthyFetcher

# Simple HTTP
page = Fetcher.get('https://example.com', impersonate='chrome')
titles = page.css('h1::text').getall()

# Bypass Cloudflare
page = StealthyFetcher.fetch('https://protected.com', headless=True, solve_cloudflare=True)
data = page.css('.product').getall()

Fetcher Selection Guide

Scenario                         Fetcher            Flag
Normal sites, fast scraping      Fetcher            (default)
JS-rendered SPAs                 DynamicFetcher     --dynamic
Cloudflare/anti-bot protected    StealthyFetcher    --stealth
Cloudflare Turnstile challenge   StealthyFetcher    --stealth --solve-cloudflare
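The selection table can be encoded as a small lookup for programs choosing flags programmatically. The scenario keys below are my own labels; only the flags come from the table:

```python
# Map scraping scenario -> extra CLI flags for scripts/scrape.py
FETCHER_FLAGS = {
    "static": [],                                      # Fetcher (default)
    "spa": ["--dynamic"],                              # DynamicFetcher
    "anti-bot": ["--stealth"],                         # StealthyFetcher
    "turnstile": ["--stealth", "--solve-cloudflare"],  # StealthyFetcher + solver
}

def flags_for(scenario: str) -> list:
    """Return the CLI flags for a scenario, defaulting to the plain fetcher."""
    return FETCHER_FLAGS.get(scenario, [])
```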

Selector Cheat Sheet

page.css('.class')                    # CSS
page.css('.class::text').getall()     # Text extraction
page.xpath('//div[@id="main"]')      # XPath
page.find_all('div', class_='item')  # BS4-style
page.find_by_text('keyword')         # Text search
page.css('.item', adaptive=True)     # Adaptive (survives redesigns)
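For comparison, the simplest of these extractions can be sketched without Scrapling at all, using only the standard library. A minimal h1-text extractor with html.parser, illustrative only (no XPath, BS4-style, or adaptive support):

```python
from html.parser import HTMLParser

class HeadingExtractor(HTMLParser):
    """Collect the text content of every <h1> element."""
    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1:
            self.headings[-1] += data

def h1_texts(html: str) -> list:
    """Equivalent in spirit to page.css('h1::text').getall()."""
    parser = HeadingExtractor()
    parser.feed(html)
    return [h.strip() for h in parser.headings]
```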

Advanced Features

  • Adaptive tracking: auto_save=True on first run, adaptive=True later — elements are found even after site redesign
  • Proxy rotation: Pass proxy="http://host:port" or use ProxyRotator
  • Sessions: FetcherSession, StealthySession, DynamicSession for cookie/state persistence
  • Spider framework: Scrapy-like concurrent crawling with pause/resume
  • Async support: All fetchers have async variants
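The proxy-rotation idea can be sketched as a simple round-robin cycle. This is an illustration built on itertools, not the actual ProxyRotator API, whose interface I have not verified:

```python
from itertools import cycle

class RoundRobinProxies:
    """Hand out proxies in order, wrapping around when exhausted."""
    def __init__(self, proxies):
        if not proxies:
            raise ValueError("need at least one proxy")
        self._cycle = cycle(proxies)

    def next(self) -> str:
        return next(self._cycle)

# Each fetch would then pass proxy=rotator.next(),
# matching the proxy="http://host:port" parameter described above.
```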

For full API details: read references/api-reference.md

Files

5 total
