Scrapling - Stealth Web Scraper
Web scraping using Scrapling — a Python framework with anti-bot bypass (Cloudflare Turnstile, fingerprint spoofing), adaptive element tracking, stealth headl...
MIT-0 · Free to use, modify, and redistribute. No attribution required.
⭐ 0 · 635 · 6 current installs · 6 all-time installs
by Damir Armanov (@Damirikys)
Security Scan
OpenClaw
Benign (high confidence)
Purpose & Capability
Name/description (stealth web scraping, Cloudflare bypass, JS rendering) align with the included script and SKILL.md. The skill's instructions to install scrapling, a stealth Playwright fork (patchright), and Chromium are proportionate to the described functionality.
Instruction Scope
Instructions are explicit about installing packages, downloading Chromium, and optionally starting an MCP local HTTP server. The skill also documents 'auto_save' which persists element fingerprints to disk. These behaviors are relevant to the stated purpose but do increase local persistence and expose a local network endpoint if MCP is started — the SKILL.md warns to only start MCP when explicitly needed.
Install Mechanism
There is no registry install spec, but SKILL.md directs the user to run 'pip install scrapling[all]' and 'patchright install chromium'. Installing from PyPI and running a package-provided installer that downloads a browser binary is expected for this capability, but users should be aware that PyPI packages execute arbitrary install-time code and that 'patchright install chromium' fetches ~100 MB of browser binaries.
Credentials
The skill declares no required env vars, credentials, or config paths. The behaviour (session/cookie handling, optional local MCP server, disk persistence for fingerprints) is consistent with no additional secret access being requested.
Persistence & Privilege
The skill does not request always:true or elevated platform privileges. However, optional features (MCP local HTTP server and auto_save fingerprints) create persistent local state and expose a local endpoint if used; the SKILL.md explicitly warns to start these only when trusted.
Assessment
This skill appears internally consistent for a stealth web scraper, but it carries the normal risks of such tools. Before installing:
1. Confirm you trust the scrapling and patchright PyPI packages and their maintainers (review their GitHub/PyPI pages and recent activity).
2. Only run stealth/dynamic modes on sites you are authorized to scrape; bypassing anti-bot protections can violate terms or laws.
3. Be cautious with the 'patchright install chromium' step (downloads binaries) and with enabling the MCP server (it opens a local HTTP service).
4. Run installs in an isolated environment (virtualenv or container) and inspect the installed package contents if you need higher assurance.
If you want, provide the upstream GitHub/PyPI links and I can check them for suspicious patterns or supply commands to verify package integrity before installing.
Like a lobster shell, security has layers: review code before you run it.
Current version: v1.0.3
Download zip (latest)
License
MIT-0
Free to use, modify, and redistribute. No attribution required.
SKILL.md
Scrapling Skill
Source: https://github.com/D4Vinci/Scrapling (open source, MIT-like license)
PyPI: scrapling — install before first use (see below)
⚠️ Only scrape sites you have permission to access. Respect robots.txt and Terms of Service. Do not use stealth modes to bypass paywalls or access restricted content without authorization.
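As a concrete way to honor the robots.txt warning above, a path can be checked against a site's rules before fetching. This sketch uses only the Python standard library; the rules shown are an illustrative example, not fetched from a real site.

```python
# Check a URL path against robots.txt rules before scraping.
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content (in real use, fetch it from the site).
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("*", "https://example.com/public/page"))   # True
print(parser.can_fetch("*", "https://example.com/private/data"))  # False
```

In real use you would call `parser.set_url("https://example.com/robots.txt")` followed by `parser.read()` instead of parsing a literal string.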
Installation (one-time, confirm with user before running)
pip install scrapling[all]
patchright install chromium # required for stealth/dynamic modes
- scrapling[all] installs patchright (a stealth fork of Playwright, bundled as a PyPI package, not a typo), curl_cffi, MCP server deps, and an IPython shell.
- patchright install chromium downloads Chromium (~100 MB) via patchright's own installer (same mechanism as playwright install chromium).
- Confirm with the user before running: installs ~200 MB of dependencies and browser binaries.
Script
scripts/scrape.py — CLI wrapper for all three fetcher modes.
# Basic fetch (text output)
python3 ~/skills/scrapling/scripts/scrape.py <url> -q
# CSS selector extraction
python3 ~/skills/scrapling/scripts/scrape.py <url> --selector ".class" -q
# Stealth mode (Cloudflare bypass) — only on sites you're authorized to access
python3 ~/skills/scrapling/scripts/scrape.py <url> --mode stealth -q
# JSON output
python3 ~/skills/scrapling/scripts/scrape.py <url> --selector "h2" --json -q
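For readers who want to see how the flags above fit together, here is a hypothetical sketch of the argument interface the wrapper appears to expose, inferred only from the examples; the real scripts/scrape.py may differ.

```python
# Hypothetical sketch of the CLI wrapper's argument interface
# (inferred from the usage examples; not the actual scripts/scrape.py).
import argparse

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Scrapling CLI wrapper (sketch)")
    p.add_argument("url", help="page to fetch")
    p.add_argument("--mode", choices=["http", "stealth", "dynamic"],
                   default="http", help="fetcher mode")
    p.add_argument("--selector", help="CSS selector to extract")
    p.add_argument("--json", action="store_true", help="emit JSON instead of text")
    p.add_argument("-q", "--quiet", action="store_true", help="suppress logging")
    return p

args = build_parser().parse_args(["https://example.com", "--mode", "stealth", "-q"])
print(args.mode, args.quiet)  # stealth True
```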
Fetcher Modes
- http (default) — Fast HTTP with browser TLS fingerprint spoofing. Most sites.
- stealth — Headless Chrome with anti-detect. For Cloudflare/anti-bot.
- dynamic — Full Playwright browser. For heavy JS SPAs.
When to Use Each Mode
- web_fetch returns 403/429/Cloudflare challenge → use --mode stealth
- Page content requires JS execution → use --mode dynamic
- Regular site, just need text/data → use --mode http (default)
Python Inline Usage
For custom logic beyond the CLI, write inline Python. See references/patterns.md for:
- Adaptive scraping (auto_save/adaptive, saves element fingerprints locally)
- Session/cookie handling
- Async usage
- XPath, find_similar, attribute extraction
Notes
- MCP server (scrapling mcp): starts a local network service for AI-native scraping. Only start if explicitly needed and trusted; it exposes a local HTTP server.
- auto_save=True: persists element fingerprints to disk for adaptive re-scraping. Creates local state in the working directory.
- Stealth/dynamic modes use headless Chromium; no xvfb-run needed.
- For large-scale crawls, use the Spider API (see Scrapling docs).
Files
3 total
