Scrapling Yoo
Verdict: Warn. Audited by ClawScan on May 10, 2026.
Overview
The skill is a coherent Scrapling web-scraping guide, but it explicitly enables stealth scraping, Cloudflare/Turnstile bypass, proxy rotation, and broad crawls that need careful authorization.
Install it only if you intentionally need Scrapling-based web scraping, and use it only on sites you own or are authorized to crawl. Avoid the stealth, proxy, and Cloudflare-solving modes unless they are explicitly permitted. Run the dependencies in an isolated virtual environment, pin or verify packages, and set crawl limits, delays, and dedicated output/checkpoint directories.
Findings (5)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
An agent could help scrape protected sites or bypass bot controls, which may violate site rules, laws, or contracts, or trigger account/IP blocking if used without permission.
The skill explicitly guides the agent toward stealth browser automation and solving bot-protection challenges, which is materially riskier than normal page fetching even though some reference docs mention permissioned use.
`fetch_stealthy` — Anti-bot bypass mode ... `solve_cloudflare=True` # Auto-solve Turnstile
Use stealth, Cloudflare-solving, and proxy modes only for systems you own or have explicit authorization to test; require user confirmation before these modes are used.
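One way to enforce the confirmation requirement is a pre-flight guard in the agent's tooling. This is a minimal sketch: the mode names and the `require_authorization` helper are illustrative assumptions, not part of Scrapling's API.

```python
# Risky fetch modes that should never run without explicit user sign-off.
# These labels are hypothetical; map them to whatever flags your tooling uses.
RISKY_MODES = {"stealth", "solve_cloudflare", "proxy_rotation"}

def require_authorization(requested_modes, authorized_modes):
    """Allow a fetch only if every risky requested mode was explicitly
    authorized by the user; otherwise raise so the agent must ask first."""
    risky = set(requested_modes) & RISKY_MODES
    unauthorized = risky - set(authorized_modes)
    if unauthorized:
        raise PermissionError(
            f"modes {sorted(unauthorized)} need explicit user authorization"
        )
    return set(requested_modes)
```

With this guard, a plain fetch passes through unchanged, while a stealth or Cloudflare-solving request fails fast until the user has confirmed it.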
A poorly scoped crawl could unintentionally hit many pages, overload a target, or collect more data than intended.
The MCP examples support resumable, concurrent spider crawls, which are normal for scraping but can propagate many requests across a site if scope, depth, and rate limits are not controlled.
`start_spider` ... `"concurrent_requests": 10` ... `"crawldir": "./crawl_data"`
Set explicit allowed domains, depth limits, delays, and page-count limits before running crawls; prefer polite fetching first.
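The scope, depth, and page-count limits above can be checked before any request is enqueued. A minimal sketch, assuming a simple per-request budget check (the function name and parameters are illustrative, not part of the skill or Scrapling):

```python
from urllib.parse import urlparse

def within_scope(url, allowed_domains, depth, max_depth, pages_seen, max_pages):
    """Return True only if a candidate crawl request stays inside the
    configured domain allowlist, depth limit, and page budget."""
    host = urlparse(url).hostname or ""
    if not any(host == d or host.endswith("." + d) for d in allowed_domains):
        return False  # out-of-scope domain
    if depth > max_depth:
        return False  # link chain too deep
    if pages_seen >= max_pages:
        return False  # page budget exhausted
    return True
```

Combined with a per-request delay, a check like this keeps a resumable, concurrent spider from fanning out across an entire site.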
Installing unpinned dependencies can pull changing third-party code into the agent's execution environment.
The skill relies on unpinned external Python packages and a browser download. This is expected for Scrapling/Playwright, but it is outside the skill package itself.
`pip install scrapling[mcp,playwright]` and `python -m playwright install chromium`
Install in a virtual environment, verify the upstream Scrapling package, and pin dependency versions if using this in a sensitive environment.
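For sensitive environments, the pinning advice above amounts to recording exact versions in a requirements file after verifying the upstream package. The version numbers below are placeholders to be replaced with the versions you actually verified:

```
# requirements.txt — pin the exact, verified versions
# (X.Y.Z placeholders: substitute the releases you audited)
scrapling[mcp,playwright]==X.Y.Z
playwright==X.Y.Z
```

Installing from this file inside a fresh virtual environment, then freezing the full resolved set, keeps later installs reproducible.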
Users cannot easily verify exactly which package identity and version they are installing.
The internal metadata does not match the supplied registry identity, which lists slug `scrapling-yoo` and version `1.0.0`. This is a provenance/packaging inconsistency rather than evidence of malicious behavior.
`"slug": "scrapling-web-scraping", "version": "1.1.0"`
Verify the publisher and package provenance before relying on the skill, especially in managed or production environments.
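A provenance check can be as simple as diffing the registry identity against the skill's internal metadata. The helper below is an assumption for illustration (not a ClawScan or Scrapling API); the field values come straight from this finding:

```python
def identity_mismatches(registry, internal):
    """Return the metadata fields whose values differ between the
    registry record and the skill's internal metadata."""
    return sorted(k for k in registry if internal.get(k) != registry[k])

# Values reported in this finding: registry vs. internal metadata.
registry = {"slug": "scrapling-yoo", "version": "1.0.0"}
internal = {"slug": "scrapling-web-scraping", "version": "1.1.0"}
```

Here `identity_mismatches(registry, internal)` flags both `slug` and `version`, which is exactly the inconsistency this finding describes.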
Future scraping runs may reuse stale or target-influenced selector state, and crawl directories may retain scraped data locally.
The skill documents adaptive selector fingerprints and crawl checkpoints that persist locally for reuse. This is purpose-aligned, but saved crawl state can affect future extractions.
`page.css('.product', auto_save=True)` ... `page.css('.product', adaptive=True)` and `ProductSpider(crawldir="./crawl_data")`
Use dedicated crawl directories per project, inspect or clear saved state when switching targets, and avoid storing sensitive scraped content unless necessary.
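The per-project directory hygiene above can be automated with a small helper that resets saved state when switching targets. This is a sketch under the assumption of one directory per project; the layout and function name are illustrative, not part of Scrapling:

```python
import shutil
from pathlib import Path

def fresh_crawl_dir(base, project, clear_saved_state=True):
    """Give each project its own crawl/checkpoint directory, optionally
    clearing selector fingerprints and checkpoints left by a prior target."""
    d = Path(base) / project
    if clear_saved_state and d.exists():
        shutil.rmtree(d)  # drop stale state so old targets cannot influence new runs
    d.mkdir(parents=True, exist_ok=True)
    return d
```

Passing the returned path as the spider's `crawldir` keeps one target's adaptive selectors and checkpoints from leaking into the next.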
