Scrapling Yoo
Verdict: Warn. Audited by ClawScan on May 10, 2026.
Overview
The skill is a coherent Scrapling web-scraping guide, but it explicitly enables stealth scraping, Cloudflare/Turnstile bypass, proxy rotation, and broad crawls that need careful authorization.
Install it only if you intentionally need Scrapling-based web scraping, and use it only on sites you own or are authorized to crawl. Avoid the stealth, proxy, and Cloudflare-solving modes unless they are explicitly permitted. Run the dependencies in an isolated virtual environment, pin or verify packages, and set crawl limits, delays, and dedicated output/checkpoint directories.
Findings (5)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
An agent could help scrape protected sites or bypass bot controls, which may violate site rules, laws, or contracts, or trigger account/IP blocking if used without permission.
The skill explicitly guides the agent toward stealth browser automation and solving bot-protection challenges, which is materially riskier than normal page fetching even though some reference docs mention permissioned use.
`fetch_stealthy` — Anti-bot bypass mode ... `solve_cloudflare=True` # Auto-solve Turnstile
Use stealth, Cloudflare-solving, and proxy modes only for systems you own or have explicit authorization to test; require user confirmation before these modes are used.
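One way to enforce the confirmation requirement is a pre-flight guard in the agent's tooling. This is a minimal sketch: the mode names and the `require_authorization` helper are illustrative assumptions, not part of Scrapling's API.

```python
# Risky fetch modes that should never run without explicit user sign-off.
# These labels are hypothetical; map them to whatever flags your tooling uses.
RISKY_MODES = {"stealth", "solve_cloudflare", "proxy_rotation"}

def require_authorization(requested_modes, authorized_modes):
    """Allow a fetch only if every risky requested mode was explicitly
    authorized by the user; otherwise raise so the agent must ask first."""
    risky = set(requested_modes) & RISKY_MODES
    unauthorized = risky - set(authorized_modes)
    if unauthorized:
        raise PermissionError(
            f"modes {sorted(unauthorized)} need explicit user authorization"
        )
    return set(requested_modes)
```

With this guard, a plain fetch passes through unchanged, while a stealth or Cloudflare-solving request fails fast until the user has confirmed it.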
A poorly scoped crawl could unintentionally hit many pages, overload a target, or collect more data than intended.
The MCP examples support resumable, concurrent spider crawls, which are normal for scraping but can propagate many requests across a site if scope, depth, and rate limits are not controlled.
`start_spider` ... `"concurrent_requests": 10` ... `"crawldir": "./crawl_data"`
Set explicit allowed domains, depth limits, delays, and page-count limits before running crawls; prefer polite fetching first.
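The scope, depth, and page-count limits above can be checked before any request is enqueued. A minimal sketch, assuming a simple per-request budget check (the function name and parameters are illustrative, not part of the skill or Scrapling):

```python
from urllib.parse import urlparse

def within_scope(url, allowed_domains, depth, max_depth, pages_seen, max_pages):
    """Return True only if a candidate crawl request stays inside the
    configured domain allowlist, depth limit, and page budget."""
    host = urlparse(url).hostname or ""
    if not any(host == d or host.endswith("." + d) for d in allowed_domains):
        return False  # out-of-scope domain
    if depth > max_depth:
        return False  # link chain too deep
    if pages_seen >= max_pages:
        return False  # page budget exhausted
    return True
```

Combined with a per-request delay, a check like this keeps a resumable, concurrent spider from fanning out across an entire site.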
Installing unpinned dependencies can pull changing third-party code into the agent's execution environment.
The skill relies on unpinned external Python packages and a browser download. This is expected for Scrapling/Playwright, but it is outside the skill package itself.
`pip install scrapling[mcp,playwright]` and `python -m playwright install chromium`
Install in a virtual environment, verify the upstream Scrapling package, and pin dependency versions if using this in a sensitive environment.
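For sensitive environments, the pinning advice above amounts to recording exact versions in a requirements file after verifying the upstream package. The version numbers below are placeholders to be replaced with the versions you actually verified:

```
# requirements.txt — pin the exact, verified versions
# (X.Y.Z placeholders: substitute the releases you audited)
scrapling[mcp,playwright]==X.Y.Z
playwright==X.Y.Z
```

Installing from this file inside a fresh virtual environment, then freezing the full resolved set, keeps later installs reproducible.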
Users cannot easily verify exactly which package identity and version they are installing.
The internal metadata does not match the supplied registry identity, which lists slug `scrapling-yoo` and version `1.0.0`. This is a provenance/packaging inconsistency rather than evidence of malicious behavior.
`"slug": "scrapling-web-scraping", "version": "1.1.0"`
Verify the publisher and package provenance before relying on the skill, especially in managed or production environments.
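A provenance check can be as simple as diffing the registry identity against the skill's internal metadata. The helper below is an assumption for illustration (not a ClawScan or Scrapling API); the field values come straight from this finding:

```python
def identity_mismatches(registry, internal):
    """Return the metadata fields whose values differ between the
    registry record and the skill's internal metadata."""
    return sorted(k for k in registry if internal.get(k) != registry[k])

# Values reported in this finding: registry vs. internal metadata.
registry = {"slug": "scrapling-yoo", "version": "1.0.0"}
internal = {"slug": "scrapling-web-scraping", "version": "1.1.0"}
```

Here `identity_mismatches(registry, internal)` flags both `slug` and `version`, which is exactly the inconsistency this finding describes.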
Future scraping runs may reuse stale or target-influenced selector state, and crawl directories may retain scraped data locally.
The skill documents adaptive selector fingerprints and crawl checkpoints that persist locally for reuse. This is purpose-aligned, but saved crawl state can affect future extractions.
`page.css('.product', auto_save=True)` ... `page.css('.product', adaptive=True)` and `ProductSpider(crawldir="./crawl_data")`
Use dedicated crawl directories per project, inspect or clear saved state when switching targets, and avoid storing sensitive scraped content unless necessary.
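The per-project directory hygiene above can be automated with a small helper that resets saved state when switching targets. This is a sketch under the assumption of one directory per project; the layout and function name are illustrative, not part of Scrapling:

```python
import shutil
from pathlib import Path

def fresh_crawl_dir(base, project, clear_saved_state=True):
    """Give each project its own crawl/checkpoint directory, optionally
    clearing selector fingerprints and checkpoints left by a prior target."""
    d = Path(base) / project
    if clear_saved_state and d.exists():
        shutil.rmtree(d)  # drop stale state so old targets cannot influence new runs
    d.mkdir(parents=True, exist_ok=True)
    return d
```

Passing the returned path as the spider's `crawldir` keeps one target's adaptive selectors and checkpoints from leaking into the next.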
