Scrapling Yoo

Pass. Audited by VirusTotal on May 11, 2026.

Overview

Type: OpenClaw Skill
Name: scrapling-yoo
Version: 1.0.0

This skill bundle provides advanced web scraping capabilities via the Scrapling library: fetching arbitrary URLs, browser automation, anti-bot bypass (e.g., Cloudflare Turnstile solving), and proxy rotation, as demonstrated in `scripts/scrapling_scrape.py` and `scripts/scrapling_smoke_test.py`. While `SKILL.md` and `references/anti-bot.md` contain explicit guardrails and warnings against unauthorized use, the inherent power of these features presents a significant risk of misuse for unauthorized data collection or denial-of-service activity. There is no evidence of intentional malicious behavior in the code itself (e.g., data exfiltration to unauthorized endpoints, persistence mechanisms), but the high-risk capabilities warrant a "suspicious" classification.

Findings (5)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

An agent could help scrape protected sites or bypass bot controls, which may violate site rules, laws, contracts, or trigger account/IP blocking if used without permission.

Why it was flagged

The skill explicitly guides the agent toward stealth browser automation and solving bot-protection challenges, which is materially riskier than normal page fetching even though some reference docs mention permissioned use.

Skill content
`fetch_stealthy` — Anti-bot bypass mode ... `solve_cloudflare=True`  # Auto-solve Turnstile
Recommendation

Use stealth, Cloudflare-solving, and proxy modes only for systems you own or have explicit authorization to test; require user confirmation before these modes are used.
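The confirmation requirement above can be enforced in code. The sketch below is a hypothetical wrapper, not part of the skill: the mode names mirror the documented options (`stealth`, `solve_cloudflare`, proxy rotation), but the gating function and its signature are illustrative.

```python
# Hypothetical confirmation gate for high-risk fetch modes. The mode names
# mirror the skill's documented options; the wrapper itself is illustrative.

HIGH_RISK_MODES = {"stealth", "solve_cloudflare", "proxy_rotation"}

def gated_fetch(url, fetcher, *, modes=frozenset(), confirm=input):
    """Run `fetcher(url)` only after the user confirms any high-risk modes."""
    risky = HIGH_RISK_MODES & set(modes)
    if risky:
        answer = confirm(
            f"Enable {sorted(risky)} against {url}? Only proceed with "
            "explicit authorization [y/N]: "
        )
        if answer.strip().lower() != "y":
            raise PermissionError(f"High-risk modes declined: {sorted(risky)}")
    return fetcher(url)
```

An agent host could pass its own `confirm` callback so plain fetches run freely while stealth or Cloudflare-solving fetches always pause for user approval.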

Note (High Confidence)
ASI08: Cascading Failures
What this means

A poorly scoped crawl could unintentionally hit many pages, overload a target, or collect more data than intended.

Why it was flagged

The MCP examples support resumable, concurrent spider crawls, which are normal for scraping but can propagate many requests across a site if scope, depth, and rate limits are not controlled.

Skill content
`start_spider` ... `"concurrent_requests": 10` ... `"crawldir": "./crawl_data"`
Recommendation

Set explicit allowed domains, depth limits, delays, and page-count limits before running crawls; prefer polite fetching first.
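A pre-flight check makes these limits concrete. This is a minimal sketch, assuming a plain-dict spider config: `concurrent_requests` and `crawldir` follow the snippet above, while `allowed_domains`, `max_depth`, `max_pages`, and `download_delay` are assumed key names, not documented skill parameters.

```python
# Illustrative pre-flight validation for a spider config dict.
# Key names other than "concurrent_requests"/"crawldir" are assumptions.

def validate_crawl_config(config):
    """Reject crawl configs missing explicit scope or polite-rate settings."""
    problems = []
    if not config.get("allowed_domains"):
        problems.append("allowed_domains must be set explicitly")
    for key in ("max_depth", "max_pages", "download_delay", "concurrent_requests"):
        if key not in config:
            problems.append(f"{key} must be set")
    if config.get("download_delay", 1) <= 0:
        problems.append("download_delay must be positive")
    if config.get("concurrent_requests", 1) > 10:
        problems.append("concurrent_requests above 10 is impolite")
    if problems:
        raise ValueError("; ".join(problems))
    return config
```

Running this before `start_spider` turns "scope, depth, and rate limits" from advice into a hard requirement.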

What this means

Installing unpinned dependencies can pull changing third-party code into the agent's execution environment.

Why it was flagged

The skill relies on unpinned external Python packages and a browser download. This is expected for Scrapling/Playwright, but that code is fetched from outside the skill package itself.

Skill content
`pip install scrapling[mcp,playwright]` and `python -m playwright install chromium`
Recommendation

Install in a virtual environment, verify the upstream Scrapling package, and pin dependency versions if using this in a sensitive environment.
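Pins can also be verified at runtime. The sketch below uses the standard-library `importlib.metadata`; the pinned version string is a placeholder, not a recommendation of any particular Scrapling release.

```python
from importlib import metadata

def check_pin(package, pinned_version, installed_version=None):
    """Fail fast if the installed package version differs from the pin.

    If installed_version is None, it is read from the package's metadata
    (raises metadata.PackageNotFoundError when the package is absent).
    """
    if installed_version is None:
        installed_version = metadata.version(package)
    if installed_version != pinned_version:
        raise RuntimeError(
            f"{package}: installed {installed_version!r} != pinned {pinned_version!r}"
        )
    return installed_version
```

A sensitive environment could call `check_pin("scrapling", pinned)` at skill startup, mirroring whatever version is frozen in `requirements.txt`.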

What this means

Users may have less certainty about exactly which package identity/version they are installing.

Why it was flagged

The internal metadata does not match the supplied registry identity, which lists slug `scrapling-yoo` and version `1.0.0`. This is a provenance/packaging inconsistency rather than evidence of malicious behavior.

Skill content
`"slug": "scrapling-web-scraping", "version": "1.1.0"`
Recommendation

Verify the publisher and package provenance before relying on the skill, especially in managed or production environments.
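The slug/version mismatch in this finding can be caught mechanically. A minimal sketch, assuming the registry entry is a dict and the skill ships its metadata as JSON (the function name and shape are illustrative):

```python
import json

def check_provenance(registry_entry, skill_metadata_json):
    """Return {field: (registry_value, internal_value)} for any identity
    fields where the registry and the skill's own metadata disagree."""
    internal = json.loads(skill_metadata_json)
    return {
        field: (registry_entry.get(field), internal.get(field))
        for field in ("slug", "version")
        if registry_entry.get(field) != internal.get(field)
    }  # empty dict means the identities agree
```

Applied to this skill, the registry identity (`scrapling-yoo`, `1.0.0`) and the internal metadata (`scrapling-web-scraping`, `1.1.0`) would both be reported as mismatches.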

What this means

Future scraping runs may reuse stale or target-influenced selector state, and crawl directories may retain scraped data locally.

Why it was flagged

The skill documents adaptive selector fingerprints and crawl checkpoints that persist locally for reuse. This is purpose-aligned, but saved crawl state can affect future extractions.

Skill content
`page.css('.product', auto_save=True)` ... `page.css('.product', adaptive=True)` and `ProductSpider(crawldir="./crawl_data")`
Recommendation

Use dedicated crawl directories per project, inspect or clear saved state when switching targets, and avoid storing sensitive scraped content unless necessary.
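Directory hygiene like this is easy to script. A small sketch, assuming a per-project layout under a base directory (the helper name and `fresh` flag are illustrative, not part of the skill):

```python
import shutil
from pathlib import Path

def crawl_dir_for(project, base="./crawl_state", fresh=False):
    """Return a dedicated crawl directory per project, optionally clearing
    saved checkpoints and adaptive-selector state before reuse."""
    path = Path(base) / project
    if fresh and path.exists():
        shutil.rmtree(path)  # drop stale state when switching targets
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Passing the returned path as `crawldir` keeps each target's checkpoints and selector fingerprints isolated, and `fresh=True` guarantees a clean start.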