Scrapling Web Scraping

Review

Audited by ClawScan on May 1, 2026.

Overview

This is a coherent web-scraping skill, but it includes disclosed anti-bot, proxy, MCP, and persistence features that should only be used on authorized targets.

Install dependencies in an isolated environment, verify the Scrapling/Playwright packages, configure the MCP server only if you trust it, and use stealth/proxy/large-crawl features only with authorization. Review and delete crawl outputs, checkpoints, and session data when they may contain sensitive information.

Findings (4)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

If misused, the agent could help access or stress websites in ways that violate site rules or legal restrictions.

Why it was flagged

The skill explicitly teaches anti-bot and Cloudflare-handling workflows. This is disclosed and aligned with the scraping purpose, but it can be misused against sites where the user lacks permission.

Skill content

`fetch_stealthy` — Anti-bot bypass mode ... solve_cloudflare=True,  # Auto-solve Turnstile

Recommendation

Use stealth, proxy, and large-crawl features only for sites you own, are authorized to test, or are explicitly permitted to scrape; respect rate limits and terms of service.
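
One way to act on this recommendation is to check a site's robots.txt and throttle requests per host before any fetch is made. The sketch below uses only the Python standard library; `is_allowed` and `HostRateLimiter` are hypothetical helper names, not part of Scrapling, and the one-second default interval is an assumption rather than a documented limit.

```python
import time
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class HostRateLimiter:
    """Enforce a minimum delay between requests to the same host (hypothetical helper)."""

    def __init__(self, min_interval: float = 1.0) -> None:
        self.min_interval = min_interval
        self._last_request: dict[str, float] = {}

    def wait(self, url: str) -> float:
        """Sleep if the host was contacted too recently; return the seconds slept."""
        host = urlparse(url).netloc
        last = self._last_request.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (time.monotonic() - last))
        if delay > 0:
            time.sleep(delay)
        self._last_request[host] = time.monotonic()
        return delay
```

A robots.txt check is not a substitute for authorization or for reading the site's terms of service; it only covers the machine-readable part of the site's rules.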

What this means

Installing unpinned packages can pull in newer or compromised dependency versions if the package source is not trusted.

Why it was flagged

The setup asks the user to install external dependencies and browser components without pinned versions. This is expected for Scrapling/Playwright use, but users should verify what they install.

Skill content

pip install scrapling[mcp,playwright]
python -m playwright install chromium

Recommendation

Install in a virtual environment, verify the Scrapling package source, and pin versions for repeatable or production use.
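
Pinning can be checked mechanically. The following is a minimal sketch, assuming dependencies are listed in a pip-style requirements file; `unpinned_requirements` is a hypothetical helper that flags any entry lacking an exact `==` version. It deliberately ignores environment markers, hashes, and URL requirements, so treat it as a first-pass check only.

```python
import re

# Matches "name==version" or "name[extra1,extra2]==version".
# Any other specifier (>=, ~=, none at all) is treated as unpinned.
PINNED = re.compile(
    r"[A-Za-z0-9._-]+(\[[A-Za-z0-9._,\s-]+\])?\s*==\s*[A-Za-z0-9._!+-]+"
)


def unpinned_requirements(requirements_text: str) -> list[str]:
    """Return requirement lines that do not pin an exact version."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.fullmatch(line):
            unpinned.append(line)
    return unpinned
```

Run against a file containing the line `scrapling[mcp,playwright]` from the install command above, the helper would report that entry as unpinned.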

What this means

Local crawl directories or session state may contain data from scraped pages or cookies that should not be kept indefinitely.

Why it was flagged

The skill documents persistent session/cookie state and crawl checkpoint storage. This is useful for scraping but may retain scraped content, selectors, or session data between runs.

Skill content

Use sessions for cookie/state across requests ... **Pause/Resume**: `crawldir` parameter saves checkpoints

Recommendation

Choose crawl directories intentionally, avoid storing sensitive authenticated content unless necessary, and clean up checkpoints, outputs, and session data when finished.
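
Cleanup is easier to guarantee when the crawl directory's lifetime is tied to a code block. The sketch below assumes checkpoints are written under a single directory whose path is supplied by the caller (for example, as the `crawldir` value mentioned above); `scoped_crawl_dir` is a hypothetical helper built on the standard library, not a Scrapling API.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def scoped_crawl_dir(prefix: str = "crawl-", keep: bool = False) -> Iterator[Path]:
    """Create a private crawl directory and delete it on exit unless keep=True."""
    path = Path(tempfile.mkdtemp(prefix=prefix))  # created with owner-only permissions
    try:
        yield path
    finally:
        if not keep:
            # Remove checkpoints, outputs, and any session files left behind.
            shutil.rmtree(path, ignore_errors=True)
```

Passing `keep=True` preserves the directory for a later resume, at the cost of leaving scraped content and session state on disk until it is removed by hand.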

What this means

The agent may send target URLs, fetched HTML, and extraction inputs to the configured MCP server.

Why it was flagged

The skill routes execution through a local Scrapling MCP server. This is central to the design, but target URLs and HTML passed through MCP should be treated as data shared with that tool.

Skill content

"mcpServers": { "scrapling": { "command": "python", "args": ["-m", "scrapling.mcp"] } }

Recommendation

Only configure MCP servers you trust, and avoid sending private or authenticated page content through the tool unless that is intended.
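
A configuration file can be audited before the agent starts. The sketch below assumes the MCP configuration is a JSON document with a top-level `mcpServers` object, as in the skill content quoted above; `untrusted_mcp_servers` and the allowlist format are hypothetical, chosen for illustration.

```python
import json


def untrusted_mcp_servers(
    config_text: str, trusted: dict[str, tuple[str, ...]]
) -> list[str]:
    """Return names of configured MCP servers whose launch command is not allowlisted."""
    config = json.loads(config_text)
    flagged = []
    for name, spec in config.get("mcpServers", {}).items():
        # Compare the full launch line (command plus args) against the allowlist.
        launch = (spec.get("command", ""), *spec.get("args", []))
        if trusted.get(name) != launch:
            flagged.append(name)
    return flagged
```

Matching on the full command line, rather than on the server name alone, means a renamed or altered entry is flagged instead of silently trusted.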