Scrapling Web Scraping
Review
Audited by ClawScan on May 1, 2026.
Overview
This is a coherent web-scraping skill, but it includes disclosed anti-bot, proxy, MCP, and persistence features that should only be used on authorized targets.
Install dependencies in an isolated environment, verify the Scrapling/Playwright packages, configure the MCP server only if you trust it, and use stealth/proxy/large-crawl features only with authorization. Review and delete crawl outputs, checkpoints, and session data when they may contain sensitive information.
Findings (4)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
If misused, the agent could help access or stress websites in ways that violate site rules or legal restrictions.
The skill explicitly teaches anti-bot and Cloudflare-handling workflows. This is disclosed and aligned with the scraping purpose, but it can be misused against sites where the user lacks permission.
`fetch_stealthy` — Anti-bot bypass mode ... solve_cloudflare=True, # Auto-solve Turnstile
Use stealth, proxy, and large-crawl features only for sites you own, are authorized to test, or are explicitly permitted to scrape; respect rate limits and terms of service.
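One way to act on the "respect rate limits" part of this recommendation is to consult a site's robots.txt before fetching and to pause between requests. The following is a minimal sketch using only the Python standard library, not Scrapling's own API. The robots.txt content, the `example-bot` user agent, and the `fetch` callable are hypothetical placeholders.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice it would come from the target site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_text):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def allowed_urls(parser, user_agent, urls):
    """Keep only the URLs that robots.txt permits for this user agent."""
    return [u for u in urls if parser.can_fetch(user_agent, u)]


def crawl_delay(parser, user_agent, default=1.0):
    """Use the site's Crawl-delay if it declares one, else a default pause."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


def polite_crawl(urls, fetch, parser, user_agent, sleep=time.sleep):
    """Fetch permitted URLs one at a time, pausing after each request."""
    delay = crawl_delay(parser, user_agent)
    results = {}
    for url in allowed_urls(parser, user_agent, urls):
        results[url] = fetch(url)  # `fetch` is supplied by the caller
        sleep(delay)
    return results
```

robots.txt is only one signal. It does not replace checking a site's terms of service or obtaining explicit permission.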
Installing unpinned packages can pull in newer or compromised dependency versions if the package source is not trusted.
The setup asks the user to install external dependencies and browser components without pinned versions. This is expected for Scrapling/Playwright use, but users should verify what they install.
pip install scrapling[mcp,playwright]
python -m playwright install chromium
Install in a virtual environment, verify the Scrapling package source, and pin versions for repeatable or production use.
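As a rough illustration of the pinning advice, the sketch below compares installed package versions against a set of expected pins using the standard library's `importlib.metadata`. The `check_pins` helper and the pin values passed to it are hypothetical. The review does not state which versions to pin.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(pins):
    """Return {package: problem} for every package that is missing
    or whose installed version differs from the expected pin."""
    problems = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if installed != expected:
            problems[name] = f"installed {installed}, expected {expected}"
    return problems


# Example call; replace the placeholder with the version you have verified:
# check_pins({"scrapling": "<pinned version>"})
```

An empty result means every listed package matches its pin. Running a check like this inside the virtual environment keeps it from reporting on system-wide packages.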
Local crawl directories or session state may contain data from scraped pages or cookies that should not be kept indefinitely.
The skill documents persistent session/cookie state and crawl checkpoint storage. This is useful for scraping but may retain scraped content, selectors, or session data between runs.
Use sessions for cookie/state across requests ... **Pause/Resume**: `crawldir` parameter saves checkpoints
Choose crawl directories intentionally, avoid storing sensitive authenticated content unless necessary, and clean up checkpoints, outputs, and session data when finished.
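The cleanup step could be sketched as follows, assuming crawl checkpoints and outputs live under a single directory that you chose (for example, the one passed as `crawldir`). The `remove_crawl_artifacts` helper is hypothetical and uses only the standard library. It defaults to a dry run so the files can be reviewed before anything is deleted.

```python
import shutil
from pathlib import Path


def remove_crawl_artifacts(crawl_dir, dry_run=True):
    """List every file under crawl_dir and, unless dry_run is set,
    delete the whole directory tree. Returns the files that were found."""
    crawl_dir = Path(crawl_dir)
    if not crawl_dir.is_dir():
        return []
    found = sorted(p for p in crawl_dir.rglob("*") if p.is_file())
    if not dry_run:
        shutil.rmtree(crawl_dir)  # irreversible: removes checkpoints and outputs
    return found
```

Calling it first with `dry_run=True` and inspecting the returned paths matches the "review and delete" order suggested in the overview.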
The agent may send target URLs, fetched HTML, and extraction inputs to the configured MCP server.
The skill routes execution through a local Scrapling MCP server. This is central to the design, but target URLs and HTML passed through MCP should be treated as data shared with that tool.
"mcpServers": { "scrapling": { "command": "python", "args": ["-m", "scrapling.mcp"] } }
Only configure MCP servers you trust, and avoid sending private or authenticated page content through the tool unless that is intended.

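A simple way to enforce "only configure MCP servers you trust" is to compare each configured server's launch command against an allowlist before using the config. The sketch below assumes the config is a JSON object with an `mcpServers` key, as in the snippet quoted above. The `untrusted_servers` helper, the allowlist contents, and the `other` server in the usage example are hypothetical.

```python
import json

# Allowlist of (command, args) pairs you have reviewed and accept.
TRUSTED_COMMANDS = {
    ("python", ("-m", "scrapling.mcp")),
}


def untrusted_servers(config_text, trusted=TRUSTED_COMMANDS):
    """Return the names of configured MCP servers whose launch command
    is not on the allowlist."""
    config = json.loads(config_text)
    flagged = []
    for name, spec in config.get("mcpServers", {}).items():
        key = (spec.get("command"), tuple(spec.get("args", [])))
        if key not in trusted:
            flagged.append(name)
    return sorted(flagged)


# Usage with a hypothetical config containing one unreviewed server.
EXAMPLE_CONFIG = """
{
  "mcpServers": {
    "scrapling": {"command": "python", "args": ["-m", "scrapling.mcp"]},
    "other": {"command": "node", "args": ["server.js"]}
  }
}
"""
print(untrusted_servers(EXAMPLE_CONFIG))
```

A check like this only confirms that the launch command matches what was reviewed. It does not verify the integrity of the installed package behind that command.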