FlowCrawl — Stealth Web Scraper That Bypasses Everything
v1.1.0
Stealth web scraper. Give it any URL and it punches through Cloudflare, bot detection, and WAFs automatically using a 3-tier cascade (plain HTTP → TLS spoof...
Security Scan
OpenClaw
Suspicious (medium confidence)
Purpose & Capability
The name/description (stealth scraper that 'punches through Cloudflare/WAFs') align with the included code and SKILL.md: the CLI uses a three-tier escalation (plain HTTP → stealth/TLS spoof → full JS via Playwright). No unrelated credentials or config are requested. The claim 'No CDP Chrome' is potentially misleading because Playwright and stealth tooling are used—functionally this is a browser-automation based bypass stack, which matches the stated purpose but the marketing is aggressive and possibly inaccurate.
Instruction Scope
SKILL.md instructs the user to pip install scrapling (which will pull Playwright and stealth plugins) and to add an alias to the user's shell rc (~/.zshrc). The runtime instructions and code explicitly escalate to evasion techniques (TLS fingerprint spoofing, stealth plugins, full JS execution) to bypass protections — behavior that intentionally evades server-side defenses and could violate terms of service or laws. The skill does not attempt to read unrelated local files, nor does it exfiltrate data to external endpoints, but it does modify user shell config via the recommended alias and triggers external downloads when installed or run.
Install Mechanism
There is no registry install spec, but SKILL.md requires 'pip install scrapling'. Scrapling will install Playwright and (on first run) download browser binaries — a network-driven install that writes binaries to disk. The lack of a formal install spec in the registry plus the implicit heavy runtime dependency (Playwright/browser downloads) is a practical installation risk and should be made explicit to users. The pip/Playwright download is from public registries, not an unknown URL, but can be large and perform additional network activity.
Credentials
The skill requests no environment variables, no credentials, and no special config paths. That is proportionate to a local scraper tool. There are no declared requirements for unrelated secrets or remote service keys.
Persistence & Privilege
The skill is user-invocable and not 'always: true' (no elevated persistent privilege). However SKILL.md recommends adding an alias to ~/.zshrc which writes to the user's shell config — a mild, user-visible persistence action. Playwright will also place browser artifacts on disk. The skill does not modify other skills or system-wide OpenClaw settings.
What to consider before installing
This skill is coherent with its stated aim of bypassing bot protections, but that purpose is inherently risky and may violate site terms or laws. Before installing:
1. Decide whether evading WAFs/Cloudflare is appropriate and legal for your use case; don't use it on sites you don't own or without permission.
2. Review the scrapling project's source and trustworthiness (pip package + GitHub repo), because installing it will pull in Playwright and download browser binaries.
3. Be aware the README suggests modifying ~/.zshrc (it adds an alias); only do this if you want that persistent change.
4. Run in an isolated environment (VM/container) if you want to reduce the risk of surprising downloads or side effects.
5. If you plan to use this in production or in an automated agent, consider legal/ethical review, plus logging and rate limits, to avoid abusive scraping.
If you want a lower-risk option, prefer tools that respect robots.txt and avoid active fingerprint spoofing.
Like a lobster shell, security has layers — review code before you run it.
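On the lower-risk path above: a scraper can consult a site's robots.txt before fetching. A minimal sketch using Python's standard library; the rules below are a hypothetical example parsed locally (in practice you would fetch the site's real robots.txt first):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, parsed locally to avoid a network fetch.
rules = """
User-agent: *
Disallow: /private/
""".strip().splitlines()

parser = RobotFileParser()
parser.parse(rules)

def may_fetch(url: str, agent: str = "flowcrawl") -> bool:
    """Return True only if the parsed robots.txt rules permit this URL."""
    return parser.can_fetch(agent, url)

print(may_fetch("https://example.com/docs/page"))     # allowed path
print(may_fetch("https://example.com/private/data"))  # disallowed path
```

`RobotFileParser.parse` accepts the file as a list of lines, which keeps the sketch fully offline and testable.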
Runtime requirements: 🦞 Clawdis (latest)
FlowCrawl
Scrape any website. Bypass any bot protection. Free.
Install Scrapling First
pip install scrapling
Scrapling installs Playwright automatically on first run. That's the only dependency.
Quick Usage
# Single URL — prints clean markdown to stdout
python3 ~/clawd/skills/flowcrawl/scripts/flowcrawl.py https://example.com
# Spider the whole site
python3 ~/clawd/skills/flowcrawl/scripts/flowcrawl.py https://example.com --deep
# Deep crawl with limits, save and combine
python3 ~/clawd/skills/flowcrawl/scripts/flowcrawl.py https://example.com --deep --limit 30 --combine
# JSON output — pipe into anything
python3 ~/clawd/skills/flowcrawl/scripts/flowcrawl.py https://example.com --json
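The --json flag makes the output easy to post-process. A sketch of consuming it, assuming (hypothetically) that each crawled page is emitted as an object with `url` and `markdown` fields; check the tool's actual schema before relying on these names:

```python
import json

# Hypothetical sample of what `flowcrawl <url> --json` might emit;
# the real field names may differ.
raw = """
[
  {"url": "https://example.com/", "markdown": "# Home"},
  {"url": "https://example.com/about", "markdown": "# About"}
]
"""

pages = json.loads(raw)
urls = [page["url"] for page in pages]  # collect every crawled URL
print(urls)
```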
Add Alias (Recommended)
echo 'alias flowcrawl="python3 ~/clawd/skills/flowcrawl/scripts/flowcrawl.py"' >> ~/.zshrc
source ~/.zshrc
Then just: flowcrawl https://example.com
How It Works
FlowCrawl uses a 3-tier fetcher cascade. Starts fast, escalates only when blocked:
| Tier | Method | Handles |
|---|---|---|
| 1 | Plain HTTP | Most sites, instant |
| 2 | Stealth + TLS spoof | Cloudflare, Imperva, basic WAFs |
| 3 | Full JS execution | SPAs, heavy JS, aggressive bot detection |
Auto-detects blocking (403, 503, "Just a moment...") and escalates silently.
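The escalation logic described above can be sketched as follows. The tier functions and `looks_blocked` are illustrative stand-ins, not FlowCrawl's actual internals:

```python
BLOCK_MARKERS = ("just a moment", "access denied", "attention required")

def looks_blocked(status: int, body: str) -> bool:
    """Heuristic block detection: WAF status codes or challenge-page text."""
    return status in (403, 503) or any(m in body.lower() for m in BLOCK_MARKERS)

def fetch_with_cascade(url: str, tiers) -> str:
    """Try each fetcher tier in order, escalating only when blocked."""
    for fetch in tiers:
        status, body = fetch(url)
        if not looks_blocked(status, body):
            return body
    raise RuntimeError(f"all tiers blocked for {url}")

# Stubs standing in for plain HTTP, stealth/TLS spoof, and full JS tiers.
plain   = lambda url: (403, "Just a moment...")        # Cloudflare challenge
stealth = lambda url: (200, "<html>real content</html>")
full_js = lambda url: (200, "<html>rendered</html>")

print(fetch_with_cascade("https://example.com", [plain, stealth, full_js]))
# escalates past tier 1, returns tier 2's body
```

Because the check runs after every tier, the cascade pays the cost of stealth or full JS rendering only on pages that actually block the cheaper fetchers.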
All Options
| Flag | Description | Default |
|---|---|---|
| --deep | Spider whole site following internal links | off |
| --depth N | Max hop depth from start URL | 3 |
| --limit N | Max pages to crawl | 50 |
| --format md\|txt | Output format | md |
| --combine | Merge all pages into one file | off |
| --output DIR | Output directory | ./flowcrawl-output |
| --json | Structured JSON output | off |
| --quiet | Suppress progress logs | off |
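Together, --deep, --depth, and --limit describe a bounded breadth-first crawl. A minimal sketch of that traversal over an in-memory link graph (the graph and function are illustrative, not FlowCrawl's code):

```python
from collections import deque

# Hypothetical internal-link graph: page -> links found on it.
LINKS = {
    "/":       ["/docs", "/about"],
    "/docs":   ["/docs/a", "/docs/b"],
    "/about":  [],
    "/docs/a": ["/docs/b"],
    "/docs/b": [],
}

def crawl(start: str, depth: int = 3, limit: int = 50) -> list[str]:
    """Breadth-first crawl: stop beyond `depth` hops or after `limit` pages."""
    seen, order = {start}, []
    queue = deque([(start, 0)])
    while queue and len(order) < limit:
        page, hops = queue.popleft()
        order.append(page)
        if hops == depth:
            continue  # don't follow links past the max hop depth
        for link in LINKS.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, hops + 1))
    return order

print(crawl("/", depth=1))           # ['/', '/docs', '/about']
print(crawl("/", depth=3, limit=2))  # ['/', '/docs']
```

Breadth-first order matters here: with a page limit, it visits everything near the start URL before anything deep in the site.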
