Crawl4AI Web Crawler

Audited by ClawScan on May 12, 2026.

Overview

The skill is coherent for web crawling, but it documents use of browser profiles, cookies, auth-state files, external LLM keys, and anti-detection crawling without clear guardrails.

Review before installing. Use a sandbox or virtual environment, pin trusted dependency versions when possible, set strict crawl limits, and avoid using your main browser profile or personal cookies. For private or authenticated pages, use a temporary account/profile and confirm whether content will be sent to an external LLM provider.

Findings (4)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Concern (Medium Confidence)
ASI03: Identity and Privilege Abuse
What this means

If used carelessly, the crawler could access or extract content from logged-in websites using the user's existing session.

Why it was flagged

The skill documents ways to reuse local browser profiles, cookies, and authentication-state files. That can be purpose-aligned for authenticated crawling, but it is high-impact access and the artifacts do not clearly bound when to use it, which accounts it may access, or what user approval is required.

Skill content
user_data_dir: Optional[str] = None,    # Browser profile path
...
cookies: list = [],                     # Pre-set cookies
...
storage_state: Optional[str] = None,    # Authentication state file path
Recommendation

Only use dedicated, temporary browser profiles or explicitly provided cookies/auth-state files for a specific site and task. Do not point the crawler at a personal main browser profile unless the user explicitly understands the impact.
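One way to follow this is to create a throwaway profile directory for a single crawl and delete it afterward, so no session state from (or to) the user's main browser profile is ever involved. A minimal sketch using only the standard library; the resulting path is what would be passed as the `user_data_dir` parameter shown above:

```python
import shutil
import tempfile
from pathlib import Path

# Create a dedicated, empty browser profile directory for one crawl.
# Nothing from the user's main profile (cookies, sessions) is present.
profile_dir = Path(tempfile.mkdtemp(prefix="crawl-profile-"))

try:
    # ... pass str(profile_dir) as user_data_dir to the crawler here ...
    assert profile_dir.exists() and not any(profile_dir.iterdir())
finally:
    # Remove the profile, and any session state it accumulated, after the run.
    shutil.rmtree(profile_dir, ignore_errors=True)
```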

What this means

The agent could crawl more pages than intended, run browser-side automation, or violate site rules if targets and limits are not set carefully.

Why it was flagged

These are expected crawler capabilities, but they allow broad crawling, page automation, anti-detection behavior, and crawling without robots.txt checks unless configured otherwise.

Skill content
js_code: Optional[list] = None,         # List of JS code snippets to execute
...
magic: bool = False,                    # Automatic anti-detection
...
check_robots_txt: bool = False
...
deep_crawl_strategy: Optional = None
Recommendation

Set explicit target URLs, max depth/pages, rate limits, and robots/terms-of-service expectations before crawling. Avoid anti-detection settings unless the user has a legitimate reason and permission.
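Since `check_robots_txt` defaults to off, robots rules and a crawl budget can be enforced up front, independent of the crawler itself. A sketch using the standard library's `urllib.robotparser` (the limits and rules shown are illustrative):

```python
from urllib.robotparser import RobotFileParser

MAX_PAGES = 50   # hard crawl budget (illustrative)
MAX_DEPTH = 2    # maximum link depth to follow (illustrative)

def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules before adding it to the crawl queue."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)

rules = "User-agent: *\nDisallow: /private/\n"
print(allowed_by_robots(rules, "my-crawler", "https://example.com/public/page"))   # True
print(allowed_by_robots(rules, "my-crawler", "https://example.com/private/page"))  # False
```

In practice the robots.txt text would be fetched from the target site once, and every candidate URL checked against it (and against `MAX_PAGES`/`MAX_DEPTH`) before being crawled.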

What this means

Private or sensitive page content could be sent to a third-party LLM provider if LLM extraction is used on authenticated or confidential pages.

Why it was flagged

The skill supports sending extracted or crawled content to external LLM providers using API keys. This is expected for LLM extraction, but users should understand the data-flow boundary.

Skill content
LLMConfig(
    provider="openai/gpt-4o-mini",  # Also supports ollama/llama3, anthropic/claude-3, etc.
    api_token="your-api-key",
)
Recommendation

Use local models for sensitive pages when possible, or confirm that the selected provider and API key are appropriate for the data being processed.
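For sensitive pages, the provider string shown in the skill content can point at a local model instead, so crawled content never leaves the machine. A hedged sketch based on the `LLMConfig` shape documented above (parameter names and provider strings are taken from the skill content, not verified against a specific Crawl4AI release):

```python
from crawl4ai import LLMConfig  # assumes the import path matches the installed release

# Local extraction: "ollama/llama3" is listed in the skill as a supported provider.
# No external API token is required, so page content stays on this machine.
local_llm = LLMConfig(provider="ollama/llama3")

# External extraction: only for content you are comfortable sending to the provider.
# external_llm = LLMConfig(provider="openai/gpt-4o-mini", api_token="your-api-key")
```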

What this means

Installing the dependency gives third-party package code and browser components access to the local environment.

Why it was flagged

The documented setup installs an unpinned Python package and browser dependencies. This is normal for Crawl4AI usage, but it means trust shifts to the PyPI package and Playwright installation process.

Skill content
pip install -U crawl4ai
crawl4ai-setup          # Automatically installs the Playwright browser
...
python -m playwright install --with-deps chromium
Recommendation

Install only in a trusted, isolated environment, consider pinning package versions, and review Crawl4AI/Playwright provenance before use.
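The documented install steps can be hardened along these lines; a sketch in which the pinned version is a placeholder — substitute the exact release you have reviewed:

```shell
# Isolate the install in a fresh virtual environment.
python -m venv .venv-crawl4ai
. .venv-crawl4ai/bin/activate

# Pin the vetted release instead of installing the latest with -U.
pip install "crawl4ai==<reviewed-version>"

# Install only the browser actually needed, with its system dependencies.
python -m playwright install --with-deps chromium
```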