Firecrawl Web Scraper

Web scraping, crawling, and search via Firecrawl API. Converts web pages to clean markdown/JSON optimized for AI consumption. Use when you need to extract co...

MIT-0 · Free to use, modify, and redistribute. No attribution required.

by Patrick (@Moochmaniac)

Security Scan

VirusTotal: Benign

OpenClaw: Suspicious (medium confidence)

Purpose & Capability

The name and description align with the code: the Python script talks to api.firecrawl.dev for scrape, crawl, and search operations. However, the registry metadata declares no required credentials or config paths, while both SKILL.md and the script clearly require a Firecrawl API key and reference specific secret file locations (workspace/secrets/firecrawl_api_key, secrets/firecrawl_api_key, and a ~/.openclaw/... path). The missing declaration of that credential and those config paths is inconsistent with the skill's stated purpose.

Instruction Scope

SKILL.md tells the agent to run scripts/scrape.py for scraping/crawling/searching, which is appropriate. But the runtime instructions and script explicitly look up secret files in 'workspace/secrets/...' and in the user's home directory for an API key. The script will send scraped URLs and related payloads to https://api.firecrawl.dev — expected — but it also reads local filesystem paths for secrets that were not declared in the registry, which broadens the skill's runtime access beyond what the manifest claims.

Install Mechanism

This is instruction-only with an included Python script and no install spec or external downloads. There is no package install or archive extraction. The script depends on the 'requests' module but no installer is invoked by the skill itself.

Credentials

The code requires a Firecrawl API key (it checks FIRECRAWL_API_KEY env var and several secret file paths), yet the registry metadata lists no required environment variables or primary credential. Requesting access to workspace secret file paths and a home-directory path without declaring them is disproportionate to what the manifest states and could unintentionally expose workspace secrets if those files exist.

Persistence & Privilege

The skill is not 'always: true' and does not attempt to modify other skills or system-wide settings. It only reads specified file paths and environment variables at runtime. Autonomous invocation is enabled (default) — expected for skills — but that alone is not flagged.

What to consider before installing

This skill wraps the Firecrawl web-scraping API and will send URLs and scrape requests to https://api.firecrawl.dev, which is expected behaviour. Before installing:

(1) Verify that you trust the publisher and the Firecrawl endpoint.
(2) Note that the skill looks for an API key in FIRECRAWL_API_KEY or in files like workspace/secrets/firecrawl_api_key and a ~/.openclaw/... path, although the registry does not declare these secrets. Prefer providing the API key via an explicitly scoped secret, and avoid placing other sensitive secrets in those file paths.
(3) Inspect the script, which is included in the package, to confirm that it sends only scrape targets and not arbitrary local files.
(4) Test with non-sensitive targets and a rotated or test API key to confirm the behavior and the credits used.
(5) Ask the publisher to update the manifest to declare the required env var and config paths so that the access is transparent.

If you can't confirm the publisher or the manifest, treat the skill cautiously and avoid putting production or privileged secrets in the referenced secret files.
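For step (3), a quick first pass before reading the script line by line is to list the string literals and file or environment calls it contains. The sketch below is only an aid under stated assumptions: the function name `audit_source` and the keyword list are hypothetical, it needs Python 3.9+ for `ast.unparse`, and it will miss reads done through other APIs (for example `Path.read_text`). A clean report does not prove the script is safe.

```python
import ast

# Hypothetical keyword list; tune it to whatever you consider sensitive.
SUSPECT_WORDS = ("secret", "api_key", "token", ".ssh", "password")

# Call names that read files or environment variables (not exhaustive).
WATCHED_CALLS = ("open", "os.getenv", "os.environ.get")


def audit_source(source: str) -> dict:
    """Collect string literals and call sites that deserve a manual look."""
    tree = ast.parse(source)
    findings = {"strings": [], "calls": []}
    for node in ast.walk(tree):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            lowered = node.value.lower()
            if any(word in lowered for word in SUSPECT_WORDS):
                findings["strings"].append((node.lineno, node.value))
        elif isinstance(node, ast.Call):
            name = ast.unparse(node.func)
            if name in WATCHED_CALLS:
                findings["calls"].append((node.lineno, name))
    return findings
```

To use it, read scripts/scrape.py into a string, pass it to `audit_source`, and review each reported line number by hand.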

Current version: v1.0.0

latestvk97b569gj929hx1x03xfkpt15s82fdpg

License terms: https://spdx.org/licenses/MIT-0.html

SKILL.md

Firecrawl Scraper

Professional web scraping powered by the Firecrawl API. Converts websites to clean, AI-ready markdown or structured JSON.

When to Use

Extract content from web pages for analysis
Scrape documentation sites or knowledge bases
Crawl entire websites systematically
Search the web and get scraped content
Parse JavaScript-heavy or dynamic sites
Convert HTML to clean markdown for LLM processing
Competitive research or content aggregation

Quick Start

Scrape a single page:

python3 scripts/scrape.py https://example.com

Crawl a website:

python3 scripts/scrape.py --crawl https://docs.example.com --depth 2 --limit 10

Search and scrape:

python3 scripts/scrape.py --search "AI agent frameworks" --limit 5

Check crawl status:

python3 scripts/scrape.py --crawl-status abc123

Commands

Scrape (Single Page)

Extract content from a single URL:

python3 scripts/scrape.py <url> [options]

Options:

--formats markdown,html,screenshot — Output formats (default: markdown)
--full — Include full page (no main content extraction)
--json — Output raw JSON response

Examples:

# Basic scrape
python3 scripts/scrape.py https://docs.example.com

# Get HTML and markdown
python3 scripts/scrape.py https://site.com --formats markdown,html

# Full page (no content filtering)
python3 scripts/scrape.py https://site.com --full

# JSON output
python3 scripts/scrape.py https://site.com --json
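The options above presumably end up in the body of the request sent to Firecrawl. This listing does not show the script's actual request, so the following is a minimal sketch under assumptions: the function name `build_scrape_payload` is hypothetical, and the field names `formats` and `onlyMainContent` are assumed from Firecrawl's public scrape API and may not match what scripts/scrape.py sends.

```python
def build_scrape_payload(url: str, formats: str = "markdown", full: bool = False) -> dict:
    """Translate the CLI options into a JSON-serializable request body."""
    # "--formats markdown,html" arrives as one comma-separated string.
    format_list = [item.strip() for item in formats.split(",") if item.strip()]
    if not format_list:
        raise ValueError("at least one output format is required")
    return {
        "url": url,
        "formats": format_list,
        # Assumed field name: --full switches main-content extraction off.
        "onlyMainContent": not full,
    }


payload = build_scrape_payload("https://site.com", formats="markdown,html")
```

Keeping the payload construction in a pure function like this makes it possible to check the flag handling without spending credits on a live request.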

Crawl (Entire Website)

Systematically crawl and scrape multiple pages:

python3 scripts/scrape.py --crawl <url> [options]

Options:

--depth N — Maximum crawl depth (default: 2)
--limit N — Maximum pages to crawl (default: 10)
--json — Output raw JSON response

Examples:

# Basic crawl
python3 scripts/scrape.py --crawl https://docs.site.com

# Deep crawl with limit
python3 scripts/scrape.py --crawl https://blog.com --depth 3 --limit 50

# Shallow crawl
python3 scripts/scrape.py --crawl https://site.com --depth 1 --limit 5

Note: Crawl returns a job ID. Use --crawl-status to check progress and retrieve results.

Search (Web Search + Scrape)

Search the web and get scraped content from results:

python3 scripts/scrape.py --search <query> [options]

Options:

--limit N — Number of results (default: 5)
--json — Output raw JSON response

Examples:

# Search and scrape
python3 scripts/scrape.py --search "WordPress security best practices"

# More results
python3 scripts/scrape.py --search "AI agents 2026" --limit 10

# JSON output
python3 scripts/scrape.py --search "casino bonuses" --json

Crawl Status

Check status of a crawl job:

python3 scripts/scrape.py --crawl-status <job-id>

Returns JSON with:

Status: scraping, completed, failed
Progress: Pages scraped
Data: Scraped content (when completed)
Credits used
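Because a crawl returns a job ID first and results later, callers usually end up writing a polling loop around the status check. The sketch below shows one way to do that; `wait_for_crawl` and its parameters are hypothetical, the `status` key is an assumption about the JSON shape, and `fetch_status` stands in for whatever you use to run `--crawl-status` and parse its output.

```python
import time
from typing import Callable

# Status values taken from the list above; "scraping" means still running.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_crawl(
    fetch_status: Callable[[], dict],
    interval: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal state or the poll budget runs out."""
    for attempt in range(max_polls):
        result = fetch_status()
        if result.get("status") in TERMINAL_STATES:
            return result
        # Do not sleep after the final attempt; fail straight away instead.
        if attempt < max_polls - 1:
            sleep(interval)
    raise TimeoutError(f"crawl still running after {max_polls} status checks")
```

Passing `sleep` in as a parameter keeps the loop testable without real delays, and the `max_polls` cap prevents a stuck job from blocking forever.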

Output Formats

Markdown (default): Clean, LLM-ready text with preserved structure

Headings, links, lists, code blocks maintained
No HTML noise or styling artifacts
Perfect for RAG, summarization, analysis

HTML: Full HTML source (useful for parsing specific elements)

Screenshot: Base64-encoded PNG of rendered page

JSON: Structured data extraction (custom schemas supported)

Features

Smart Content Extraction:

Automatically identifies main content
Removes navigation, ads, footers
Preserves document structure

JavaScript Support:

Handles SPAs (React, Vue, Angular)
Waits for dynamic content to load
96% web coverage

Anti-Bot Handling:

Proxy management built-in
Rate limiting handled automatically
CAPTCHA avoidance

Caching:

Smart caching reduces credits
Configurable cache behavior

API Key Setup

The script looks for the Firecrawl API key in:

workspace/secrets/firecrawl_api_key (OpenClaw workspace)
secrets/firecrawl_api_key (relative to current directory)
FIRECRAWL_API_KEY environment variable

Current key is stored at: workspace/secrets/firecrawl_api_key
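The lookup order above can be sketched as a small function. This is an illustration, not the script's actual code: `resolve_api_key` and `DEFAULT_CANDIDATES` are hypothetical names, only the two file paths listed in this section are included, and the files are assumed to contain nothing but the key.

```python
import os
from pathlib import Path
from typing import Iterable, Mapping, Optional

# File locations in the order listed above (relative to the current directory).
DEFAULT_CANDIDATES = (
    Path("workspace/secrets/firecrawl_api_key"),
    Path("secrets/firecrawl_api_key"),
)


def resolve_api_key(
    candidates: Iterable[Path] = DEFAULT_CANDIDATES,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the first non-empty key found, or None if nothing is configured."""
    for path in candidates:
        if path.is_file():
            key = path.read_text(encoding="utf-8").strip()
            if key:
                return key
    # Fall back to the environment variable, as the section describes.
    key = env.get("FIRECRAWL_API_KEY", "").strip()
    return key or None
```

If you prefer to keep the key out of workspace files entirely, calling it with `candidates=()` restricts the lookup to the environment variable.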

Credits & Pricing

Scrape: 1 credit per page
Crawl: 1 credit per page crawled
Search: 1 credit per result scraped
Screenshot: +1 credit
Advanced features: May use additional credits

Free tier: 500 credits
Paid plans: Starting at $16/month (3,000 credits)
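The rates above make it easy to estimate a job's cost before running it. The following is a rough sketch: `estimate_credits` is a hypothetical helper, it assumes the screenshot surcharge applies once per page, and it ignores the "advanced features" that may cost extra.

```python
FREE_TIER_CREDITS = 500  # from the free-tier figure above


def estimate_credits(pages: int, screenshot: bool = False) -> int:
    """Estimate credits for a scrape, crawl, or search of `pages` pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    # 1 credit per page, plus 1 more per page if a screenshot is requested
    # (assumption: the +1 is charged per page, not per job).
    per_page = 1 + (1 if screenshot else 0)
    return pages * per_page


crawl_cost = estimate_credits(50)                    # 50 credits
search_cost = estimate_credits(10, screenshot=True)  # 20 credits
remaining = FREE_TIER_CREDITS - crawl_cost - search_cost  # 430 credits
```

Under these assumptions, a 50-page crawl plus a 10-result search with screenshots would leave 430 of the 500 free-tier credits.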

Use Cases

Documentation Extraction:

python3 scripts/scrape.py --crawl https://docs.framework.com --depth 2 --limit 50

Competitive Research:

python3 scripts/scrape.py --search "top casino affiliate sites" --limit 10

Content Migration:

python3 scripts/scrape.py https://old-site.com/page1 --formats markdown

News Monitoring:

python3 scripts/scrape.py --search "WordPress security updates" --limit 5

Blog Scraping:

python3 scripts/scrape.py --crawl https://blog.site.com --depth 1 --limit 20

Tips

Start with low --limit values to test
Use --depth 1 for blog homepages (reaches the posts linked from the homepage)
Use --depth 2-3 for documentation sites
--search is faster than manual crawling for research
Check --crawl-status regularly for long crawls
Use --json for programmatic processing
Markdown format is best for LLM consumption
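For programmatic processing, the --json flag can be combined with a thin subprocess wrapper. The sketch below assumes that with --json the script writes only the JSON document to stdout and exits non-zero on failure; neither is confirmed by this listing, and `build_command` and `run_json` are hypothetical names.

```python
import json
import subprocess
import sys
from typing import Any, List


def build_command(*args: str, script: str = "scripts/scrape.py") -> List[str]:
    """Build the argument list; a list avoids shell-quoting problems."""
    return [sys.executable, script, *args, "--json"]


def run_json(*args: str, timeout: float = 120.0) -> Any:
    """Run the script and parse its stdout as JSON."""
    completed = subprocess.run(
        build_command(*args),
        capture_output=True,
        text=True,
        check=True,       # raise CalledProcessError on a non-zero exit code
        timeout=timeout,  # do not hang forever on a stalled request
    )
    return json.loads(completed.stdout)
```

A call such as `run_json("--search", "AI agent frameworks", "--limit", "5")` would then return the parsed response, provided an API key is configured and the assumptions above hold.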

Comparison to Other Tools

vs web_fetch tool:

Firecrawl: Better JS support, cleaner output, handles complex sites
web_fetch: Faster, simpler, no API credits needed
Use Firecrawl for: Modern sites, heavy JS, or when you need high-quality markdown
Use web_fetch for: Simple pages, quick checks, no credit usage

vs browser tool:

Firecrawl: Optimized for scraping, structured output, no browser management
browser: Full control, visual interaction, debugging
Use Firecrawl for: Content extraction at scale
Use browser for: Interactive tasks, testing, visual verification

Files

2 total
