Firecrawl Web Scraper

Web scraping, crawling, and search via Firecrawl API. Converts web pages to clean markdown/JSON optimized for AI consumption. Use when you need to extract co...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
0 current installs · 0 all-time installs
by Patrick@Moochmaniac
Security Scan

VirusTotal: Benign
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
Name and description align with the code: the Python script talks to api.firecrawl.dev for scrape/crawl/search operations. However, the registry metadata declares no required credentials or config paths, while both SKILL.md and the script clearly require a Firecrawl API key and reference specific secret file locations (workspace/secrets/firecrawl_api_key, secrets/firecrawl_api_key, and a ~/.openclaw/... path). Leaving that credential/config path undeclared is inconsistent with the skill's stated purpose.
Instruction Scope
SKILL.md tells the agent to run scripts/scrape.py for scraping/crawling/searching, which is appropriate. But the runtime instructions and script explicitly look up secret files in 'workspace/secrets/...' and in the user's home directory for an API key. The script will send scraped URLs and related payloads to https://api.firecrawl.dev — expected — but it also reads local filesystem paths for secrets that were not declared in the registry, which broadens the skill's runtime access beyond what the manifest claims.
Install Mechanism
The skill ships only instructions plus one included Python script; there is no install spec, external download, package install, or archive extraction. The script depends on the 'requests' module, but the skill itself invokes no installer.
Credentials
The code requires a Firecrawl API key (it checks FIRECRAWL_API_KEY env var and several secret file paths), yet the registry metadata lists no required environment variables or primary credential. Requesting access to workspace secret file paths and a home-directory path without declaring them is disproportionate to what the manifest states and could unintentionally expose workspace secrets if those files exist.
Persistence & Privilege
The skill is not 'always: true' and does not attempt to modify other skills or system-wide settings. It only reads specified file paths and environment variables at runtime. Autonomous invocation is enabled (default) — expected for skills — but that alone is not flagged.
What to consider before installing
This skill wraps the Firecrawl web-scraping API and will send URLs and scrape requests to https://api.firecrawl.dev, which is expected behavior. Before installing:

1. Verify you trust the publisher and the Firecrawl endpoint.
2. Note that the skill looks for an API key in FIRECRAWL_API_KEY or in files such as workspace/secrets/firecrawl_api_key and a ~/.openclaw/... path, even though the registry does not declare these secrets. Prefer providing the API key via an explicitly scoped secret, and avoid placing other sensitive secrets in those file paths.
3. Inspect the included script to confirm it sends only scrape targets, not arbitrary local files.
4. Test with non-sensitive targets and a rotated or test API key to confirm behavior and credit usage.
5. Ask the publisher to update the manifest to declare the required environment variable and config paths so the access is transparent.

If you cannot confirm the publisher or the manifest, treat the skill cautiously and avoid putting production or privileged secrets in the referenced secret files.
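Before installing, it may help to check which of the documented secret locations already hold data on your machine. The sketch below audits only the two fully specified paths plus the environment variable; the partially specified ~/.openclaw/... path is omitted because the listing truncates it.

```python
import os

# Secret locations the listing says the skill will read. The ~/.openclaw
# path is not fully specified above, so it is intentionally left out.
CANDIDATE_PATHS = [
    "workspace/secrets/firecrawl_api_key",
    "secrets/firecrawl_api_key",
]

def audit_secret_locations(paths=CANDIDATE_PATHS, env=os.environ):
    """Report which documented secret locations currently hold a value."""
    findings = {p: os.path.isfile(p) for p in paths}
    findings["FIRECRAWL_API_KEY (env)"] = "FIRECRAWL_API_KEY" in env
    return findings

if __name__ == "__main__":
    for location, present in audit_secret_locations().items():
        print(f"{location}: {'present' if present else 'absent'}")
```

Anything reported as present is a file the skill could read at runtime, so confirm it contains only the Firecrawl key.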

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0 (latest: vk97b569gj929hx1x03xfkpt15s82fdpg)


SKILL.md

Firecrawl Scraper

Professional web scraping powered by Firecrawl API. Converts websites to clean, AI-ready markdown or structured JSON.

When to Use

  • Extract content from web pages for analysis
  • Scrape documentation sites or knowledge bases
  • Crawl entire websites systematically
  • Search the web and get scraped content
  • Parse JavaScript-heavy or dynamic sites
  • Convert HTML to clean markdown for LLM processing
  • Competitive research or content aggregation

Quick Start

Scrape a single page:

python3 scripts/scrape.py https://example.com

Crawl a website:

python3 scripts/scrape.py --crawl https://docs.example.com --depth 2 --limit 10

Search and scrape:

python3 scripts/scrape.py --search "AI agent frameworks" --limit 5

Check crawl status:

python3 scripts/scrape.py --crawl-status abc123

Commands

Scrape (Single Page)

Extract content from a single URL:

python3 scripts/scrape.py <url> [options]

Options:

  • --formats markdown,html,screenshot — Output formats (default: markdown)
  • --full — Include full page (no main content extraction)
  • --json — Output raw JSON response

Examples:

# Basic scrape
python3 scripts/scrape.py https://docs.example.com

# Get HTML and markdown
python3 scripts/scrape.py https://site.com --formats markdown,html

# Full page (no content filtering)
python3 scripts/scrape.py https://site.com --full

# JSON output
python3 scripts/scrape.py https://site.com --json
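If you pipe --json output into a downstream step, a small parser like the one below can pull out the markdown body. The sample payload is hypothetical: the exact shape depends on scrape.py, though Firecrawl's v1 scrape responses nest content under a "data" key.

```python
import json

# Hypothetical sample of what `--json` might emit (v1-style response);
# verify against the script's actual output before relying on this shape.
sample = (
    '{"success": true, "data": {"markdown": "# Example\\n\\nHello.", '
    '"metadata": {"title": "Example"}}}'
)

def extract_markdown(raw: str) -> str:
    """Pull the markdown body out of a --json response, if present."""
    doc = json.loads(raw)
    return doc.get("data", {}).get("markdown", "")

print(extract_markdown(sample))
```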

Crawl (Entire Website)

Systematically crawl and scrape multiple pages:

python3 scripts/scrape.py --crawl <url> [options]

Options:

  • --depth N — Maximum crawl depth (default: 2)
  • --limit N — Maximum pages to crawl (default: 10)
  • --json — Output raw JSON response

Examples:

# Basic crawl
python3 scripts/scrape.py --crawl https://docs.site.com

# Deep crawl with limit
python3 scripts/scrape.py --crawl https://blog.com --depth 3 --limit 50

# Shallow crawl
python3 scripts/scrape.py --crawl https://site.com --depth 1 --limit 5

Note: Crawl returns a job ID. Use --crawl-status to check progress and retrieve results.
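Since crawls are asynchronous, a polling loop is the natural pattern. This sketch injects a `get_status` callable rather than hard-coding the command; in practice you would make it a wrapper that shells out to `scripts/scrape.py --crawl-status <job-id>` and parses the JSON, which is an assumption about how you choose to wire it up.

```python
import time

def wait_for_crawl(get_status, interval=5.0, max_checks=60):
    """Poll a status callable until the crawl finishes or fails.

    `get_status` is any zero-argument callable returning a dict with a
    "status" key ("scraping", "completed", or "failed"), e.g. a wrapper
    around `scripts/scrape.py --crawl-status <job-id>`.
    """
    for _ in range(max_checks):
        result = get_status()
        if result.get("status") in ("completed", "failed"):
            return result
        time.sleep(interval)
    raise TimeoutError("crawl did not finish within the polling window")
```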

Search (Web Search + Scrape)

Search the web and get scraped content from results:

python3 scripts/scrape.py --search <query> [options]

Options:

  • --limit N — Number of results (default: 5)
  • --json — Output raw JSON response

Examples:

# Search and scrape
python3 scripts/scrape.py --search "WordPress security best practices"

# More results
python3 scripts/scrape.py --search "AI agents 2026" --limit 10

# JSON output
python3 scripts/scrape.py --search "casino bonuses" --json

Crawl Status

Check status of a crawl job:

python3 scripts/scrape.py --crawl-status <job-id>

Returns JSON with:

  • Status: scraping, completed, failed
  • Progress: Pages scraped
  • Data: Scraped content (when completed)
  • Credits used

Output Formats

Markdown (default): Clean, LLM-ready text with preserved structure

  • Headings, links, lists, code blocks maintained
  • No HTML noise or styling artifacts
  • Perfect for RAG, summarization, analysis

HTML: Full HTML source (useful for parsing specific elements)

Screenshot: Base64-encoded PNG of rendered page

JSON: Structured data extraction (custom schemas supported)
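For reference, a direct call to the underlying service might be assembled as below. The `/v1/scrape` path and payload shape follow Firecrawl's public API documentation; the bundled script may build its requests differently, so treat this as a sketch, not a description of scrape.py.

```python
def build_scrape_request(url, formats=("markdown",), api_key="YOUR_KEY"):
    """Assemble the pieces of a Firecrawl scrape call (v1-style).

    Returns kwargs suitable for requests.request(**...); nothing is
    sent over the network here.
    """
    return {
        "method": "POST",
        "url": "https://api.firecrawl.dev/v1/scrape",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "json": {"url": url, "formats": list(formats)},
    }

# e.g. requests.request(**build_scrape_request("https://example.com"))
```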

Features

Smart Content Extraction:

  • Automatically identifies main content
  • Removes navigation, ads, footers
  • Preserves document structure

JavaScript Support:

  • Handles SPAs (React, Vue, Angular)
  • Waits for dynamic content to load
  • 96% web coverage

Anti-Bot Handling:

  • Proxy management built-in
  • Rate limiting handled automatically
  • CAPTCHA avoidance

Caching:

  • Smart caching reduces credits
  • Configurable cache behavior

API Key Setup

The script looks for the Firecrawl API key in:

  1. workspace/secrets/firecrawl_api_key (OpenClaw workspace)
  2. secrets/firecrawl_api_key (relative to current directory)
  3. FIRECRAWL_API_KEY environment variable

Current key is stored at: workspace/secrets/firecrawl_api_key
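The lookup order above can be sketched as a first-match resolver. This is an illustration of the documented precedence, not the script's actual code: file locations are tried in order, with the environment variable as the fallback.

```python
import os

def resolve_api_key(paths=("workspace/secrets/firecrawl_api_key",
                           "secrets/firecrawl_api_key"),
                    env=os.environ):
    """Return the first non-empty key found, honoring the documented
    order: workspace secret file, local secret file, then environment."""
    for path in paths:
        try:
            with open(path, encoding="utf-8") as fh:
                key = fh.read().strip()
        except OSError:
            continue  # missing/unreadable file: try the next location
        if key:
            return key
    return env.get("FIRECRAWL_API_KEY")
```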

Credits & Pricing

  • Scrape: 1 credit per page
  • Crawl: 1 credit per page crawled
  • Search: 1 credit per result scraped
  • Screenshot: +1 credit
  • Advanced features: May use additional credits

Free tier: 500 credits
Paid plans: Starting at $16/month (3,000 credits)
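Under the rates listed above, cost is roughly additive, so a quick lower-bound estimate is easy to script. "Advanced features" pricing is unspecified here, so this deliberately ignores it.

```python
def estimate_credits(pages=0, search_results=0, screenshots=0):
    """Lower-bound cost under the listed rates: 1 credit per scraped or
    crawled page, 1 per search result, +1 per screenshot."""
    return pages + search_results + screenshots

# A --limit 50 crawl costs at least 50 credits, so the 500-credit
# free tier covers roughly ten such crawls.
print(estimate_credits(pages=50))
```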

Use Cases

Documentation Extraction:

python3 scripts/scrape.py --crawl https://docs.framework.com --depth 2 --limit 50

Competitive Research:

python3 scripts/scrape.py --search "top casino affiliate sites" --limit 10

Content Migration:

python3 scripts/scrape.py https://old-site.com/page1 --formats markdown

News Monitoring:

python3 scripts/scrape.py --search "WordPress security updates" --limit 5

Blog Scraping:

python3 scripts/scrape.py --crawl https://blog.site.com --depth 1 --limit 20

Tips

  • Start with low --limit values to test
  • Use --depth 1 for blog homepages (gets all posts)
  • --depth 2-3 for documentation sites
  • --search is faster than manual crawling for research
  • Check --crawl-status regularly for long crawls
  • Use --json for programmatic processing
  • Markdown format is best for LLM consumption

Comparison to Other Tools

vs web_fetch tool:

  • Firecrawl: Better JS support, cleaner output, handles complex sites
  • web_fetch: Faster, simpler, no API credits needed
  • Use Firecrawl for: Modern sites, heavy JS, need high-quality markdown
  • Use web_fetch for: Simple pages, quick checks, no credit usage

vs browser tool:

  • Firecrawl: Optimized for scraping, structured output, no browser management
  • browser: Full control, visual interaction, debugging
  • Use Firecrawl for: Content extraction at scale
  • Use browser for: Interactive tasks, testing, visual verification

Files

2 total
