Firecrawl Local

Knowledge
Documentation, RAG, Docker, Local First

Use this skill whenever you need to scrape web pages, crawl websites, or map site structure using a self-hosted Firecrawl instance. It triggers on requests to extract web content, build RAG pipelines from docs, bulk-ingest documentation, discover site URLs, or get clean markdown from any webpage. It assumes Firecrawl is already running at localhost:3002 and does not start Docker itself. If the instance is unavailable, it fails gracefully with a "Local Firecrawl unavailable" error. Use it even if the user just says "scrape this", "crawl the docs", or "get content from X".

Install

openclaw skills install @saddamtechie/firecrawl-local

Firecrawl Local Skill

Self-hosted Firecrawl integration using the v1 REST API. The script tests connectivity first, then runs scrape, crawl, or map, and polls asynchronous crawl jobs automatically.

Setup (one-time)

mkdir -p ~/.openclaw/skills/firecrawl-local
cp scripts/run.sh ~/.openclaw/skills/firecrawl-local/run.sh
chmod +x ~/.openclaw/skills/firecrawl-local/run.sh

The script lives at scripts/run.sh in this skill folder. Run the commands above from the skill folder's root to copy it into place.

Prerequisites: curl and jq installed, and Firecrawl running at localhost:3002.

Optional env vars:

export FIRECRAWL_LOCAL_URL="http://localhost:3002"  # default
export FIRECRAWL_API_KEY="fc-your-key"              # only needed if auth enabled
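
A minimal sketch of how a wrapper could resolve these settings and verify the prerequisites before doing any work. The helper names base_url and missing_tools are hypothetical and are not taken from scripts/run.sh; only the variable name and default URL come from the text above.

```shell
#!/usr/bin/env bash
# Sketch only: illustrative helpers, not the actual scripts/run.sh.

# Print the base URL, using the documented default when the variable is
# unset or empty, and dropping one trailing slash if present.
base_url() {
  local url="${FIRECRAWL_LOCAL_URL:-http://localhost:3002}"
  printf '%s\n' "${url%/}"
}

# Print the name of every listed tool that is not found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

A caller could abort early when missing_tools curl jq prints anything, and use "$(base_url)/v1/scrape" when building request URLs.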

Commands

Default — scrape a single page (URL only, no subcommand needed)

firecrawl-local https://docs.example.com/api

Scrape — explicit, with format options

firecrawl-local scrape https://docs.example.com/api
firecrawl-local scrape https://docs.example.com/api --formats markdown,html
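
For orientation, the scrape command presumably maps to a POST against the v1 scrape endpoint. The sketch below shows one way that request could be built; the function names are hypothetical, the default format of markdown is an assumption, and authentication is left out for brevity.

```shell
#!/usr/bin/env bash
# Sketch only: one possible shape of the scrape request, not the actual run.sh.
BASE_URL="${FIRECRAWL_LOCAL_URL:-http://localhost:3002}"

# Build the JSON body with jq. The formats argument arrives as a
# comma-separated string such as "markdown,html" and becomes a JSON array.
scrape_body() {
  local url="$1" formats="${2:-markdown}"
  jq -cn --arg url "$url" --arg formats "$formats" \
    '{url: $url, formats: ($formats | split(","))}'
}

# POST the body to /v1/scrape and print the raw JSON response.
scrape() {
  scrape_body "$@" | curl -sS -X POST "$BASE_URL/v1/scrape" \
    -H 'Content-Type: application/json' --data-binary @-
}
```

Calling scrape https://docs.example.com/api markdown,html would then send a body whose formats field is ["markdown","html"].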

Map — discover all URLs on a site

firecrawl-local map https://docs.example.com
firecrawl-local map https://docs.example.com --limit 200
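
Because --limit ends up as a JSON number, it is worth validating before it reaches jq. The following is a sketch under the assumption that the map request body carries url and limit fields; map_body is a hypothetical helper name, and 500 is the documented map default.

```shell
#!/usr/bin/env bash
# Sketch only: validating a numeric flag before building the map request body.

map_body() {
  local url="$1" limit="${2:-500}"
  # Reject anything that is not a plain non-negative integer, so that
  # --argjson never receives arbitrary JSON from the command line.
  case "$limit" in
    ''|*[!0-9]*)
      echo "limit must be a non-negative integer" >&2
      return 1
      ;;
  esac
  jq -cn --arg url "$url" --argjson limit "$limit" '{url: $url, limit: $limit}'
}
```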

Crawl — bulk extract multiple pages (async, auto-polled)

firecrawl-local crawl https://docs.example.com
firecrawl-local crawl https://docs.example.com --limit 30 --max-depth 2
firecrawl-local crawl https://docs.example.com --include /docs --exclude /blog

Agent Instructions

When to use each command

Goal                                          Command
Get content from one URL (quickest)           firecrawl-local <url>
Discover what pages exist                     map
Get content from one URL with format control  scrape
Ingest an entire docs site                    crawl
RAG pipeline ingestion                        map → targeted scrape or crawl

Optimal workflows

Documentation RAG pipeline:

1. map https://docs.example.com          → get full URL list
2. scrape <specific key pages>           → targeted extraction
3. Pass markdown to embedding pipeline
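
The three steps above can be sketched as a small pipeline. It assumes firecrawl-local is on PATH and prints its JSON response to stdout; links_under and scrape_section are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch only: map a site, keep one section, scrape each remaining page.

# Read a map response on stdin and print the links that start with the prefix.
links_under() {
  jq -r --arg prefix "$1" '.links[] | select(startswith($prefix))'
}

# Print the markdown of every page under the given prefix.
scrape_section() {
  local site="$1" prefix="$2" url
  firecrawl-local map "$site" | links_under "$prefix" |
    while IFS= read -r url; do
      # </dev/null keeps the inner command from consuming the URL list.
      firecrawl-local scrape "$url" </dev/null | jq -r '.data.markdown // empty'
    done
}
```

For example, scrape_section https://docs.example.com https://docs.example.com/api would emit markdown for the API pages only, ready to hand to an embedding step.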

Full site ingestion:

1. crawl https://docs.example.com --limit 50 --max-depth 3
2. Results are auto-polled and returned as JSON whose data field is an array of {url, markdown, metadata} objects
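
One way to hand crawl results to a downstream pipeline is to flatten them into JSON Lines. This is a sketch that assumes the crawl response shape listed under "Reading the output"; crawl_to_jsonl is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: turn a crawl response into one JSON record per page.

# Read a crawl response on stdin, skip pages without markdown, and print
# one compact {url, markdown} object per line.
crawl_to_jsonl() {
  jq -c '.data[] | select((.markdown // "") != "") | {url, markdown}'
}
```

Usage might look like: firecrawl-local crawl https://docs.example.com --limit 50 --max-depth 3 | crawl_to_jsonl > pages.jsonl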

Parameters

Flag             Applies to  Description
--limit N        map, crawl  Max pages (default: 50 for crawl, 500 for map)
--max-depth N    crawl       How deep to follow links (default: 2)
--include /path  crawl       Only crawl URLs matching this path prefix
--exclude /path  crawl       Skip URLs matching this path prefix
--formats list   scrape      Comma-separated: markdown, html, rawHtml, links

Reading the output

  • scrape: Returns {success, data: {markdown, html, metadata}}
  • map: Returns {success, links: [...]}
  • crawl: Returns {success, data: [{url, markdown, metadata}, ...]} ← after polling completes
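
Since the three commands return differently shaped JSON, a script can tell them apart by inspecting the fields. The sketch below assumes exactly the shapes listed above; summarize is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: recognize which of the three response shapes arrived on stdin.

summarize() {
  jq -r '
    if (.links | type) == "array" then
      "map: \(.links | length) links"
    elif (.data | type) == "array" then
      "crawl: \(.data | length) pages"
    elif (.data | type) == "object" then
      "scrape: \(.data.markdown // "" | length) markdown chars"
    else
      "unrecognized response"
    end'
}
```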

Failure signals and fixes

Error                        Cause                Fix
Local Firecrawl unavailable  Service not running  Start Firecrawl, check port 3002
success: false               Bad URL or blocked   Check URL is reachable, try --formats html
Empty markdown field         JS-rendered page     Firecrawl handles most JS; check if site blocks bots
Crawl times out              Site is large        Reduce --limit or --max-depth
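
Two of these signals can be detected mechanically from the response. The following sketch assumes the scrape response shape {success, data: {markdown, ...}}; check_response and its exit codes are hypothetical conventions, not part of scripts/run.sh.

```shell
#!/usr/bin/env bash
# Sketch only: map failure signals in a scrape response to exit codes.

# Read a response on stdin. Returns 1 when success is not true (or the
# input is not valid JSON), 2 when a scrape returned empty markdown, and
# otherwise prints the response unchanged and returns 0.
check_response() {
  local body
  body="$(cat)"
  if ! printf '%s' "$body" | jq -e '.success == true' >/dev/null 2>&1; then
    echo "request failed: check that the URL is reachable" >&2
    return 1
  fi
  if printf '%s' "$body" |
    jq -e '(.data | type) == "object" and ((.data.markdown // "") == "")' >/dev/null; then
    echo "empty markdown: the site may be blocking bots" >&2
    return 2
  fi
  printf '%s\n' "$body"
}
```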

Script reference

See scripts/run.sh for the full implementation. Key design decisions:

  • Health check uses /health endpoint with 3s timeout
  • Auth header only sent when FIRECRAWL_API_KEY is set
  • Crawl polling retries every 5s up to 60 attempts (5 minutes)
  • Every parameter is inserted into the JSON request body through jq, so user-supplied values are escaped and cannot break or inject into the JSON
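
The first three decisions above can be sketched as follows. This is not the contents of scripts/run.sh: the function names are hypothetical, and the status values completed and failed are an assumption about what the crawl status endpoint reports.

```shell
#!/usr/bin/env bash
# Sketch only: health check, conditional auth header, and bounded polling.
BASE_URL="${FIRECRAWL_LOCAL_URL:-http://localhost:3002}"

# Health check: succeed only if /health answers within 3 seconds.
health_ok() {
  curl -sf --max-time 3 "$BASE_URL/health" >/dev/null
}

# Call curl, adding the bearer header only when FIRECRAWL_API_KEY is set.
api_curl() {
  if [ -n "${FIRECRAWL_API_KEY:-}" ]; then
    curl -sS -H "Authorization: Bearer $FIRECRAWL_API_KEY" "$@"
  else
    curl -sS "$@"
  fi
}

# Usage: poll_until_done <attempts> <interval-seconds> <status command...>
# Runs the status command repeatedly. Returns 0 and prints the final JSON
# when .status is "completed", 1 when it is "failed" or the command errors,
# and 2 when the attempts run out.
poll_until_done() {
  local attempts="$1" interval="$2" i=0 body status
  shift 2
  while [ "$i" -lt "$attempts" ]; do
    body="$("$@")" || return 1
    status="$(printf '%s' "$body" | jq -r '.status // empty')"
    case "$status" in
      completed) printf '%s\n' "$body"; return 0 ;;
      failed) return 1 ;;
    esac
    i=$((i + 1))
    sleep "$interval"
  done
  return 2
}
```

With the documented settings, a call might look like poll_until_done 60 5 api_curl "$BASE_URL/v1/crawl/$job_id", where job_id stands for the identifier returned when the crawl was started. Sixty attempts at five-second intervals gives the five-minute ceiling mentioned above.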