Install
openclaw skills install scrapclaw

Run Scrapclaw as a Dockerized, browser-backed scraping service, then use this skill to fetch HTML from JavaScript-heavy or Cloudflare-protected pages through its HTTP API.
Use this skill when the user needs raw HTML from a page that may require a real browser, waiting for JavaScript, or Cloudflare solving, and when they want a self-hosted Docker container they can run locally or on a server. Do not use it for simple static pages that are easier to fetch directly.
This repo includes both the Scrapclaw service and the OpenClaw skill that calls it.
Preferred: run the published Docker image from GitHub Container Registry:
docker run --rm -d \
--name scrapclaw \
-p 8192:8192 \
ghcr.io/ericpearson/scrapclaw:v0.0.6
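If you prefer Compose even with the published image, a minimal sketch (the image and port are from above; the service name and restart policy are illustrative):

```yaml
services:
  scrapclaw:
    image: ghcr.io/ericpearson/scrapclaw:v0.0.6
    ports:
      - "8192:8192"
    restart: unless-stopped
```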
The same image is referenced by the GitHub v0.0.6 release for this repo.
If you use the source build path instead of the published image, review the repo, Dockerfile, and docker-compose.yml first. Running docker compose up --build on unreviewed code can execute arbitrary code on the host.
If you want to run from source instead, use Docker Compose:
git clone https://github.com/ericpearson/scrapclaw.git
cd scrapclaw
docker compose up --build -d
The API will be available at http://127.0.0.1:8192.
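Container startup is not instant, so it can help to wait for the API before issuing requests. A minimal sketch in Python (the /health path and default port are from this document; the polling logic and function name are illustrative):

```python
import time
import urllib.error
import urllib.request

def wait_for_health(base="http://127.0.0.1:8192", attempts=30, delay=1.0):
    """Poll GET /health until the service answers with 200, or give up."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(f"{base}/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not up yet; retry after a short delay
        time.sleep(delay)
    return False
```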
If you are unsure about the target pages or host environment, prefer running the container on an isolated VM or similarly restricted host.
Install the local skill into an OpenClaw workspace:
mkdir -p ~/.openclaw/workspace/skills
cp -R skills/scrapclaw ~/.openclaw/workspace/skills/
Or install it from ClawHub:
clawhub install scrapclaw --version 0.0.6
Resolve the service base URL from SCRAPCLAW_BASE_URL if it is set; otherwise default to http://127.0.0.1:8192. If SCRAPCLAW_API_TOKEN is set, include an Authorization: Bearer $SCRAPCLAW_API_TOKEN header. Treat SCRAPCLAW_API_TOKEN as sensitive and only use it when the user or operator intentionally configured it. When service availability is unknown, call GET /health before making a scrape request. Scrape by sending POST /v1 with JSON containing:
- url: required target URL
- maxTimeout: timeout in milliseconds, default 60000
- wait: extra post-navigation wait in milliseconds, default 0
- cmd: must be request.get
- responseMode: html for raw markup or text for extracted readable text, default html
- maxResponseBytes: optional UTF-8 byte cap for solution.response

If the response has "status": "error", surface the error clearly and stop. If it has "status": "ok", use solution.response as the fetched HTML or extracted text, solution.status as the upstream HTTP status, and solution.title when page-title context helps.

Health check:
curl -fsS "${SCRAPCLAW_BASE_URL:-http://127.0.0.1:8192}/health"
Fetch a page:
auth_args=()
if [ -n "${SCRAPCLAW_API_TOKEN:-}" ]; then
auth_args=(-H "Authorization: Bearer ${SCRAPCLAW_API_TOKEN}")
fi
curl -fsS "${SCRAPCLAW_BASE_URL:-http://127.0.0.1:8192}/v1" \
-H 'Content-Type: application/json' \
"${auth_args[@]}" \
-d '{"url":"https://example.com","maxTimeout":60000,"wait":0,"cmd":"request.get","responseMode":"html","maxResponseBytes":50000}'