Crawl4AI Web Scraper
v1.0.1
Full web page scraping with JavaScript rendering via a local Crawl4AI instance, delivering clean markdown or detailed JSON including links and media.
Security Scan
OpenClaw
Benign (high confidence)
Purpose & Capability
Name/description match what the files do: the SKILL.md and included Node script send URLs to a Crawl4AI instance and return markdown/JSON. Declared required binary (node) is appropriate. One minor inconsistency: registry metadata earlier listed no required env vars, but SKILL.md (and the script) require CRAWL4AI_URL (and optionally CRAWL4AI_KEY). This appears to be a metadata omission rather than malicious.
Instruction Scope
Instructions and the script stay within scope: they POST {urls: [...]} to the configured Crawl4AI URL and print the returned markdown/JSON. The script reads only CRAWL4AI_URL and the optional CRAWL4AI_KEY. Notes: the script uses Node's http module (not https), so HTTPS endpoints will not be handled correctly; the help text mentions a default API key of '1234' (harmless but confusing). There are no instructions to read unrelated files or environment variables.
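The request shape described above can be sketched as follows. This is illustrative only: the function name, the /crawl endpoint path, and the Bearer header scheme are assumptions, not taken from the skill's actual script.

```javascript
// Hypothetical sketch of the POST the script appears to build; the
// endpoint path (/crawl) and Bearer scheme are assumptions.
function buildCrawlRequest(urls, baseUrl, apiKey) {
  const { hostname, port, pathname } = new URL(baseUrl);
  const headers = { 'Content-Type': 'application/json' };
  if (apiKey) headers.Authorization = `Bearer ${apiKey}`; // auth only when a key is set
  return {
    options: {
      hostname,
      port,
      path: `${pathname.replace(/\/$/, '')}/crawl`,
      method: 'POST',
      headers,
    },
    body: JSON.stringify({ urls }), // the {urls: [...]} payload noted above
  };
}
```

The options object matches what Node's http.request expects, which keeps the sketch close to how the bundled script reportedly works.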
Install Mechanism
No install spec or external downloads; the skill is instruction+embedded script only. No third-party packages are fetched at install time, so install risk is low.
Credentials
Only CRAWL4AI_URL (required) and CRAWL4AI_KEY (optional) are used. These directly map to the stated purpose (target instance URL and optional auth). No unrelated secrets or config paths are requested.
Persistence & Privilege
Skill does not request always:true and does not attempt to modify other skills or system-wide settings. Agent-autonomous invocation is allowed by default (normal for skills) but not specially privileged here.
Assessment
This skill appears to do what it claims, but review these points before installing:
- You must set CRAWL4AI_URL to a trusted Crawl4AI endpoint (for local usage set it to http://localhost:11235). The skill will send the URL(s) you want scraped to that endpoint, so pointing it at an untrusted remote service could leak the pages you request and any scraped content.
- The script only supports plain HTTP (uses Node's http module). If you provide an https:// URL the script may fail; treat that as an implementation bug, not a security feature.
- The only environment variables used are CRAWL4AI_URL and optional CRAWL4AI_KEY. If your Crawl4AI instance requires a key, use a secret you trust. There are no other hidden env reads or file accesses.
- Metadata mismatch: the registry metadata omitted the required env var that the SKILL.md and script require. This is likely a packaging oversight—confirm CRAWL4AI_URL will be supplied before use.
- If you plan to let the agent invoke skills autonomously, remember the agent could call this skill and thereby send target URLs to the configured Crawl4AI endpoint; ensure that behavior is acceptable in your environment.
Overall: coherent and low-risk, provided you point it at a trusted Crawl4AI instance and are aware of the HTTP/HTTPS limitation.
Runtime requirements
🕷️ Clawdis
Bins: node (latest)
Crawl4AI Web Scraper
Local Crawl4AI instance for full web page extraction with JavaScript rendering.
Endpoints
Proxy (port 11234) — Clean output, OpenWebUI-compatible
- Returns: [{page_content, metadata}]
- Use for: Simple content extraction
Direct (port 11235) — Full output with all data
- Returns: {results: [{markdown, html, links, media, ...}]}
- Use for: When you need links, media, or other metadata
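A direct-endpoint response of the shape shown above could be walked like this. The summarizeResults name and the internal/external split under links are assumptions about the response layout, not confirmed from the skill:

```javascript
// Illustrative only: summarize a direct-endpoint response of the shape
// {results: [{markdown, html, links, media, ...}]}. The links.internal /
// links.external fields are assumed, not confirmed.
function summarizeResults(response) {
  return (response.results || []).map((r) => ({
    markdown: r.markdown,
    linkCount:
      ((r.links && r.links.internal) || []).length +
      ((r.links && r.links.external) || []).length,
  }));
}
```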
Usage
# Via script
node {baseDir}/scripts/crawl4ai.js "url"
node {baseDir}/scripts/crawl4ai.js "url" --json
Script options:
--json — Full JSON response
Default output (no flags): Clean markdown from the page.
Configuration
Required environment variable:
CRAWL4AI_URL — Your Crawl4AI instance URL (e.g., http://localhost:11235)
Optional:
CRAWL4AI_KEY — API key if your instance requires authentication
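Before invoking the script, the two documented variables can be validated with a small check like this. readConfig is an illustrative helper, not part of the skill:

```javascript
// Illustrative config check for the two documented env vars.
function readConfig(env) {
  const url = env.CRAWL4AI_URL;
  if (!url) {
    throw new Error('CRAWL4AI_URL is required, e.g. http://localhost:11235');
  }
  return { url, key: env.CRAWL4AI_KEY || null }; // key stays optional
}
```

Calling it as `readConfig(process.env)` fails fast with a clear message instead of sending a request to an undefined host.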
Features
- JavaScript rendering — Handles dynamic content
- Unlimited usage — Local instance, no API limits
- Full content — HTML, markdown, links, media, tables
- Better than Tavily for complex pages with JS
API
Uses your local Crawl4AI instance's REST API. The auth header is sent only when CRAWL4AI_KEY is set.
