XCrawl Crawl

Use this skill for XCrawl crawl tasks, including bulk site crawling, crawler rule design, async status polling, and delivery of crawl output for downstream s...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan

  • VirusTotal: Benign
  • OpenClaw: Benign (high confidence)
Purpose & Capability
The name and description (XCrawl Crawl) align with the instructions: the skill calls XCrawl endpoints, needs an XCrawl API key stored at ~/.xcrawl/config.json, and expects curl/node to be available. The required bins and the local config file are proportionate to the stated purpose.
Instruction Scope
SKILL.md instructs the agent to read ~/.xcrawl/config.json for XCRAWL_API_KEY and to call https://run.xcrawl.com. That is appropriate for this integration. Minor note: the allowed-tools list includes Write/Edit in addition to Read/Grep — the documentation never requires writing arbitrary files, so Write/Edit appears unnecessary and could be tightened.
Install Mechanism
Instruction-only skill with no install spec and no downloads — lowest-risk install footprint.
Credentials
No global environment variables or unrelated credentials are requested. The API key is read from a local config file (~/.xcrawl/config.json), which the SKILL.md documents and the metadata reflects (apiKeySource: local_config). This is proportionate to the skill's function.
Persistence & Privilege
The `always` flag is false and the skill has no install hooks or system-wide changes. Autonomous invocation is allowed (platform default) but not combined with any broad privileges or unrelated credential access.
Assessment
This skill appears coherent for using XCrawl: it reads an API key from ~/.xcrawl/config.json and uses curl/node to call run.xcrawl.com. Before installing, verify you trust the publisher and the homepage (https://www.xcrawl.com/), ensure the local config file is created with correct file permissions (restrict to your user), and confirm you are comfortable storing the API key in that file. Consider removing Write/Edit permissions if you want the skill strictly read-only. Also be aware of crawl costs (credits) and that some crawl options (e.g., webhook settings or submitting private URLs) can expose data to external endpoints — review request bodies and webhook targets before use. Finally, note defaults like skip_tls_verification=true appear in the API schema; if you need secure TLS verification, override that option when calling the API.


Current version: v1.0.2

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

Runtime requirements

Any bin: curl, node

SKILL.md

XCrawl Crawl

Overview

This skill orchestrates full-site or scoped crawling with XCrawl Crawl APIs. Default behavior is raw passthrough: return upstream API response bodies as-is.

Required Local Config

Before using this skill, the user must create a local config file and write XCRAWL_API_KEY into it.

Path: ~/.xcrawl/config.json

{
  "XCRAWL_API_KEY": "<your_api_key>"
}

Read API key from local config file only. Do not require global environment variables.
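
A minimal shell sketch of creating that file (the key value is a placeholder; `chmod 600` restricts the file to your user, which matters because the key grants billable API access):

```shell
# Create ~/.xcrawl/config.json with the API key the skill reads.
# Replace the placeholder with your real key from https://dash.xcrawl.com/.
CONFIG_DIR="${HOME}/.xcrawl"
mkdir -p "$CONFIG_DIR"
printf '{\n  "XCRAWL_API_KEY": "%s"\n}\n' "your_api_key_here" > "$CONFIG_DIR/config.json"
# Restrict the file to the owning user only.
chmod 600 "$CONFIG_DIR/config.json"
```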

Credits and Account Setup

Using XCrawl APIs consumes credits. If the user does not have an account or available credits, guide them to register at https://dash.xcrawl.com/. After registration, they can activate the free 1000 credits plan before running requests.

Tool Permission Policy

Request runtime permissions for curl and node only. Do not request Python, shell helper scripts, or other runtime permissions.

API Surface

  • Start crawl: POST /v1/crawl
  • Read result: GET /v1/crawl/{crawl_id}
  • Base URL: https://run.xcrawl.com
  • Required header: Authorization: Bearer <XCRAWL_API_KEY>

Usage Examples

cURL (create + result)

# Read the API key from the local config file (no env vars required)
API_KEY="$(node -e "const fs=require('fs');const p=process.env.HOME+'/.xcrawl/config.json';const k=JSON.parse(fs.readFileSync(p,'utf8')).XCRAWL_API_KEY||'';process.stdout.write(k)")"

# Start a bounded crawl (100 pages, depth 2)
CREATE_RESP="$(curl -sS -X POST "https://run.xcrawl.com/v1/crawl" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{"url":"https://example.com","crawler":{"limit":100,"max_depth":2},"output":{"formats":["markdown","links"]}}')"

echo "$CREATE_RESP"

# Extract crawl_id from the create response
CRAWL_ID="$(node -e 'const s=process.argv[1];const j=JSON.parse(s);process.stdout.write(j.crawl_id||"")' "$CREATE_RESP")"

# Fetch the crawl result (repeat until status is completed or failed)
curl -sS -X GET "https://run.xcrawl.com/v1/crawl/${CRAWL_ID}" \
  -H "Authorization: Bearer ${API_KEY}"

Node

node -e '
// Read the API key from the local config, then start a scoped crawl:
// docs pages only, blog excluded, Japanese locale, three output formats.
const fs=require("fs");
const apiKey=JSON.parse(fs.readFileSync(process.env.HOME+"/.xcrawl/config.json","utf8")).XCRAWL_API_KEY;
const body={url:"https://example.com",crawler:{limit:300,max_depth:3,include:["/docs/.*"],exclude:["/blog/.*"]},request:{locale:"ja-JP"},output:{formats:["markdown","links","json"]}};
fetch("https://run.xcrawl.com/v1/crawl",{
  method:"POST",
  headers:{"Content-Type":"application/json",Authorization:`Bearer ${apiKey}`},
  body:JSON.stringify(body)
}).then(async r=>{console.log(await r.text());});
'

Request Parameters

Request endpoint and headers

  • Endpoint: POST https://run.xcrawl.com/v1/crawl
  • Headers:
    • Content-Type: application/json
    • Authorization: Bearer <api_key>

Request body: top-level fields

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| url | string | Yes | - | Site entry URL |
| crawler | object | No | - | Crawler config |
| proxy | object | No | - | Proxy config |
| request | object | No | - | Request config |
| js_render | object | No | - | JS rendering config |
| output | object | No | - | Output config |
| webhook | object | No | - | Async callback config |

crawler

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| limit | integer | No | 100 | Max pages |
| include | string[] | No | - | Include only matching URLs (regex supported) |
| exclude | string[] | No | - | Exclude matching URLs (regex supported) |
| max_depth | integer | No | 3 | Max depth from entry URL |
| include_entire_domain | boolean | No | false | Crawl full site instead of only subpaths |
| include_subdomains | boolean | No | false | Include subdomains |
| include_external_links | boolean | No | false | Include external links |
| sitemaps | boolean | No | true | Use site sitemap |

proxy

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| location | string | No | US | ISO-3166-1 alpha-2 country code, e.g. US / JP / SG |
| sticky_session | string | No | Auto-generated | Sticky session ID; same ID attempts to reuse exit |

request

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| locale | string | No | en-US,en;q=0.9 | Affects Accept-Language |
| device | string | No | desktop | desktop / mobile; affects UA and viewport |
| cookies | object map | No | - | Cookie key/value pairs |
| headers | object map | No | - | Header key/value pairs |
| only_main_content | boolean | No | true | Return main content only |
| block_ads | boolean | No | true | Attempt to block ad resources |
| skip_tls_verification | boolean | No | true | Skip TLS verification |
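
Because `skip_tls_verification` defaults to `true`, callers who want strict certificate checking must override it explicitly. An illustrative request body (the URL and limits are placeholders):

```json
{
  "url": "https://example.com",
  "crawler": { "limit": 50, "max_depth": 2 },
  "request": { "skip_tls_verification": false }
}
```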

js_render

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| enabled | boolean | No | true | Enable browser rendering |
| wait_until | string | No | load | load / domcontentloaded / networkidle |
| viewport.width | integer | No | - | Viewport width (desktop 1920, mobile 402) |
| viewport.height | integer | No | - | Viewport height (desktop 1080, mobile 874) |

output

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| formats | string[] | No | ["markdown"] | Output formats |
| screenshot | string | No | viewport | full_page / viewport (only if formats includes screenshot) |
| json.prompt | string | No | - | Extraction prompt |
| json.json_schema | object | No | - | JSON Schema |

output.formats enum:

  • html
  • raw_html
  • markdown
  • links
  • summary
  • screenshot
  • json
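
When `formats` includes `json`, the `json.prompt` and `json.json_schema` fields drive structured extraction. A sketch with an illustrative schema (the schema field names are examples, not part of the API):

```json
{
  "url": "https://example.com/docs",
  "output": {
    "formats": ["markdown", "json"],
    "json": {
      "prompt": "Extract the page title and a one-line summary.",
      "json_schema": {
        "type": "object",
        "properties": {
          "title": { "type": "string" },
          "summary": { "type": "string" }
        },
        "required": ["title"]
      }
    }
  }
}
```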

webhook

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| url | string | No | - | Callback URL |
| headers | object map | No | - | Custom callback headers |
| events | string[] | No | ["started","completed","failed"] | Events: started / completed / failed |
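
A hedged webhook configuration sketch (the callback URL and header name are placeholders; webhook targets receive crawl data, so review them before use):

```json
{
  "url": "https://example.com",
  "webhook": {
    "url": "https://hooks.example.com/xcrawl",
    "headers": { "X-Webhook-Token": "<shared_secret>" },
    "events": ["completed", "failed"]
  }
}
```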

Response Parameters

Create response (POST /v1/crawl)

| Field | Type | Description |
| --- | --- | --- |
| crawl_id | string | Task ID |
| endpoint | string | Always crawl |
| version | string | Version |
| status | string | Always pending |

Result response (GET /v1/crawl/{crawl_id})

| Field | Type | Description |
| --- | --- | --- |
| crawl_id | string | Task ID |
| endpoint | string | Always crawl |
| version | string | Version |
| status | string | pending / crawling / completed / failed |
| url | string | Entry URL |
| data | object[] | Per-page result array |
| started_at | string | Start time (ISO 8601) |
| ended_at | string | End time (ISO 8601) |
| total_credits_used | integer | Total credits used |

data[] fields follow output.formats:

  • html, raw_html, markdown, links, summary, screenshot, json
  • metadata (page metadata)
  • traffic_bytes
  • credits_used
  • credits_detail
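
For downstream steps that consume a result response, a minimal Node helper can tally the documented per-page fields (this is a consumer-side sketch; the skill itself returns raw bodies):

```javascript
// Summarize a result response using only the documented fields:
// status, data[], data[].markdown, data[].credits_used.
function summarizeResult(result) {
  const pages = result.data || [];
  return {
    status: result.status,
    pages: pages.length,
    // Per-page credits; the API also reports total_credits_used at top level.
    creditsFromPages: pages.reduce((sum, p) => sum + (p.credits_used || 0), 0),
    // Only pages whose formats included markdown carry a markdown field.
    markdown: pages.map((p) => p.markdown).filter(Boolean),
  };
}
```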

Workflow

  1. Confirm business objective and crawl boundary.
     • What content is required, what content must be excluded, and what is the completion signal.
  2. Draft a bounded crawl request.
     • Prefer explicit limits and path constraints.
  3. Start the crawl and capture task metadata.
     • Record crawl_id, initial status, and request payload.
  4. Poll GET /v1/crawl/{crawl_id} until a terminal state.
     • Track pending, crawling, completed, or failed.
  5. Return raw create/result responses.
     • Do not synthesize derived summaries unless explicitly requested.
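
The polling step can be sketched in Node. The fetch function is injectable so the loop can be exercised without network access, and the 5-second interval is an arbitrary choice, not an API requirement:

```javascript
// Poll GET /v1/crawl/{crawl_id} until a terminal state is reached.
async function pollCrawl(crawlId, apiKey, fetchFn = fetch, intervalMs = 5000) {
  for (;;) {
    const res = await fetchFn(`https://run.xcrawl.com/v1/crawl/${crawlId}`, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    const body = await res.json();
    // pending and crawling are transient; completed and failed are terminal.
    if (body.status === "completed" || body.status === "failed") return body;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Typical use: `pollCrawl(crawlId, apiKey).then(body => console.log(JSON.stringify(body)))`, which returns the raw result body for the output contract below.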

Output Contract

Return:

  • Endpoint flow (POST /v1/crawl + GET /v1/crawl/{crawl_id})
  • request_payload used for the create request
  • Raw response body from create call
  • Raw response body from result call
  • Error details when request fails

Do not generate summaries unless the user explicitly requests a summary.

Guardrails

  • Never run an unbounded crawl without explicit constraints.
  • Do not present speculative page counts as final coverage.
  • Do not hardcode provider-specific tool schemas in core logic.
  • Highlight policy, legal, or website-usage risks when relevant.

