Data Spider
Scrape any webpage and extract structured data as JSON, table, or list. Supports schema-guided extraction.
MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw
Benign (medium confidence)
Purpose & Capability
The name and description (web scraping, schema extraction) match the runtime instructions: SKILL.md directs the agent to call the aiprox.dev orchestrate endpoint to perform scraping. Requiring a single spend token for a hosted service is expected.
Instruction Scope
The instructions focus on calling the external API and do not ask the agent to read local files or additional environment variables. However, each target URL is sent to a third-party orchestration service (aiprox.dev), which fetches and processes the page, so any sensitive content on the scraped pages is disclosed to that service.
Install Mechanism
No install spec and no code files—this is instruction-only, so nothing will be written to disk or fetched at install time by the skill itself.
Credentials
Only one required environment variable (AIPROX_SPEND_TOKEN) is declared and documented in SKILL.md as the payment and authentication token for the external API. That is proportionate, but the token is sensitive: anyone who holds it can make paid requests to the service on your behalf.
Persistence & Privilege
The always flag is false, and the skill does not request persistent system privileges or any config paths. It does not attempt to modify other skills or agent-wide settings.
Assessment
This skill is coherent but relies on an external service (aiprox.dev) to fetch and analyze webpage content. Before installing or using it:
1) Only send URLs whose pages contain no sensitive data (logins, internal documents, PII).
2) Review aiprox.dev's privacy, storage, and billing policies. SKILL.md claims transient processing, but you must trust the vendor on this.
3) Treat AIPROX_SPEND_TOKEN as a secret: rotate it if it leaks and limit its scope if possible.
4) If you need to scrape sensitive or internal sites, prefer a self-hosted scraper or run tools locally instead of routing data through a third party.
5) Test with non-sensitive pages first and monitor billing usage.
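Recommendations 1 and 4 can be partly enforced on the client before any URL leaves your machine. The sketch below is a hypothetical pre-flight check (`is_safe_to_send` is our own name, not part of AIProx) that rejects non-HTTP(S) URLs, `localhost`, and hosts written as loopback, private, link-local, or reserved IP literals. It does not resolve DNS, so an internal hostname that maps to a private address still passes; treat it as a first filter, not a guarantee.

```python
from ipaddress import ip_address
from urllib.parse import urlsplit


def is_safe_to_send(url: str) -> bool:
    """Hypothetical guard: reject URLs that obviously point at internal hosts."""
    parts = urlsplit(url)
    # Only plain web URLs with a host are acceptable (no file://, data:, etc.).
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname  # lowercased; IPv6 brackets already stripped
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        addr = ip_address(host)
    except ValueError:
        # A DNS name: not resolved here, so it cannot be classified further.
        return True
    return not (addr.is_private or addr.is_loopback
                or addr.is_link_local or addr.is_reserved)
```

For example, `https://example.com/pricing` passes, while `http://10.0.0.5/wiki` and `http://[::1]/` are rejected.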
Current version: v1.1.0
Runtime requirements
🕷️ Clawdis
Env: AIPROX_SPEND_TOKEN
SKILL.md
Data Spider
Scrape and extract structured data from any webpage. Supports schema-guided extraction to match a specific data shape, or automatic detection of the page's structure. Returns data as a JSON object, a table (columns + rows), or a flat list, depending on the chosen format.
When to Use
- Extracting product information or pricing from pages
- Gathering statistics and figures from articles
- Building datasets from web sources
- Schema-guided extraction to match your data model
- Research and competitive analysis
Usage Flow
- Provide a webpage `url`
- Optionally provide a `schema` object: data will be extracted to match that exact shape
- Optionally describe the extraction in a free-text `task` (as in the table example)
- Optionally set `format`: `json` (default), `table`, or `list`
- AIProx routes the request to the data-spider agent
- Returns structured data in the requested format, plus a summary and the source URL
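The request body implied by these steps can be assembled with a small helper. This is a sketch that assumes the field names used in the curl examples on this page (`url`, `schema`, `format`, `task`); `build_payload` is a hypothetical name, and the client-side check on `format` only mirrors the three documented values rather than any validation AIProx is known to perform.

```python
import json
from typing import Optional

ALLOWED_FORMATS = {"json", "table", "list"}  # the documented output formats


def build_payload(url: str, schema: Optional[dict] = None,
                  fmt: str = "json", task: Optional[str] = None) -> str:
    """Hypothetical helper: serialize a Data Spider request body as JSON text."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}, got {fmt!r}")
    body: dict = {"url": url, "format": fmt}
    if schema is not None:
        # Keys name the fields to extract; None serializes to null, as in the example.
        body["schema"] = schema
    if task is not None:
        # Free-text instruction, as used in the table request.
        body["task"] = task
    return json.dumps(body)
```

Optional fields are omitted rather than sent as `null`, so a plain `build_payload("https://example.com/pricing")` yields only `url` and `format`.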
Security Manifest
| Permission | Scope | Reason |
|---|---|---|
| Network | aiprox.dev | API calls to orchestration endpoint |
| Env Read | AIPROX_SPEND_TOKEN | Authentication for paid API |
Make Request — JSON with Schema
```bash
curl -X POST https://aiprox.dev/api/orchestrate \
  -H "Content-Type: application/json" \
  -H "X-Spend-Token: $AIPROX_SPEND_TOKEN" \
  -d '{
    "url": "https://example.com/pricing",
    "schema": {"free_tier": null, "pro_price": null, "enterprise": null},
    "format": "json"
  }'
```
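For callers who prefer Python to curl, the same POST can be built with the standard library. In this sketch the endpoint, header name, and body come from the curl example; the function names and the 60-second timeout are our own assumptions, and error handling (HTTP errors, billing failures) is left out because those response shapes are not documented here.

```python
import json
import os
import urllib.request

ENDPOINT = "https://aiprox.dev/api/orchestrate"  # from the curl example


def make_request(payload: dict, token: str) -> urllib.request.Request:
    """Build the same POST the curl example sends; nothing is sent here."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Spend-Token": token},
        method="POST",
    )


def orchestrate(payload: dict, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (assumed timeout: 60 s)."""
    token = os.environ.get("AIPROX_SPEND_TOKEN")
    if not token:
        raise RuntimeError("AIPROX_SPEND_TOKEN is not set")
    with urllib.request.urlopen(make_request(payload, token), timeout=timeout) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes it possible to inspect the headers and body without spending anything.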
Response — JSON
```json
{
  "data": {"free_tier": "$0/month, 1000 API calls", "pro_price": "$29/month", "enterprise": "custom pricing"},
  "summary": "SaaS pricing page with three tiers.",
  "source": "https://example.com/pricing",
  "format": "json"
}
```
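Because schema-guided extraction returns `data` keyed by the schema's fields, a client can cheaply check which fields came back empty. A minimal sketch, assuming an unfilled field is either absent or `null` (the documentation does not say which); `missing_fields` is a hypothetical name.

```python
def missing_fields(schema: dict, response: dict) -> list:
    """Return the schema keys that the response left absent or null."""
    data = response.get("data") or {}
    return [key for key in schema if data.get(key) is None]


# Sample response copied from the JSON example above.
sample = {
    "data": {"free_tier": "$0/month, 1000 API calls",
             "pro_price": "$29/month",
             "enterprise": "custom pricing"},
    "summary": "SaaS pricing page with three tiers.",
    "source": "https://example.com/pricing",
    "format": "json",
}
print(missing_fields({"free_tier": None, "pro_price": None, "enterprise": None}, sample))  # → []
```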
Make Request — Table
```bash
curl -X POST https://aiprox.dev/api/orchestrate \
  -H "Content-Type: application/json" \
  -H "X-Spend-Token: $AIPROX_SPEND_TOKEN" \
  -d '{
    "task": "extract pricing tiers as a table",
    "url": "https://example.com/pricing",
    "format": "table"
  }'
```
Response — Table
```json
{
  "columns": ["Plan", "Price", "API Calls"],
  "rows": [
    ["Free", "$0/month", "1,000"],
    ["Pro", "$29/month", "50,000"],
    ["Enterprise", "Custom", "Unlimited"]
  ],
  "summary": "Three-tier SaaS pricing.",
  "source": "https://example.com/pricing",
  "format": "table"
}
```
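A table response separates headers (`columns`) from values (`rows`), so most consumers will want to zip them back together or write them out. A sketch with hypothetical helper names, assuming every row has one cell per column (`zip` silently drops extra cells if that does not hold).

```python
import csv
import io


def table_to_records(response: dict) -> list:
    """Pair each row with the column headers, e.g. {'Plan': 'Free', ...}."""
    columns = response["columns"]
    return [dict(zip(columns, row)) for row in response["rows"]]


def table_to_csv(response: dict) -> str:
    """Render a table response as CSV text, header row first."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(response["columns"])
    writer.writerows(response["rows"])
    return buf.getvalue()
```

The `csv` module quotes cells that contain commas, so a value such as `1,000` survives the round trip intact.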
Response — List
```json
{
  "items": ["$0/month — Free tier, 1000 API calls", "$29/month — Pro, 50,000 calls", "Enterprise — custom pricing"],
  "summary": "SaaS pricing tiers extracted as flat list.",
  "source": "https://example.com/pricing",
  "format": "list"
}
```
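The three response examples share `summary`, `source`, and `format` but put the extracted data under different keys (`data`, `columns`/`rows`, `items`). A small sketch that dispatches on `format`, assuming those three shapes are the only ones returned; `extract_payload` is a hypothetical name.

```python
from typing import Any


def extract_payload(response: dict) -> Any:
    """Return the format-specific part of a Data Spider response."""
    fmt = response.get("format")
    if fmt == "json":
        return response["data"]
    if fmt == "table":
        return {"columns": response["columns"], "rows": response["rows"]}
    if fmt == "list":
        return response["items"]
    # Anything else is outside the documented shapes.
    raise ValueError(f"unexpected format: {fmt!r}")
```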
Trust Statement
Data Spider fetches and analyzes the content of the webpage at the given URL. Content is processed transiently and not stored. Analysis is performed by Claude via LightningProx. The service respects robots.txt and rate limits. Your spend token is used for payment only.
