Data Spider

Scrape any webpage and extract structured data as JSON, table, or list. Supports schema-guided extraction.

MIT-0 · Free to use, modify, and redistribute. No attribution required.


by @unixlamadev-spec


Security Scan

VirusTotal: Benign


OpenClaw: Benign (medium confidence)

✓ Purpose & Capability

The name and description (web scraping, schema extraction) match the runtime instructions: SKILL.md explicitly calls the aiprox.dev orchestrate endpoint to perform scraping. Requiring a single spend token for a hosted service is expected.

ℹ Instruction Scope

The instructions focus on calling the external API and do not ask the agent to read local files or extra environment variables. However, each target URL is handed to a third-party orchestration service (aiprox.dev), which fetches and analyzes the page, so any sensitive content on the scraped pages is disclosed to that service.

✓ Install Mechanism

No install spec and no code files: this is instruction-only, so the skill itself writes nothing to disk and fetches nothing at install time.

✓ Credentials

Only one environment variable, AIPROX_SPEND_TOKEN, is required; SKILL.md declares and documents it as the payment/auth token for the external API. That is proportionate, but the token is sensitive: it is used for billing and authentication, so anyone who holds it can make billed requests on your behalf.

✓ Persistence & Privilege

The skill's always setting is false, and it requests no persistent system privileges or config paths. It does not attempt to modify other skills or agent-wide settings.

Assessment

This skill is coherent but relies on an external service (aiprox.dev) to fetch and analyze webpage content. Before installing or using it:
1) Only send URLs whose pages do not contain sensitive data (logins, internal docs, PII).
2) Review aiprox.dev's privacy, storage, and billing policies. SKILL.md claims transient processing, but you must trust the vendor.
3) Treat AIPROX_SPEND_TOKEN as a secret: rotate it if leaked and limit its scope if possible.
4) If you need to scrape sensitive or internal sites, prefer a self-hosted scraper or run tools locally instead of routing data to a third party.
5) Test with non-sensitive pages first and monitor billing usage.
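Points 1 and 4 can be partly automated with a local pre-flight check run before a URL is handed to the skill. The sketch below is a hypothetical helper (looks_safe_to_send is not part of the skill or of AIProx) built only on the Python standard library; its heuristics (embedded credentials, credential-like query parameters, loopback or private addresses) are assumptions and will not catch every sensitive page.

```python
from ipaddress import ip_address
from urllib.parse import parse_qs, urlsplit

# Hypothetical heuristic: query parameter names that often carry secrets.
SUSPECT_PARAMS = {"token", "key", "api_key", "apikey", "secret",
                  "password", "session", "sig", "signature"}


def looks_safe_to_send(url: str) -> bool:
    """Return False for URLs that obviously should not go to a third-party scraper."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    # user:password@host embeds credentials in the URL itself.
    if parts.username or parts.password:
        return False
    host = parts.hostname or ""
    if not host:
        return False
    # Internal-looking hostnames (assumed naming conventions).
    if host == "localhost" or host.endswith((".local", ".internal")):
        return False
    try:
        addr = ip_address(host)
    except ValueError:
        addr = None  # a DNS name, not a literal IP address
    if addr is not None and (addr.is_private or addr.is_loopback or addr.is_link_local):
        return False
    # Reject credential-like query parameters (case-insensitive name match).
    names = {name.lower() for name in parse_qs(parts.query, keep_blank_values=True)}
    return not (names & SUSPECT_PARAMS)
```

A check like this only filters the URL; it cannot tell whether a public page itself contains personal data, so the manual review in point 1 still applies.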


Current version: v1.1.0


latest: vk971t2tcpd66m1vv8rxvjp931d82x1p8

License

MIT-0


Terms: https://spdx.org/licenses/MIT-0.html

Runtime requirements

🕷️ Clawdis

Env: AIPROX_SPEND_TOKEN

SKILL.md

Data Spider

Scrape and extract structured data from any webpage. Supports schema-guided extraction to match a specific data shape, or automatic detection of structure. Returns data as a JSON object, a table (columns + rows), or a flat list, depending on the chosen format.

When to Use

Extracting product information or pricing from pages
Gathering statistics and figures from articles
Building datasets from web sources
Schema-guided extraction to match your data model
Research and competitive analysis

Usage Flow

Provide a webpage URL
Optionally provide a schema object; data will be extracted to match that exact shape
Optionally set format: json (default), table, or list
Optionally describe what to extract in a task string, as in the table request below
AIProx routes the request to the data-spider agent
The response contains structured data in the requested format, plus a summary and the source URL
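The flow above maps onto a small JSON request body. A minimal Python sketch of assembling it: the field names url, schema, format, and task are taken from the curl examples in this document, while build_payload and its validation rules are hypothetical and not part of the AIProx API.

```python
import json
from typing import Any, Optional

ALLOWED_FORMATS = ("json", "table", "list")  # formats listed in the Usage Flow


def build_payload(url: str, schema: Optional[dict] = None, fmt: str = "json",
                  task: Optional[str] = None) -> dict[str, Any]:
    """Assemble a request body for the orchestrate endpoint (hypothetical helper)."""
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"expected an http(s) URL, got {url!r}")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}, got {fmt!r}")
    payload: dict[str, Any] = {"url": url, "format": fmt}
    if schema is not None:
        # In this document's example, schema values are null placeholders for the fields to fill.
        payload["schema"] = schema
    if task is not None:
        payload["task"] = task
    return payload


# Equivalent of the "JSON with Schema" request body.
body = json.dumps(build_payload(
    "https://example.com/pricing",
    schema={"free_tier": None, "pro_price": None, "enterprise": None},
))
```

Python's None serializes to JSON null, so body matches the -d argument of the curl example field for field.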

Security Manifest

Permission	Scope	Reason
Network	aiprox.dev	API calls to orchestration endpoint
Env Read	AIPROX_SPEND_TOKEN	Authentication for paid API
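The manifest lists AIPROX_SPEND_TOKEN as the only environment variable read. A minimal sketch of reading it defensively, so that a missing token fails fast and the secret never appears in full in logs; load_spend_token and mask are hypothetical helper names, not part of the skill.

```python
import os


def load_spend_token(var: str = "AIPROX_SPEND_TOKEN") -> str:
    """Read the spend token from the environment, failing fast if it is absent."""
    token = os.environ.get(var, "").strip()
    if not token:
        raise RuntimeError(f"{var} is not set; export it before calling the API")
    return token


def mask(token: str, visible: int = 4) -> str:
    """Return a log-safe form of the token that shows only its last few characters."""
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]
```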

Make Request — JSON with Schema

curl -X POST https://aiprox.dev/api/orchestrate \
  -H "Content-Type: application/json" \
  -H "X-Spend-Token: $AIPROX_SPEND_TOKEN" \
  -d '{
    "url": "https://example.com/pricing",
    "schema": {"free_tier": null, "pro_price": null, "enterprise": null},
    "format": "json"
  }'
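For callers who prefer not to shell out to curl, the same request can be sketched with Python's standard library. It mirrors the endpoint, headers, and body of the curl command above; the helper names are hypothetical, the 60-second timeout is an arbitrary assumption, and the decoding step assumes the service replies with a JSON body like the Response examples in this document.

```python
import json
import urllib.request

ENDPOINT = "https://aiprox.dev/api/orchestrate"  # from the curl example


def make_request(payload: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request matching the curl example."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Spend-Token": token},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body.

    urlopen raises urllib.error.HTTPError for non-2xx status codes.
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage, assuming the token is exported: send(make_request(payload, os.environ["AIPROX_SPEND_TOKEN"])), where payload is a dict shaped like the curl body. How the service reports errors in the response body is not documented here.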

Response — JSON

{
  "data": {"free_tier": "$0/month, 1000 API calls", "pro_price": "$29/month", "enterprise": "custom pricing"},
  "summary": "SaaS pricing page with three tiers.",
  "source": "https://example.com/pricing",
  "format": "json"
}
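Because schema-guided extraction is meant to return data in the exact shape requested, a caller may want to confirm that before using the result. A hypothetical check (schema_mismatch is an illustrative name), assuming as in the example a flat schema whose keys should all appear in the response's data object:

```python
def schema_mismatch(schema: dict, response: dict) -> tuple[set, set]:
    """Return (missing, unexpected) keys of response["data"] relative to a flat schema."""
    data = response.get("data") or {}
    requested, returned = set(schema), set(data)
    return requested - returned, returned - requested


# The example response from this section.
example = {
    "data": {"free_tier": "$0/month, 1000 API calls", "pro_price": "$29/month",
             "enterprise": "custom pricing"},
    "summary": "SaaS pricing page with three tiers.",
    "source": "https://example.com/pricing",
    "format": "json",
}
missing, unexpected = schema_mismatch(
    {"free_tier": None, "pro_price": None, "enterprise": None}, example)
```

For the example response both sets are empty, meaning every requested field came back and nothing extra was added.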

Make Request — Table

curl -X POST https://aiprox.dev/api/orchestrate \
  -H "Content-Type: application/json" \
  -H "X-Spend-Token: $AIPROX_SPEND_TOKEN" \
  -d '{
    "task": "extract pricing tiers as a table",
    "url": "https://example.com/pricing",
    "format": "table"
  }'

Response — Table

{
  "columns": ["Plan", "Price", "API Calls"],
  "rows": [
    ["Free", "$0/month", "1,000"],
    ["Pro", "$29/month", "50,000"],
    ["Enterprise", "Custom", "Unlimited"]
  ],
  "summary": "Three-tier SaaS pricing.",
  "source": "https://example.com/pricing",
  "format": "table"
}
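A table response keeps column names separate from row values. A minimal sketch for turning it into one dictionary per row, assuming (as in the example) that every row has exactly one value per column; table_to_records is a hypothetical name.

```python
def table_to_records(response: dict) -> list[dict]:
    """Zip each row with the column names; raise if a row has the wrong width."""
    columns = response["columns"]
    records = []
    for i, row in enumerate(response["rows"]):
        if len(row) != len(columns):
            raise ValueError(f"row {i} has {len(row)} values, expected {len(columns)}")
        records.append(dict(zip(columns, row)))
    return records


# The columns and rows from the example table response.
table = {
    "columns": ["Plan", "Price", "API Calls"],
    "rows": [["Free", "$0/month", "1,000"],
             ["Pro", "$29/month", "50,000"],
             ["Enterprise", "Custom", "Unlimited"]],
}
records = table_to_records(table)
```

In the example, values such as "1,000" and "$29/month" are display strings, so convert them explicitly if you need numbers.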

Response — List

{
  "items": ["$0/month — Free tier, 1000 API calls", "$29/month — Pro, 50,000 calls", "Enterprise — custom pricing"],
  "summary": "SaaS pricing tiers extracted as flat list.",
  "source": "https://example.com/pricing",
  "format": "list"
}
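The three response shapes share the summary, source, and format fields but carry the payload under different keys: data for json, columns and rows for table, and items for list. A hypothetical dispatcher that selects the payload from the format field; the key names come from the example responses above, and treating a missing format as json is an assumption.

```python
from typing import Any

# Payload keys per format, as shown in the three example responses.
PAYLOAD_KEYS = {"json": ("data",), "table": ("columns", "rows"), "list": ("items",)}


def extract_payload(response: dict) -> Any:
    """Return the format-specific payload of a Data Spider response."""
    # Assume json if absent, since json is the documented default request format.
    fmt = response.get("format", "json")
    try:
        keys = PAYLOAD_KEYS[fmt]
    except KeyError:
        raise ValueError(f"unknown format {fmt!r}") from None
    missing = [k for k in keys if k not in response]
    if missing:
        raise ValueError(f"{fmt} response is missing {missing}")
    if fmt == "table":
        return {"columns": response["columns"], "rows": response["rows"]}
    return response[keys[0]]
```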

Trust Statement

Data Spider fetches a webpage by URL and analyzes its contents. Content is processed transiently and not stored. Analysis is performed by Claude via LightningProx. The service respects robots.txt and rate limits. Your spend token is used for payment only.
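Robots.txt compliance is stated here as a vendor claim. If you want an independent check before sending a URL, Python's standard urllib.robotparser can evaluate a site's robots.txt locally. This sketch parses robots.txt text you have already fetched (fetching is left out to keep the example offline); allowed_by_robots is a hypothetical name, and the default user agent "*" is an assumption, since this document does not say which user agent the service sends.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Evaluate robots.txt rules for url without any network access."""
    parser = RobotFileParser()
    # parse() takes the file as a sequence of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With the rules "User-agent: *" and "Disallow: /private", this returns True for https://example.com/pricing and False for https://example.com/private/report.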

Files: 1 total
