XPR Web Scraping
Tools for fetching and extracting cleaned text, metadata, and links from single or multiple web pages with format options and link filtering.
MIT-0 · Free to use, modify, and redistribute. No attribution required.
⭐ 0 · 1.6k · 11 current installs · 11 all-time installs
by @paulgnz
Security Scan
OpenClaw
Benign (medium confidence)
Purpose & Capability
Name/description (fetching, extracting text/links/metadata) match the actual tools and code: scrape_url, extract_links, scrape_multiple. No unrelated env vars, binaries, or services are requested.
Instruction Scope
SKILL.md describes limited scraping actions (single page, link extraction, multi-page up to 10). Instructions recommend rate-limiting and content-size limits and do not instruct access to unrelated files, credentials, or external endpoints beyond the target pages.
Install Mechanism
No install spec; skill is instruction-plus-code and relies on built-in Node fetch. No downloads, package registry installs, or archive extraction are present in the provided metadata.
Credentials
Skill requires no environment variables, credentials, or config paths. The code uses only network fetch and in-memory parsing; requested access is proportional to web-scraping functionality.
Persistence & Privilege
The always flag is false and disable-model-invocation is false (normal defaults). The skill does not request persistent system-wide privileges or modify other skills. Autonomous invocation is allowed by platform default, but this is not combined with other red flags.
Assessment
This skill appears to be a coherent, self-contained web scraper that doesn't request secrets or install external code. Before installing: (1) review the full src/index.ts (the provided snippet was truncated) to confirm there are no hidden network callbacks or logging endpoints; (2) ensure use complies with target sites' robots.txt, terms of service, and legal/privacy rules; (3) enforce rate limits and avoid scraping protected or paywalled content; (4) if you run in a sensitive environment, sandbox the skill (or review it for unexpected outbound endpoints) before enabling autonomous invocation.
Current version: v0.2.11
Tags: extraction, latest, web-scraping, xpr
SKILL.md
Web Scraping
You have web scraping tools for fetching and extracting data from web pages:
Single page:
- scrape_url: fetch a URL and get cleaned text content plus metadata (title, description, link count)
  - Use format="text" (default) for most tasks; strips all HTML
  - Use format="markdown" to preserve headings, links, lists, and bold/italic
  - Use format="html" only when you need raw HTML
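To make the three format options concrete, here is a minimal sketch of the kind of transformation each one implies. These helpers are illustrative assumptions, not the skill's actual code, which presumably does more careful HTML parsing:

```typescript
// format="text": strip every tag and collapse whitespace (illustrative only).
function htmlToText(html: string): string {
  return html.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
}

// format="markdown": keep some structure (headings, bold) before stripping
// the remaining tags. A real converter handles many more elements.
function htmlToMarkdown(html: string): string {
  return html
    .replace(/<h1[^>]*>(.*?)<\/h1>/g, "# $1\n")
    .replace(/<strong[^>]*>(.*?)<\/strong>/g, "**$1**")
    .replace(/<[^>]+>/g, " ")
    .replace(/[ \t]+/g, " ")
    .trim();
}

const page = "<h1>Title</h1><p>Hello <strong>world</strong></p>";
console.log(htmlToText(page));     // flat text, no structure
console.log(htmlToMarkdown(page)); // heading and bold markers preserved
```

For most extraction tasks the flat text form is easier for a model to consume; the markdown form matters when document structure (headings, emphasis) carries meaning.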
Link discovery:
- extract_links: fetch a page and extract all links with text and type (internal/external)
  - Use the pattern parameter to filter by regex (e.g. "\\.pdf$" for PDF links)
  - Links are deduplicated and resolved to absolute URLs
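The dedup, absolute-URL resolution, and pattern filtering described above can be sketched with the standard WHATWG URL API. This is an assumed, simplified shape, not the skill's code; the real tool parses hrefs out of fetched HTML, while here they are passed in directly:

```typescript
// Resolve hrefs against a base URL, deduplicate, optionally filter by regex.
function extractLinks(hrefs: string[], baseUrl: string, pattern?: string): string[] {
  const re = pattern ? new RegExp(pattern) : undefined;
  const seen = new Set<string>();
  for (const href of hrefs) {
    try {
      const abs = new URL(href, baseUrl).toString(); // resolves relative hrefs
      if (!re || re.test(abs)) seen.add(abs);        // Set deduplicates
    } catch {
      // ignore malformed hrefs
    }
  }
  return [...seen];
}

// "/a.pdf" and the pattern "\\.pdf$" keep only the PDF link, deduplicated.
console.log(extractLinks(["/a.pdf", "b.html", "/a.pdf"], "https://example.com/", "\\.pdf$"));
```

Resolving through `new URL(href, base)` is what makes relative links ("/docs", "docs", "#top") comparable, so duplicates that only differ in how they were written collapse to one absolute URL.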
Multi-page research:
- scrape_multiple: fetch up to 10 URLs in parallel for comparison/research
  - One failure doesn't block the others (uses Promise.allSettled)
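The failure-isolation pattern scrape_multiple is described as using looks roughly like this. The fetcher is injected here so the sketch runs without network access; the real tool presumably calls Node's built-in fetch:

```typescript
type ScrapeResult = { url: string; ok: boolean; content?: string; error?: string };

// Fetch all URLs in parallel; Promise.allSettled keeps successful results
// even when some requests reject, so one failure doesn't block the rest.
async function scrapeMultiple(
  urls: string[],
  fetchOne: (url: string) => Promise<string>
): Promise<ScrapeResult[]> {
  const capped = urls.slice(0, 10); // documented cap of 10 URLs
  const settled = await Promise.allSettled(capped.map(fetchOne));
  return settled.map((r, i) =>
    r.status === "fulfilled"
      ? { url: capped[i], ok: true, content: r.value }
      : { url: capped[i], ok: false, error: String(r.reason) }
  );
}

// Demo with a fake fetcher: one URL succeeds, one fails.
const fake = (url: string) =>
  url.includes("bad") ? Promise.reject(new Error("timeout")) : Promise.resolve("<html>ok</html>");
scrapeMultiple(["https://a.test", "https://bad.test"], fake).then(r => console.log(r));
```

With Promise.all instead of Promise.allSettled, the single rejection would discard both results; allSettled is the right choice when partial results are still useful for research or comparison.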
Best practices:
- Prefer "text" format for content extraction, "markdown" for preserving structure
- Don't scrape the same domain more than 5 times per minute
- Combine with store_deliverable to save scraped content as job evidence
- For very large pages, content is limited to 5 MB
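The "5 requests per domain per minute" guideline above can be enforced with a small sliding-window limiter. The skill only recommends the limit and does not ship this helper, so treat it as a sketch for callers to adapt:

```typescript
// Per-domain sliding-window rate limiter: at most `max` hits per `windowMs`.
class DomainRateLimiter {
  private hits = new Map<string, number[]>();
  constructor(private max = 5, private windowMs = 60_000) {}

  // Returns true if a request to this URL's domain is currently allowed,
  // recording the hit; false means the caller should back off and retry later.
  allow(url: string, now: number = Date.now()): boolean {
    const domain = new URL(url).hostname;
    const recent = (this.hits.get(domain) ?? []).filter(t => now - t < this.windowMs);
    if (recent.length >= this.max) {
      this.hits.set(domain, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(domain, recent);
    return true;
  }
}

const limiter = new DomainRateLimiter();
console.log(limiter.allow("https://example.com/page")); // true on the first hit
```

Keying on hostname means different paths on the same site share one budget, while a different domain gets its own, which matches how "same domain" is phrased in the guideline.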
Files: 3 total
