Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Data Scraper

v1.0.0

Extract data from websites and APIs for analysis. Use when user needs to collect product prices from e-commerce sites, gather news articles, extract structur...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for dinghaibin/scraper-pro.

Prompt preview: Install & Setup
Install the skill "Data Scraper" (dinghaibin/scraper-pro) from ClawHub.
Skill page: https://clawhub.ai/dinghaibin/scraper-pro
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install scraper-pro

ClawHub CLI


npx clawhub@latest install scraper-pro
Security Scan

VirusTotal: Suspicious (View report →)
OpenClaw: Suspicious (high confidence)
Purpose & Capability
The name and description promise features (CSS/XPath selectors, pagination types including click, and authentication/login support) that the included script does not implement. The SKILL.md documents a --login option and complex YAML pagination examples, but scripts/scrape.py accepts no --login argument, implements no click-based pagination, and has no XPath or robust selector parsing. This mismatch may mislead users about the skill's actual capabilities; a quick way to check is sketched below.
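One way to verify the gap is to list the command-line flags the script actually registers. This is a minimal sketch using only the standard library, and it assumes the script builds its CLI with argparse:

# List the --flags that scripts/scrape.py actually defines.
import pathlib
import re

src = pathlib.Path("scripts/scrape.py").read_text()
flags = sorted(set(re.findall(r"add_argument\(\s*['\"](--[\w-]+)", src)))
print(flags)  # if "--login" is absent here, the docs overstate the CLI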
Instruction Scope
Runtime instructions tell the agent to run the bundled Python script with user-supplied URLs and output paths. The script performs network fetches and writes files; it does not read other system files or environment variables. However, the code disables TLS certificate validation (ssl.CERT_NONE and check_hostname=False) when fetching pages, which weakens transport security and can enable man-in-the-middle (MITM) attacks. The SKILL.md and its Best Practices section recommend checking robots.txt and respecting rate limits but never disclose the TLS bypass, and the documentation references a command-line option (--login) that is absent from the code. The bypass pattern is sketched below.
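For reference, the bypass the scan describes typically looks like the following; this is a sketch of the general pattern, not the skill's exact code:

import ssl

# Insecure: accepts any certificate from any host, enabling MITM attacks.
ctx = ssl.create_default_context()
ctx.check_hostname = False        # skip hostname verification
ctx.verify_mode = ssl.CERT_NONE   # skip certificate validation entirely

# Secure default for comparison: verification stays enabled.
safe_ctx = ssl.create_default_context()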
Install Mechanism
No install spec; instruction-only with a small included Python script. Nothing is downloaded or written by an installer, which limits the installation attack surface.
Credentials
The skill requests no environment variables, credentials, or config paths. That is proportionate for a generic scraper. There is no evidence the code attempts to access other secrets or unrelated system configuration.
Persistence & Privilege
The skill is not always-enabled and does not request persistent system-wide privileges or modify other skills. It only runs when invoked and writes output files specified by the user.
What to consider before installing
This skill contains an executable Python scraper, but the documentation overstates its capabilities (login, click-style pagination, XPath), none of which the script implements; treat those docs as inaccurate. Before using:

  1. Inspect or run the script in a safe sandbox.
  2. Do not pass credentials or sensitive file paths to the tool; it writes files to any path you specify.
  3. Fix or remove the TLS verification bypass in fetch_page (re-enable certificate checks) unless you understand and accept the risk; see the sketch below.
  4. Test scraping on non-sensitive, permitted sites and confirm legal and robots.txt compliance.
  5. If you need authentication, pagination, or XPath support, extend the script yourself or obtain a tool that explicitly implements and documents those features.
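If you choose to patch the script, a corrected fetch might look like this minimal sketch. It assumes the script fetches with urllib; only the function name fetch_page comes from the scan text, everything else is illustrative:

import ssl
import urllib.request

def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Fetch a page with TLS certificate and hostname checks enabled."""
    # ssl.create_default_context() verifies the certificate chain and hostname.
    ctx = ssl.create_default_context()
    with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)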

Like a lobster shell, security has layers — review code before you run it.

latest: vk97byshetkc1e3ssz321byxwdx85nsx0
32 downloads · 0 stars · 1 version
Updated 11h ago · v1.0.0 · MIT-0

Data Scraper

Extract structured data from websites and APIs.

Quick Start

# Basic page scrape
python scripts/scrape.py --url https://example.com --output data.json

Core Features

  • CSS/XPath selectors: Target specific elements
  • Multiple output formats: JSON, CSV, Markdown
  • Pagination support: Scrape multiple pages
  • Rate limiting: Respect server limits
  • Authentication: Handle login/sessions

Usage

python scripts/scrape.py [OPTIONS]

Options:
  --url TEXT          URL to scrape (required)
  --selector TEXT     CSS selector for data extraction
  --output PATH       Output file path
  --format FORMAT     Output format: json, csv, markdown
  --limit NUM         Maximum items to scrape
  --wait SECS         Wait between requests
  --login URL         Login URL for authenticated scraping

Examples

Product Price Collection

python scripts/scrape.py \
  --url "https://example.com/products" \
  --selector ".product" \
  --output prices.json \
  --format json

News Article Aggregation

python scripts/scrape.py \
  --url "https://news.example.com/latest" \
  --selector "article" \
  --output news.md \
  --format markdown

Configuration File

Create scrape.yaml for complex scraping (a sketch for loading it follows the example):

url: https://example.com/products
selectors:
  items: ".product-card"
  title: ".product-title"
  price: ".price::text"
  image: "img::attr(src)"
  link: "a::attr(href)"

pagination:
  type: click
  button: ".next-page"
  max_pages: 10

output:
  format: json
  file: products.json
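To read this file from your own code, here is a minimal loading sketch, assuming PyYAML is installed:

import yaml  # pip install pyyaml

with open("scrape.yaml") as f:
    config = yaml.safe_load(f)

print(config["selectors"]["items"])       # ".product-card"
print(config["pagination"]["max_pages"])  # 10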

Best Practices

  1. Check robots.txt before scraping (see the sketch after this list)
  2. Add delays between requests
  3. Cache responses for development
  4. Handle errors gracefully
  5. Store raw HTML for debugging
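
A minimal sketch of practices 1 and 2 using only the standard library; the URLs are placeholders:

import time
import urllib.robotparser

# Consult robots.txt once, then pace requests against the same host.
rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

for page in range(1, 4):
    url = f"https://example.com/products?page={page}"
    if not rp.can_fetch("*", url):
        print("Disallowed by robots.txt:", url)
        continue
    # ... fetch and parse url here ...
    time.sleep(2)  # delay between requests to respect server limits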

Legal Note

Ensure you have permission to scrape target websites. Check Terms of Service and robots.txt.
