Web Scraping

Extract structured information from websites using web_fetch for simple pages and browser automation for dynamic sites, login-gated flows, pagination, and infinite scroll.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The name and description match the SKILL.md. The instructions reference only platform scraping helpers (web_fetch, browser, web_search) and do not require unrelated binaries, environment variables, or config paths.
Instruction Scope
The SKILL.md stays focused on scraping tasks (fetch vs browser, pagination, snapshots, deduplication, saving to workspace). It explicitly mentions login-gated flows and interaction, but it provides no guidance on obtaining or handling user credentials, or on legal/robots policy considerations; this is a functional gap rather than an incoherence.
Install Mechanism
No install spec and no code files: the lowest-risk pattern in this catalog. An installer would write nothing to disk.
Credentials
The skill declares no required environment variables or credentials, which is proportionate. However, because it supports login-protected sites, it may prompt for user-provided credentials at runtime; the SKILL.md does not specify how such credentials should be handled or stored securely.
Persistence & Privilege
The always flag is false, and the skill neither requests persistent system privileges nor modifies other skills. It may save scraped output to the workspace as described, which is reasonable for its purpose.
Assessment
This is an instruction-only web-scraping helper that appears coherent and low-risk from a package/install perspective. Before using it:

  1. Be prepared to supply login credentials interactively for gated sites; do not store secrets in plain workspace files unless you control their security.
  2. Confirm that scraping the target site is permitted under its terms of service and robots.txt, and avoid heavy request loops (the SKILL.md already recommends throttling).
  3. Review how scraped results are saved in your workspace and who or what can access those files.
  4. If you prefer, disable autonomous invocation for this skill, or require explicit user confirmation before it performs logins or large-scale scraping jobs.

If you need stronger guarantees about credential handling or legal compliance, ask the developer to add explicit instructions to the SKILL.md covering secure credential prompts and policy checks.


Current version: v1.0.0


SKILL.md

Web Scraping

Extract data with the lightest reliable method first.

Choose the approach

  1. Use web_fetch for simple public pages when the needed content is already in HTML.
  2. Use browser when the site is dynamic or requires interaction: clicking, infinite scroll, filters, tabs, or login/session state.
  3. Use web_search only to discover candidate pages when the target URL is unknown.
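
The fetch-vs-browser decision above can often be made with a quick check: fetch the static HTML once and see whether the fields you need are already present. A minimal sketch (the marker-matching heuristic is an illustrative assumption, not part of the skill):

```python
# Heuristic sketch: if required field markers are missing from the raw
# static HTML, the page likely renders them client-side and needs
# browser automation instead of a plain fetch.
def needs_browser(raw_html: str, required_markers: list) -> bool:
    """Return True when any required marker is absent from the static HTML."""
    lowered = raw_html.lower()
    return any(marker.lower() not in lowered for marker in required_markers)
```

For example, a page whose body is just an empty app container will be missing every content marker, which suggests JavaScript rendering and therefore the browser path.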

Default workflow

  1. Identify the target site and exact fields to collect.
  2. Test one page first.
  3. Decide the extraction method:
    • web_fetch for readable article/listing text
    • browser snapshot for dynamic DOM inspection
  4. Normalize the output into a stable schema.
  5. If scraping multiple pages, avoid tight loops and serialize requests.
  6. Deduplicate by URL or stable item id.
  7. Save results in the workspace when the task is larger than a quick one-off.
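
Steps 4 and 6 above (normalize, then deduplicate) can be sketched as follows. The field names mirror the explicit-key schema shown later in this SKILL.md; the raw-record shape is an assumption for illustration:

```python
# Normalize raw scraped records onto a stable schema, then deduplicate
# by URL (used here as the stable item id).
def normalize(item: dict) -> dict:
    """Map a raw record onto explicit, stable keys."""
    return {
        "title": (item.get("title") or "").strip(),
        "url": (item.get("url") or "").strip(),
        "source": item.get("source", ""),
        "date": item.get("date", ""),
        "summary": item.get("summary", ""),
    }

def deduplicate(items: list) -> list:
    """Keep only the first occurrence of each URL."""
    seen, unique = set(), []
    for item in items:
        key = item["url"]
        if key and key not in seen:
            seen.add(key)
            unique.append(item)
    return unique
```

Normalizing before deduplicating matters: it ensures the URL key is stripped and consistent before it is compared.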

Browser scraping pattern

  1. Open the page.
  2. Take a snapshot.
  3. Interact only as needed: search, click filters, pagination, expand sections.
  4. Re-snapshot after each meaningful state change.
  5. Extract only the fields the user asked for.
  6. Close tabs when finished.
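
Step 5 of the default workflow asks for serialized, non-tight request loops, and the same pacing applies between pagination clicks here. A minimal rate-limiter sketch (the one-second default interval is an assumption):

```python
import time

# Minimal pacing helper: enforce a gap between consecutive fetches or
# page interactions so scraping stays serialized and polite.
class Throttle:
    def __init__(self, min_interval_s: float = 1.0):
        self.min_interval_s = min_interval_s
        self._last = 0.0

    def wait(self) -> None:
        """Sleep just long enough to keep the configured gap."""
        gap = time.monotonic() - self._last
        if gap < self.min_interval_s:
            time.sleep(self.min_interval_s - gap)
        self._last = time.monotonic()
```

Call wait() once before each request or pagination click; the first call returns immediately and later calls block only as long as needed.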

Output guidance

Prefer one of these formats:

  • concise bullet summary
  • JSON array of objects
  • CSV/TSV when the user wants exportable rows

Use explicit keys, for example:

[
  {
    "title": "...",
    "url": "...",
    "source": "...",
    "date": "...",
    "summary": "..."
  }
]
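
When the user wants exportable rows (the CSV/TSV option above), records in that schema convert with the standard library alone. A sketch assuming the explicit keys shown:

```python
import csv
import io

# Convert schema-normalized records into CSV text. Column order follows
# the explicit keys from the JSON example; any extra keys are ignored.
def to_csv(records: list) -> str:
    fields = ["title", "url", "source", "date", "summary"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

Fixing the field list up front keeps column order stable across pages, and extrasaction="ignore" prevents a stray key on one record from raising an error mid-export.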

Reliability rules

  • Do not invent missing fields.
  • If a site blocks access, say so and switch sources when appropriate.
  • For news/results pages, capture source + title + link at minimum.
  • For large jobs, checkpoint partial results to a workspace file.
  • Prefer fewer, larger writes over many tiny writes.

Cleanup

  • Close browser tabs opened for scraping.
  • If you create state/output files, store them under the workspace and name them clearly.
