Web Scraping
Extract structured information from websites using web_fetch for simple pages and browser automation for dynamic sites, login-gated flows, pagination, and infinite scroll.
MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw
Benign (high confidence)
Purpose & Capability
Name and description match the SKILL.md. The instructions only reference platform scraping helpers (web_fetch, browser, web_search) and do not require unrelated binaries, env vars, or config paths.
Instruction Scope
The SKILL.md stays focused on scraping tasks (fetch vs browser, pagination, snapshots, deduplication, saving to workspace). It does explicitly mention login-gated flows and interaction, but it does not provide guidance on how to obtain or handle user credentials or on legal/robots policy considerations — this is a functional gap rather than an incoherence.
Install Mechanism
No install spec and no code files: the lowest-risk pattern for this catalog. Nothing would be written to disk by an installer.
Credentials
The skill declares no required env vars or credentials, which is proportionate. However, because it supports login-protected sites, it may prompt for user-provided credentials at runtime; the SKILL.md does not specify secure handling or storage of such credentials.
Persistence & Privilege
always is false and the skill does not request persistent system privileges or modify other skills. The skill may save scraped output to the workspace as described, which is reasonable for its purpose.
Assessment
This is an instruction-only web-scraping helper that appears coherent and low-risk from a package/install perspective. Before using it:
- Be prepared to supply login credentials interactively for gated sites, and do not store secrets in plain workspace files unless you control their security.
- Confirm that scraping the target site is permitted under its terms of service and robots.txt, and avoid heavy request loops (the SKILL.md already recommends throttling).
- Review how scraped results are saved in your workspace and who or what can access those files.
- If you prefer, disable autonomous invocation for this skill or require explicit user confirmation before it performs logins or large-scale scraping jobs.
If you need stronger guarantees about credential handling or legal compliance, ask the developer to add explicit instructions to the SKILL.md about secure credential prompts and policy checks.
Current version: v1.0.0
SKILL.md
Web Scraping
Extract data with the lightest reliable method first.
Choose the approach
- Use `web_fetch` for simple public pages when the needed content is already in HTML.
- Use `browser` when the site is dynamic, needs clicking, infinite scroll, filters, tabs, or login/session state.
- Use `web_search` only to discover candidate pages when the target URL is unknown.
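The decision above can be sketched as a small heuristic. The function name and flags here are hypothetical illustrations, not part of the skill or the platform API:

```python
def choose_method(url_known: bool, needs_interaction: bool, content_in_html: bool) -> str:
    """Pick the lightest tool that can do the job (hypothetical helper)."""
    if not url_known:
        return "web_search"   # discover candidate pages first
    if needs_interaction or not content_in_html:
        return "browser"      # dynamic DOM, clicks, login/session state
    return "web_fetch"        # static HTML is enough

print(choose_method(url_known=True, needs_interaction=False, content_in_html=True))
```

The ordering matters: discovery comes first, and `web_fetch` is only the fallback when nothing forces a heavier tool.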
Default workflow
- Identify the target site and exact fields to collect.
- Test one page first.
- Decide the extraction method:
  - `web_fetch` for readable article/listing text
  - `browser` snapshot for dynamic DOM inspection
- Normalize the output into a stable schema.
- If scraping multiple pages, avoid tight loops and serialize requests.
- Deduplicate by URL or stable item id.
- Save results in the workspace when the task is larger than a quick one-off.
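The normalize, serialize, and deduplicate steps above can be sketched in plain Python. The record schema follows the output example later in this file; the `fetch` callable is a placeholder for whichever fetcher you actually use:

```python
import time

SCHEMA = ("title", "url", "source", "date", "summary")

def normalize(raw: dict) -> dict:
    """Map a raw scraped record onto the stable output schema."""
    return {key: (raw.get(key) or "").strip() for key in SCHEMA}

def dedupe(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each URL (the stable item id)."""
    seen, unique = set(), []
    for rec in records:
        if rec["url"] and rec["url"] not in seen:
            seen.add(rec["url"])
            unique.append(rec)
    return unique

def fetch_serially(urls, fetch, delay_seconds=1.0):
    """Avoid tight loops: fetch pages one at a time with a polite delay."""
    results = []
    for url in urls:
        results.append(fetch(url))
        time.sleep(delay_seconds)
    return results
```

Normalizing before deduplicating keeps the URL comparison stable even when sources format fields inconsistently.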
Browser scraping pattern
- Open the page.
- Take a snapshot.
- Interact only as needed: search, click filters, pagination, expand sections.
- Re-snapshot after each meaningful state change.
- Extract only the fields the user asked for.
- Close tabs when finished.
Output guidance
Prefer one of these formats:
- concise bullet summary
- JSON array of objects
- CSV/TSV when the user wants exportable rows
Use explicit keys, for example:
```json
[
  {
    "title": "...",
    "url": "...",
    "source": "...",
    "date": "...",
    "summary": "..."
  }
]
```
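When the user wants exportable rows, the same keyed records can be rendered as CSV with the standard library. The field names follow the example schema above:

```python
import csv
import io

FIELDS = ["title", "url", "source", "date", "summary"]

def to_csv(records: list[dict]) -> str:
    """Render keyed records as CSV text with an explicit header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

`extrasaction="ignore"` drops any stray keys a scrape picked up, so the export always matches the declared schema.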
Reliability rules
- Do not invent missing fields.
- If a site blocks access, say so and switch sources when appropriate.
- For news/results pages, capture source + title + link at minimum.
- For large jobs, checkpoint partial results to a workspace file.
- Prefer fewer larger writes over many tiny writes.
Cleanup
- Close browser tabs opened for scraping.
- If you create state/output files, store them under the workspace and name them clearly.