Data Scraper
Advisory. Audited by static analysis on Apr 30, 2026.
Overview
No suspicious patterns detected.
Findings (6)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
The skill can send requests from the user's machine or workspace to arbitrary web addresses provided to it.
The script fetches whatever URL is supplied. This is central to a web-scraping skill, but users should ensure the URL is intended, authorized, and safe to request from their environment.
if ! curl -s -L -A "Mozilla/5.0" "$URL" > "$TMP_FILE"; then
Use it only for public or authorized pages, avoid scraping internal/private endpoints, and set explicit rate limits for repeated scraping.
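One way to enforce this in practice is a small allowlist wrapper around the reviewed script. The sketch below is illustrative and not part of the skill's artifacts; the allowed domains and the ./run.sh entry point are assumptions.

#!/usr/bin/env bash
# Hypothetical allowlist wrapper; not part of the reviewed artifacts.
set -euo pipefail

URL="$1"
ALLOWED_DOMAINS=("example.com" "blog.example.org")   # assumed authorized hosts

# Extract the hostname from the URL (scheme, port, and path stripped).
host=$(printf '%s' "$URL" | sed -E 's|^[a-zA-Z]+://([^/:]+).*|\1|')

for d in "${ALLOWED_DOMAINS[@]}"; do
  if [[ "$host" == "$d" || "$host" == *".$d" ]]; then
    exec ./run.sh "$URL"   # assumed single-fetch entry point
  fi
done

echo "Refusing to fetch unlisted host: $host" >&2
exit 1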
If a user supplies a token or cookie, the scraper may access pages as that user on the target site.
The documentation shows optional use of bearer tokens and session cookies for authenticated scraping. This is purpose-aligned, but those values can grant account access.
data-scraper fetch URL --header "Authorization: Bearer TOKEN"
data-scraper fetch URL --cookie "session=abc123"
Only provide scoped, temporary credentials for sites you are allowed to scrape, and avoid using account cookies for sensitive services unless necessary.
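If a credential is unavoidable, keep it out of shell history and scope it to the target site. A minimal sketch, assuming the documented fetch command is actually installed (see the packaging finding below) and using a hypothetical SCRAPE_TOKEN variable:

# Hypothetical usage; SCRAPE_TOKEN is an assumed variable set out-of-band.
read -rs SCRAPE_TOKEN            # paste the token without echoing or logging it
data-scraper fetch "https://example.com/account" \
  --header "Authorization: Bearer $SCRAPE_TOKEN"
unset SCRAPE_TOKEN               # drop the token as soon as the scrape completes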
Scraped data may remain in local memory files and later influence reports or analysis.
The guide describes saving scraped outputs into workspace memory. This is expected for monitoring and reporting, but scraped content can persist and be reused later.
memory/scraped/
├── kmong-ai-chatbot-2026-02-14.json
├── toss-tech-posts-2026-02-14.json
└── product-prices-2026-02-14.json
Do not scrape sensitive pages unless you intend to store the results, and periodically review or delete stored scrape data.
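A periodic retention pass keeps the memory directory from accumulating stale scrapes. A sketch: the memory/scraped/ path comes from the guide above, while the 30-day window is an assumed policy.

# Review what is stored, newest first.
ls -lt memory/scraped/

# Delete scrape snapshots older than 30 days (the window is an assumed policy).
find memory/scraped/ -name '*.json' -mtime +30 -print -delete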
Scrape activity metadata may be picked up by reporting workflows.
The generated event is explicitly marked for another workflow or consumer named daily-report. The event contains the scraped URL and format, not the page content, but it is still a cross-workflow data flow.
"consumers": ["daily-report"]
Check whether daily-report or similar consumers are enabled before scraping URLs that reveal private interests or internal resources.
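The artifacts reviewed here do not say where the event definitions live, so the path below is an assumption; the point is simply to confirm which consumers are wired up before scraping anything sensitive.

# Hypothetical check: list declared consumers in the skill's event spec.
# events.json is an assumed location; adjust to wherever the skill defines events.
grep -n '"consumers"' events.json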
Some documented commands or safety features may not actually be available from the included files.
The documentation refers to a data-scraper CLI with commands such as fetch, extract, batch, and watch, but the supplied artifacts include no install spec and only a minimal run.sh script. This is a packaging/capability mismatch rather than evidence of malicious behavior.
data-scraper fetch "https://example.com/article"
Verify the installed command and prefer the reviewed run.sh behavior unless additional trusted implementation files are provided.
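Before relying on the documented subcommands, confirm what is actually on the PATH and read the script that was reviewed. This uses only the artifact names already cited in this finding.

# Is the documented CLI actually installed?
command -v data-scraper || echo "data-scraper CLI not found; only run.sh was supplied"

# Inspect the reviewed script before trusting any documented behavior.
cat run.sh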
A user may overestimate the scraper's built-in politeness controls.
The documentation claims rate-limiting and robots.txt behavior, but the included run.sh performs a simple single curl request and does not implement those controls. Users should not assume those safeguards exist unless another trusted implementation is present.
- Default: 1 request per second per domain
- Respects `robots.txt` when `--polite` flag is set
Manually enforce robots.txt checks, delays, and retry limits when doing repeated or batch scraping.
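Since run.sh implements none of these controls, enforce them in the calling loop. Below is a sketch of a batch run with a fixed delay and a crude robots.txt probe; the urls.txt input file and the one-second delay are assumptions (the delay mirrors the documented default), and a real implementation should parse robots.txt rules per path and per user-agent rather than only rejecting a blanket Disallow.

#!/usr/bin/env bash
# Hypothetical batch wrapper adding the politeness controls run.sh lacks.
set -euo pipefail

while IFS= read -r url; do
  host=$(printf '%s' "$url" | sed -E 's|^[a-zA-Z]+://([^/:]+).*|\1|')

  # Crude robots.txt probe: skip hosts that disallow everything.
  if curl -s "https://$host/robots.txt" | tr -d '\r' | grep -qi '^Disallow: */ *$'; then
    echo "Skipping $url (robots.txt disallows /)" >&2
    continue
  fi

  ./run.sh "$url"        # assumed single-fetch entry point
  sleep 1                # fixed delay matching the documented 1 req/s default
done < urls.txt          # assumed input: one URL per line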
