Data Scraper

Advisory

Audited by Static analysis on Apr 30, 2026.

Overview

No suspicious patterns detected.

Findings (0)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

The skill can send requests from the user's machine or workspace to arbitrary web addresses provided to it.

Why it was flagged

The script fetches whatever URL is supplied. This is central to a web-scraping skill, but users should ensure the URL is intended, authorized, and safe to request from their environment.

Skill content

if ! curl -s -L -A "Mozilla/5.0" "$URL" > "$TMP_FILE"; then

Recommendation

Use it only for public or authorized pages, avoid scraping internal/private endpoints, and set explicit rate limits for repeated scraping.
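
One way to act on this recommendation is to screen the URL before it reaches curl. The sketch below is an illustration only: `is_public_url` is a hypothetical helper, not part of the reviewed run.sh. It checks the literal host against localhost names and private IPv4 ranges, and it does not resolve DNS, so a public name that points at an internal address would still pass.

```shell
# Hypothetical helper (not part of run.sh): succeed only for http/https URLs
# whose literal host is not obviously local or private.
is_public_url() {
  local url="$1" host
  # Accept plain http and https only.
  case "$url" in
    http://*|https://*) ;;
    *) return 1 ;;
  esac
  host="${url#*://}"       # drop the scheme
  host="${host%%[/?#]*}"   # drop path, query, and fragment
  host="${host##*@}"       # drop userinfo, if present
  # IPv6 literals are rejected outright to keep the sketch small.
  case "$host" in
    \[*) return 1 ;;
  esac
  host="${host%%:*}"       # drop the port
  host=$(printf '%s' "$host" | tr '[:upper:]' '[:lower:]')
  # Conservative match: a public name such as 10.example.com is also refused.
  case "$host" in
    ""|localhost|*.localhost|*.local|*.internal) return 1 ;;
    127.*|10.*|192.168.*|169.254.*|0.0.0.0) return 1 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 1 ;;
  esac
  return 0
}

# Usage sketch:
#   is_public_url "$URL" || { echo "refusing to fetch $URL" >&2; exit 1; }
```

A check like this would sit in front of the existing curl call. It is a first filter, not a substitute for confirming that the target is authorized.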

What this means

If a user supplies a token or cookie, the scraper may access pages as that user on the target site.

Why it was flagged

The documentation shows optional use of bearer tokens and session cookies for authenticated scraping. This is purpose-aligned, but those values can grant account access.

Skill content

data-scraper fetch URL --header "Authorization: Bearer TOKEN"

data-scraper fetch URL --cookie "session=abc123"

Recommendation

Only provide scoped, temporary credentials for sites you are allowed to scrape, and avoid using account cookies for sensitive services unless necessary.
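
If a token is needed, one way to limit its exposure is to keep it out of the command line, where other local processes may see it. The sketch below assumes plain curl rather than the documented `data-scraper` CLI. `write_auth_header_file` and the `SCRAPER_TOKEN` variable are hypothetical names introduced for illustration. It relies on curl's `-H @file` form, available in curl 7.55.0 and later.

```shell
# Hypothetical helper: write the Authorization header to a private temp file
# and print the file's path. The token is passed as the first argument.
write_auth_header_file() {
  local token="$1" file old_umask
  [ -n "$token" ] || return 1
  old_umask=$(umask)
  umask 077                          # new files readable by this user only
  file=$(mktemp) || { umask "$old_umask"; return 1; }
  umask "$old_umask"
  printf 'Authorization: Bearer %s\n' "$token" > "$file"
  printf '%s\n' "$file"
}

# Usage sketch (SCRAPER_TOKEN is an assumed environment variable):
#   hdr=$(write_auth_header_file "$SCRAPER_TOKEN") || exit 1
#   trap 'rm -f "$hdr"' EXIT        # remove the header file when done
#   curl -s -L -H @"$hdr" "$URL"
```

The header file is deleted on exit, so the credential lives on disk only for the duration of the run.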

What this means

Scraped data may remain in local memory files and later influence reports or analysis.

Why it was flagged

The guide describes saving scraped outputs into workspace memory. This is expected for monitoring and reporting, but scraped content can persist and be reused later.

Skill content

memory/scraped/
  ├── kmong-ai-chatbot-2026-02-14.json
  ├── toss-tech-posts-2026-02-14.json
  └── product-prices-2026-02-14.json

Recommendation

Do not scrape sensitive pages unless you intend to store the results, and periodically review or delete stored scrape data.
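
The periodic cleanup could be as simple as the sketch below. `prune_scrapes` is a hypothetical helper, and the 30-day default is an assumption. Only the `memory/scraped` path and the `.json` extension come from the skill's guide.

```shell
# Hypothetical helper: delete stored scrape files older than a given age.
# Usage: prune_scrapes [DIR] [DAYS]
prune_scrapes() {
  local dir="${1:-memory/scraped}" days="${2:-30}"
  # Nothing to do if the directory does not exist.
  [ -d "$dir" ] || return 0
  # -mtime +N selects files whose age, counted in whole days, exceeds N.
  # Each matching path is printed before it is removed.
  find "$dir" -type f -name '*.json' -mtime +"$days" -print -exec rm -f {} +
}

# Usage sketch:
#   prune_scrapes memory/scraped 30
```

Dropping `-exec rm -f {} +` from the `find` call turns this into a dry run that only lists what would be removed, which is a reasonable first step for a manual review.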

What this means

Scrape activity metadata may be picked up by reporting workflows.

Why it was flagged

The generated event is explicitly marked for another workflow or consumer named daily-report. The event contains the scraped URL and format, not the page content, but it is still a cross-workflow data flow.

Skill content

"consumers": ["daily-report"]

Recommendation

Check whether daily-report or similar consumers are enabled before scraping URLs that reveal private interests or internal resources.

What this means

Some documented commands or safety features may not actually be available from the included files.

Why it was flagged

The documentation refers to a data-scraper CLI with commands such as fetch, extract, batch, and watch, but the supplied artifacts include no install spec and only a minimal run.sh script. This is a packaging/capability mismatch rather than evidence of malicious behavior.

Skill content

data-scraper fetch "https://example.com/article"

Recommendation

Verify the installed command and prefer the reviewed run.sh behavior unless additional trusted implementation files are provided.
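
A quick way to verify the installed command is to ask the shell whether `data-scraper` resolves at all. The sketch below uses the standard `command -v` builtin. `describe_scraper` is a hypothetical name, and the messages are illustrative.

```shell
# Hypothetical check: report whether a data-scraper command is on PATH.
# Returns 0 when it is found and 1 when it is not.
describe_scraper() {
  local path
  if path=$(command -v data-scraper 2>/dev/null); then
    printf 'data-scraper found at %s; review it before use\n' "$path"
  else
    printf 'data-scraper not found; fall back to the reviewed run.sh\n'
    return 1
  fi
}

# Usage sketch:
#   describe_scraper || echo "documented CLI commands are unavailable" >&2
```

Finding the command only shows that something with that name is installed. It says nothing about whether that implementation matches what was reviewed.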

What this means

A user may overestimate the scraper's built-in politeness controls.

Why it was flagged

The documentation claims rate-limiting and robots.txt behavior, but the included run.sh performs a simple single curl request and does not implement those controls. Users should not assume those safeguards exist unless another trusted implementation is present.

Skill content

- Default: 1 request per second per domain
- Respects `robots.txt` when `--polite` flag is set

Recommendation

Manually enforce robots.txt checks, delays, and retry limits when doing repeated or batch scraping.
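
A minimal sketch of manual delays and retry limits follows, assuming plain curl rather than the documented CLI. `fetch_one` and `polite_batch` are hypothetical names, and the retry and timeout values are illustrative. The delay is global rather than per domain, and the sketch does not read robots.txt, so that check still has to be done separately.

```shell
# Hypothetical wrapper around one request. --retry caps automatic retries
# on transient errors; --max-time bounds each attempt in seconds.
fetch_one() {
  curl -s -L --retry 2 --max-time 30 -A "Mozilla/5.0" "$1"
}

# Fetch each URL in turn, sleeping between requests.
# Usage: polite_batch DELAY_SECONDS URL...
polite_batch() {
  local delay="$1" first=1 failed=0 url
  shift
  for url in "$@"; do
    # Pause before every request except the first.
    [ "$first" -eq 1 ] || sleep "$delay"
    first=0
    fetch_one "$url" || { echo "failed: $url" >&2; failed=$((failed + 1)); }
  done
  # Succeed only if every fetch succeeded.
  [ "$failed" -eq 0 ]
}

# Usage sketch:
#   polite_batch 1 "https://example.com/a" "https://example.com/b"
```

A delay of 1 matches the one-request-per-second default that the documentation describes but run.sh does not implement.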