Data Scraper
Advisory. Audited by static analysis on Apr 30, 2026.
Overview
No suspicious patterns detected.
Findings (6)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
The skill can send requests from the user's machine or workspace to arbitrary web addresses provided to it.
The script fetches whatever URL is supplied. This is central to a web-scraping skill, but users should ensure the URL is intended, authorized, and safe to request from their environment.
if ! curl -s -L -A "Mozilla/5.0" "$URL" > "$TMP_FILE"; then
Use it only for public or authorized pages, avoid scraping internal/private endpoints, and set explicit rate limits for repeated scraping.
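One way to enforce this in practice is a small allowlist wrapper around the reviewed script. The sketch below is illustrative and not part of the skill's artifacts; the allowed domains and the ./run.sh entry point are assumptions.

#!/usr/bin/env bash
# Hypothetical allowlist wrapper; not part of the reviewed artifacts.
set -euo pipefail

URL="$1"
ALLOWED_DOMAINS=("example.com" "blog.example.org")   # assumed authorized hosts

# Extract the hostname from the URL (scheme, port, and path stripped).
host=$(printf '%s' "$URL" | sed -E 's|^[a-zA-Z]+://([^/:]+).*|\1|')

for d in "${ALLOWED_DOMAINS[@]}"; do
  if [[ "$host" == "$d" || "$host" == *".$d" ]]; then
    exec ./run.sh "$URL"   # assumed single-fetch entry point
  fi
done

echo "Refusing to fetch unlisted host: $host" >&2
exit 1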
If a user supplies a token or cookie, the scraper may access pages as that user on the target site.
The documentation shows optional use of bearer tokens and session cookies for authenticated scraping. This is purpose-aligned, but those values can grant account access.
data-scraper fetch URL --header "Authorization: Bearer TOKEN"
data-scraper fetch URL --cookie "session=abc123"
Only provide scoped, temporary credentials for sites you are allowed to scrape, and avoid using account cookies for sensitive services unless necessary.
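If a credential is unavoidable, keep it out of shell history and scope it to the target site. A minimal sketch, assuming the documented fetch command is actually installed (see the packaging finding below) and using a hypothetical SCRAPE_TOKEN variable:

# Hypothetical usage; SCRAPE_TOKEN is an assumed variable set out-of-band.
read -rs SCRAPE_TOKEN            # paste the token without echoing or logging it
data-scraper fetch "https://example.com/account" \
  --header "Authorization: Bearer $SCRAPE_TOKEN"
unset SCRAPE_TOKEN               # drop the token as soon as the scrape completes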
Scraped data may remain in local memory files and later influence reports or analysis.
The guide describes saving scraped outputs into workspace memory. This is expected for monitoring and reporting, but scraped content can persist and be reused later.
memory/scraped/
├── kmong-ai-chatbot-2026-02-14.json
├── toss-tech-posts-2026-02-14.json
└── product-prices-2026-02-14.json
Do not scrape sensitive pages unless you intend to store the results, and periodically review or delete stored scrape data.
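A periodic retention pass keeps the memory directory from accumulating stale scrapes. A sketch: the memory/scraped/ path comes from the guide above, while the 30-day window is an assumed policy.

# Review what is stored, newest first.
ls -lt memory/scraped/

# Delete scrape snapshots older than 30 days (the window is an assumed policy).
find memory/scraped/ -name '*.json' -mtime +30 -print -delete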
Scrape activity metadata may be picked up by reporting workflows.
The generated event is explicitly marked for another workflow or consumer named daily-report. The event contains the scraped URL and format, not the page content, but it is still a cross-workflow data flow.
"consumers": ["daily-report"]
Check whether daily-report or similar consumers are enabled before scraping URLs that reveal private interests or internal resources.
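The artifacts reviewed here do not say where the event definitions live, so the path below is an assumption; the point is simply to confirm which consumers are wired up before scraping anything sensitive.

# Hypothetical check: list declared consumers in the skill's event spec.
# events.json is an assumed location; adjust to wherever the skill defines events.
grep -n '"consumers"' events.json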
Some documented commands or safety features may not actually be available from the included files.
The documentation refers to a data-scraper CLI with commands such as fetch, extract, batch, and watch, but the supplied artifacts include no install spec and only a minimal run.sh script. This is a packaging/capability mismatch rather than evidence of malicious behavior.
data-scraper fetch "https://example.com/article"
Verify the installed command and prefer the reviewed run.sh behavior unless additional trusted implementation files are provided.
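Before relying on the documented subcommands, confirm what is actually on the PATH and read the script that was reviewed. This uses only the artifact names already cited in this finding.

# Is the documented CLI actually installed?
command -v data-scraper || echo "data-scraper CLI not found; only run.sh was supplied"

# Inspect the reviewed script before trusting any documented behavior.
cat run.sh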
A user may overestimate the scraper's built-in politeness controls.
The documentation claims rate-limiting and robots.txt behavior, but the included run.sh performs a simple single curl request and does not implement those controls. Users should not assume those safeguards exist unless another trusted implementation is present.
- Default: 1 request per second per domain
- Respects `robots.txt` when `--polite` flag is set
Manually enforce robots.txt checks, delays, and retry limits when doing repeated or batch scraping.
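Since run.sh implements none of these controls, enforce them in the calling loop. Below is a sketch of a batch run with a fixed delay and a crude robots.txt probe; the urls.txt input file and the one-second delay are assumptions (the delay mirrors the documented default), and a real implementation should parse robots.txt rules per path and per user-agent rather than only rejecting a blanket Disallow.

#!/usr/bin/env bash
# Hypothetical batch wrapper adding the politeness controls run.sh lacks.
set -euo pipefail

while IFS= read -r url; do
  host=$(printf '%s' "$url" | sed -E 's|^[a-zA-Z]+://([^/:]+).*|\1|')

  # Crude robots.txt probe: skip hosts that disallow everything.
  if curl -s "https://$host/robots.txt" | tr -d '\r' | grep -qi '^Disallow: */ *$'; then
    echo "Skipping $url (robots.txt disallows /)" >&2
    continue
  fi

  ./run.sh "$url"        # assumed single-fetch entry point
  sleep 1                # fixed delay matching the documented 1 req/s default
done < urls.txt          # assumed input: one URL per line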
