Smart Web Scraper

PassAudited by ClawScan on May 1, 2026.

Overview

The skill appears to be a straightforward web scraper, with disclosed web-crawling and runtime dependency behavior that users should use responsibly.

This skill looks safe for normal, user-directed scraping of public webpages. Before installing, be aware that it may fetch third-party Python packages at runtime and can crawl websites automatically; use conservative page limits, respect robots.txt and site terms, and treat all scraped page content as untrusted data.

Findings (3)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Low

#ASI02: Tool Misuse and Exploitation

What this means

If used carelessly, it could send automated traffic to websites or collect data in ways that violate site rules.

Why it was flagged

The skill can make repeated automated web requests and includes a user-controlled robots.txt override. This is disclosed, bounded by a page count, and central to the scraper purpose.

Skill content

`crawl` | `<url> --pages N ...` ... `Default: 1 request per second` ... `Respects robots.txt by default (override with --ignore-robots)`

Recommendation

Use it only on sites you are allowed to scrape, keep the default robots.txt and delay behavior unless you understand the impact, and set conservative page limits.

Info

#ASI04: Agentic Supply Chain Vulnerabilities

What this means

The command may install current versions of third-party packages from the configured package index.

Why it was flagged

The documented run method resolves Python dependencies at runtime without pinned versions. This is a normal setup pattern for this purpose, but users should notice the dependency source.

Skill content

`uv run --with beautifulsoup4 --with lxml python scripts/scraper.py ...` and `Requires beautifulsoup4 and lxml (auto-installed by uv run --with)`

Recommendation

Run it in an isolated environment and pin or review dependency versions if you need reproducible or higher-assurance installs.

Low

#ASI06: Memory and Context Poisoning

What this means

A scraped page could contain text that tries to influence the agent if the agent treats extracted content as instructions instead of data.

Why it was flagged

The skill intentionally brings arbitrary webpage text into the agent's working context. Webpage content is untrusted data and could contain instructions intended to mislead an agent.

Skill content

`Extract structured data from any web page` and `Scrape a page, extract all text content`

Recommendation

Treat scraped output as untrusted data and confirm with the user before following any instructions found inside scraped pages.