Smart Web Scraper
PassAudited by ClawScan on May 1, 2026.
Overview
The skill appears to be a straightforward web scraper, with disclosed web-crawling and runtime dependency behavior that users should use responsibly.
This skill looks safe for normal, user-directed scraping of public webpages. Before installing, be aware that it may fetch third-party Python packages at runtime and can crawl websites automatically; use conservative page limits, respect robots.txt and site terms, and treat all scraped page content as untrusted data.
Findings (3)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
If used carelessly, it could send automated traffic to websites or collect data in ways that violate site rules.
The skill can make repeated automated web requests and includes a user-controlled robots.txt override. This is disclosed, bounded by a page count, and central to the scraper purpose.
`crawl` | `<url> --pages N ...` ... `Default: 1 request per second` ... `Respects robots.txt by default (override with --ignore-robots)`
Use it only on sites you are allowed to scrape, keep the default robots.txt and delay behavior unless you understand the impact, and set conservative page limits.
The command may install current versions of third-party packages from the configured package index.
The documented run method resolves Python dependencies at runtime without pinned versions. This is a normal setup pattern for this purpose, but users should notice the dependency source.
`uv run --with beautifulsoup4 --with lxml python scripts/scraper.py ...` and `Requires beautifulsoup4 and lxml (auto-installed by uv run --with)`
Run it in an isolated environment and pin or review dependency versions if you need reproducible or higher-assurance installs.
A scraped page could contain text that tries to influence the agent if the agent treats extracted content as instructions instead of data.
The skill intentionally brings arbitrary webpage text into the agent's working context. Webpage content is untrusted data and could contain instructions intended to mislead an agent.
`Extract structured data from any web page` and `Scrape a page, extract all text content`
Treat scraped output as untrusted data and confirm with the user before following any instructions found inside scraped pages.
