Scrape

Pass. Audited by ClawScan on May 1, 2026.

Overview

This is a documentation-only web-scraping guide that is mostly coherent and safety-oriented, with minor cautions around arbitrary URL fetching, fail-open robots.txt handling, and audit logging.

This skill appears safe to install as guidance-only material. Use it only for authorized public scraping, prefer official APIs, fail closed or ask the user when compliance checks cannot be completed, and avoid storing personal data or sensitive URLs in logs.

Findings (2)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Finding 1: Arbitrary URL fetching with fail-open robots.txt handling

What this means

The agent could help create scraping code that sends requests to arbitrary websites, and the resulting code may proceed even when robots.txt cannot be verified, unless the user tightens the logic.

Why it was flagged

The example can fetch arbitrary user-supplied URLs and treats any exception raised while reading robots.txt as permission to scrape. Fetching user-supplied targets is central to a scraping skill, but users should keep target selection and compliance checks under explicit control.

Skill content

    except Exception:
        return True  # No robots.txt = allowed
    ...
    response = session.get(url)

Recommendation

Only scrape public, authorized targets; prefer official APIs; respect terms and robots.txt; and consider failing closed or asking the user when robots.txt cannot be retrieved.
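
A minimal fail-closed sketch, assuming the requests-style session and user-supplied url from the excerpt above; check_robots_fail_closed and ALLOWED_DOMAINS are hypothetical names added for illustration:

    import urllib.robotparser
    from urllib.parse import urljoin, urlparse

    # Hypothetical allowlist: keeps target selection under explicit user control.
    ALLOWED_DOMAINS = {"example.com"}

    def check_robots_fail_closed(url, user_agent="docs-scraper"):
        parts = urlparse(url)
        if parts.netloc not in ALLOWED_DOMAINS:
            return False  # Not an authorized target.
        robots_url = urljoin(f"{parts.scheme}://{parts.netloc}", "/robots.txt")
        parser = urllib.robotparser.RobotFileParser(robots_url)
        try:
            parser.read()
        except Exception:
            # Fail closed: RobotFileParser already treats a missing robots.txt
            # (HTTP 404) as allow-all, so this branch only fires when the file
            # cannot be retrieved at all, e.g. on network errors.
            return False
        return parser.can_fetch(user_agent, url)

With this shape, the session.get(url) call from the excerpt runs only after check_robots_fail_closed(url) returns True, inverting the fail-open behavior that was flagged.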

Finding 2: Sensitive URL details in audit logs

What this means

Scraping logs may retain sensitive URL details longer than intended.

Why it was flagged

The example logs scrape URLs and statuses for an audit trail. This is purpose-aligned, but URLs can contain query strings, identifiers, or other sensitive data if users scrape poorly scoped targets.

Skill content

    logger.info(f"SCRAPE url={url} status={response.status_code}")

Recommendation

Avoid placing personal data in URLs, redact query strings when logging, and set clear retention/deletion rules for scrape audit logs.
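
A minimal redaction sketch, assuming the logger and response objects from the excerpt above; redact_url is a hypothetical helper:

    import logging
    from urllib.parse import urlsplit, urlunsplit

    logger = logging.getLogger("scrape-audit")

    def redact_url(url):
        # Keep scheme, host, and path; drop the query string and fragment so
        # tokens and identifiers never reach the audit log.
        parts = urlsplit(url)
        return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

    # Drop-in replacement for the flagged log line:
    # logger.info(f"SCRAPE url={redact_url(url)} status={response.status_code}")

Pairing this with a bounded handler such as logging.handlers.TimedRotatingFileHandler (its backupCount argument deletes older rotated files) gives the audit trail a defined retention window.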