Skill v0.1.0

ClawScan security

LLMs.txt Generator · ClawHub's context-aware review of the artifact, metadata, and declared behavior.

Scanner verdict

Suspicious · Feb 28, 2026, 7:12 AM

Verdict: suspicious
Confidence: medium
Model: gpt-5-mini
Summary: The skill largely does what it says (crawls a site and builds llms.txt), but there are incongruities around how it's executed and its dependencies that deserve review before running.
Guidance: This skill appears to implement the described crawler and llms.txt generation, but before running it you should: (1) review the crawl.py source yourself (it only issues HTTP GETs and parses HTML, but it extracts emails and page text); (2) install the required dependencies (httpx, beautifulsoup4, lxml) yourself, ideally in a controlled virtualenv, because the registry does not install them; (3) adjust the invocation to your environment, because SKILL.md hardcodes a virtualenv path and a workspace path that may not exist; (4) avoid asking it to crawl sensitive internal URLs unless you trust the environment, since the crawler will fetch any URL you give it; and (5) consider running the skill in a sandboxed environment or with restricted network access until you are comfortable with its behavior.
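As an illustration of the dependency point in the guidance, a preflight check along the following lines could confirm that the three packages are importable in whichever interpreter you choose to run the crawler with. This is a minimal sketch and not part of the skill: the function name missing_packages and the REQUIRED mapping are hypothetical, and the only facts taken from the review are the package names (beautifulsoup4 is imported as bs4).

```python
import importlib.util

# pip package name -> import module name.
# Package names come from the review; the mapping itself is illustrative.
REQUIRED = {"httpx": "httpx", "beautifulsoup4": "bs4", "lxml": "lxml"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import module cannot be found."""
    return sorted(
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All crawler dependencies are importable.")
```

Running this with the same interpreter that will execute crawl.py shows whether the environment is ready before any network request is made.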

Review Dimensions

Purpose & Capability (note): The name and description match the included code: scripts/crawl.py implements a 2-level crawler and extraction heuristics consistent with generating an llms.txt. However, SKILL.md hardcodes a Python virtualenv path (~/.virtualenvs/llms-txt-generator/bin/python3) and a workspace path (~/.openclaw/workspace/llms-txt-generator/scripts/crawl.py) even though the skill declares no required binaries or install steps. This mismatch is unexpected.
Instruction Scope (note): The instructions restrict actions to crawling the user-provided site, re-crawling extra URLs, writing /tmp/llms_business_info.json, and conversational gap-filling. The crawler extracts emails and raw page text (up to 8000 characters in deep mode). This is within the stated purpose, but extracted emails and raw text are sensitive, and the skill will fetch any URL the user (or agent) supplies, which could include internal endpoints.
Install Mechanism (concern): There is no install spec even though the code requires Python packages (httpx, beautifulsoup4, lxml). SKILL.md invokes a specific virtualenv path that the registry metadata does not provision. As a result, the runtime may fail, or an operator may have to create the virtualenv themselves, which raises its own trust concerns. The code uses no external downloads or obscure URLs, which is good, but dependency handling is underspecified.
Credentials (note): The skill requests no environment variables or credentials, which aligns with its stated purpose. It does extract email addresses and other public content from crawled pages. Including emails in the generated llms.txt is consistent with the referenced spec, but users should be aware that public email addresses found by the crawler will be surfaced in the output.
Persistence & Privilege (ok): always is set to false, and the skill does not request persistent system-wide privileges. It writes to /tmp/llms_business_info.json (transient) and reads and writes only its own workspace and script; there is no evidence that it alters other skills or global config.
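The Instruction Scope finding above notes that the crawler will fetch any URL it is given, including internal endpoints. One way an operator might screen URLs before handing them to the skill is sketched below. This is an assumption-laden illustration, not something crawl.py is known to do: the function name is_public_url and its injectable resolve parameter are hypothetical, and the check relies only on the standard library (urllib.parse, socket, ipaddress).

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_url(url, resolve=socket.getaddrinfo):
    """Return True only if url is http(s) and every resolved address is public.

    Hypothetical helper: `resolve` is injectable so the check can be tested
    without network access.
    """
    try:
        parts = urlsplit(url)
        port = parts.port or (443 if parts.scheme == "https" else 80)
    except ValueError:
        return False  # malformed URL or port
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        infos = resolve(parts.hostname, port)
    except OSError:
        return False  # name did not resolve
    for info in infos:
        # info[4][0] is the address string; drop any IPv6 scope id.
        addr = ipaddress.ip_address(info[4][0].split("%")[0])
        if not addr.is_global:
            return False  # loopback, private, link-local, etc.
    return bool(infos)
```

A check like this runs at validation time only, so it does not protect against a hostname that resolves differently when the crawler later fetches it. Restricting network access at the sandbox level, as the guidance suggests, remains the stronger control.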