Sitemap Content Scraper

v1.0.2

Discover website sitemaps from robots.txt and common sitemap locations, choose the right sitemap or content family such as docs, blog, help center, academy,...

by gunes alcan @quareth
MIT-0
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The name and description match the included Python scripts (discover_sitemaps.py and scrape_sitemap.py). The required runtime (python3) and the absence of any credentials or config paths are consistent with a tool that discovers and scrapes sitemaps on public sites.
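To make the discovery step concrete, here is a minimal sketch of how sitemap candidates could be collected from a robots.txt body plus a few common locations. It is not the code in discover_sitemaps.py: the function names and the `COMMON_SITEMAP_PATHS` list are assumptions made for illustration, and the sketch works on text you have already fetched so it performs no network requests itself.

```python
from urllib.parse import urljoin

# Assumed fallback locations; the bundled script may probe a different list.
COMMON_SITEMAP_PATHS = ("/sitemap.xml", "/sitemap_index.xml")


def sitemaps_from_robots(robots_text: str) -> list[str]:
    """Return the URLs named by 'Sitemap:' lines in a robots.txt body."""
    urls = []
    for raw_line in robots_text.splitlines():
        # Drop trailing comments, then split "Key: value" on the first colon.
        line = raw_line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


def candidate_sitemaps(base_url: str, robots_text: str) -> list[str]:
    """Combine robots.txt entries with common fallback paths, without duplicates."""
    fallbacks = [urljoin(base_url, path) for path in COMMON_SITEMAP_PATHS]
    ordered = []
    for url in sitemaps_from_robots(robots_text) + fallbacks:
        if url not in ordered:
            ordered.append(url)
    return ordered
```

Entries declared in robots.txt come first because the site owner listed them explicitly; the fallback paths are only guesses that still need to be fetched and checked.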
Instruction Scope
SKILL.md restricts activity to public http/https targets and instructs the agent to run the included scripts; as expected, the scripts perform network requests and write files to a user-specified output directory. The SKILL.md guardrails (reject localhost/private IPs, avoid auth/cookies, ask before writing outside the working area) align with the script behavior.
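The target guardrails described above can be illustrated with a small check. This is a sketch under stated assumptions, not the skill's own validation: `is_allowed_target` is a hypothetical name, and the check only inspects the URL string. A hostname that resolves to a private address would pass it, so a stricter guard would also resolve DNS and test the resulting addresses.

```python
import ipaddress
from urllib.parse import urlsplit


def is_allowed_target(url: str) -> bool:
    """Accept only http/https URLs that do not point at an obviously local host."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL, e.g. an unterminated IPv6 literal
    if parts.scheme not in ("http", "https"):
        return False
    if parts.username or parts.password:
        return False  # embedded credentials conflict with the no-auth guardrail
    host = (parts.hostname or "").lower()
    if not host or host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name rather than an IP literal (not resolved here)
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )
```

Running such a check before every request, including redirect targets, keeps the scraper from being steered toward internal services.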
Install Mechanism
No install spec (instruction-only) and only a dependency on python3. The skill bundles the scraper scripts rather than downloading external code at runtime, avoiding high-risk remote installs.
Credentials
No environment variables, credentials, or unrelated binaries are requested. The scripts access network and local filesystem as required by a scraper; nothing asks for unrelated secrets or broad system config access.
Persistence & Privilege
The skill is user-invocable and not always-enabled; it does not request persistent privileges or attempt to modify other skills or global agent configuration.
Assessment
This skill appears to do what it says: it runs the included Python scripts to discover public sitemaps and fetch pages, then writes Markdown files to the destination folder you choose. Before running: (1) inspect the bundled scripts and, if you are cautious, run them in a sandbox or container; (2) target only public http/https hosts and avoid internal/private hostnames, as advised; (3) choose an output directory you control and confirm the agent asks before writing outside that area; (4) be aware that the scraper performs arbitrary HTTP requests, so do not point it at services where a request could trigger actions or costs.
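For point (3), one way to verify that a file would land inside the chosen output directory is a path containment check. The sketch below is illustrative only; `is_inside_output_dir` is a hypothetical helper and is not taken from the bundled scripts.

```python
from pathlib import Path


def is_inside_output_dir(output_dir: Path, target: Path) -> bool:
    """Return True if target, taken relative to output_dir, stays inside it."""
    base = output_dir.resolve()
    # Joining with an absolute target discards output_dir, so absolute paths
    # elsewhere on disk are rejected by the comparison below.
    resolved = (output_dir / target).resolve()
    return resolved == base or base in resolved.parents
```

Because both paths are resolved first, `..` segments and symlinks that exist on disk are followed before the comparison, so a page slug such as `../notes.md` cannot escape the directory.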

Like a lobster shell, security has layers — review code before you run it.

latestvk9702j0mpsmaz0v71hff99vc8x83tg8h

Runtime requirements

Bins: python3
