Crawler

v3.0.0

Web crawling and scraping reference — robots.txt protocol, Scrapy framework, anti-bot detection, headless browsers, and legal considerations

0· 827· 10 versions· 3 current· 3 all-time· Updated 20h ago· MIT-0

Install

openclaw skills install crawler

Crawler

Web crawling and scraping reference covering the robots.txt protocol, the Scrapy framework, anti-bot detection, headless browsers, and legal considerations. No API keys or credentials are required; the skill outputs reference documentation only.

Commands

Command          Description
intro            Crawling vs scraping, robots.txt, sitemap
standards        HTTP caching, structured data, meta tags
troubleshooting  Anti-bot detection, JS rendering, encoding
performance      Concurrency, dedup, incremental, distributed
security         Legal landscape, ethical guidelines, proxies
migration        BeautifulSoup to Scrapy, requests to Playwright
cheatsheet       Scrapy commands, CSS/XPath, curl, user-agents
faq              Legality, JS pages, blocking, storage

Output Format

Every command prints plain-text reference documentation from a heredoc. The skill makes no external API calls, needs no credentials, and requires no network access.
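
As a rough illustration of the heredoc approach, a command dispatcher might look like the sketch below. This is not the skill's actual source: the function names (`crawler_doc`, `cmd_intro`, `cmd_faq`) and the placeholder text are assumptions made for the example, and only two of the eight commands are shown.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the skill's real implementation.
# Each command is a function that prints static text from a heredoc,
# so nothing here touches the network or reads credentials.

cmd_intro() {
  # The quoted delimiter ('EOF') turns off variable and command
  # expansion, so the body is printed exactly as written.
  cat <<'EOF'
Crawling vs scraping
  (placeholder body for the intro command)
EOF
}

cmd_faq() {
  cat <<'EOF'
FAQ
  (placeholder body for the faq command)
EOF
}

# Dispatch on the first argument; unknown commands print usage to
# stderr and return a non-zero status.
crawler_doc() {
  case "${1:-}" in
    intro) cmd_intro ;;
    faq)   cmd_faq ;;
    *)     echo "usage: crawler_doc {intro|faq}" >&2; return 1 ;;
  esac
}
```

With these functions defined in the current shell, `crawler_doc intro` prints the intro text to standard output. Quoting the heredoc delimiter matters for reference text that contains `$` or backticks, such as shell or XPath snippets, because it keeps the shell from expanding them.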


Powered by BytesAgain | bytesagain.com | hello@bytesagain.com

Version tags

latestvk977xm5khc7xs31gj0ftfvf8q983f1y4