Crawler

Web crawling and scraping reference — robots.txt protocol, Scrapy framework, anti-bot detection, headless browsers, and legal considerations

bytesagain3@bytesagain3

Install

openclaw skills install @bytesagain3/crawler

Crawler

No API keys or credentials are required; every command outputs reference documentation only.

Commands

Command	Description
`intro`	Crawling vs scraping, robots.txt, sitemap
`standards`	HTTP caching, structured data, meta tags
`troubleshooting`	Anti-bot detection, JS rendering, encoding
`performance`	Concurrency, dedup, incremental, distributed
`security`	Legal landscape, ethical guidelines, proxies
`migration`	BeautifulSoup to Scrapy, requests to Playwright
`cheatsheet`	Scrapy commands, CSS/XPath, curl, user-agents
`faq`	Legality, JS pages, blocking, storage
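
As a small illustration of the robots.txt topic listed under `intro`, the sketch below checks whether a URL may be fetched, using Python's standard-library `urllib.robotparser`. It is not taken from the skill's output. The robots.txt content, the `example-bot` user agent, and the `example.com` URLs are all hypothetical, and the rules are inlined so the example makes no network requests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so no network access is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Paths outside /private/ are allowed for any user agent.
print(parser.can_fetch("example-bot", "https://example.com/public/page"))

# Paths under /private/ are disallowed by the wildcard rule.
print(parser.can_fetch("example-bot", "https://example.com/private/data"))

# Crawl-delay value (in seconds) that applies to this user agent.
print(parser.crawl_delay("example-bot"))
```

A real crawler would typically load the file with `set_url()` and `read()` instead of inline text, and would wait for the reported crawl delay between requests.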

Output Format

All commands output plain-text reference documentation via heredoc. No external API calls, no credentials needed, no network access.

Powered by BytesAgain | bytesagain.com | hello@bytesagain.com
