Robots AI

Analyze and generate robots.txt files with AI crawler awareness. Detect which AI bots (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, etc.) are blocked or allowed on any website.

Install

openclaw skills install robotstxt-ai

robots-ai

Analyze, audit, and generate robots.txt files with awareness of the AI crawlers listed in the database below.

Capabilities

  • Analyze any website's robots.txt to see which AI bots are blocked/allowed
  • Generate a robots.txt with toggleable AI bot blocking
  • Audit existing robots.txt for completeness and issues
  • List all known AI crawlers with their user-agents, companies, and documentation links

AI Bots Database

You know about these AI crawlers and their user-agents:

| Bot | User-Agent | Company | Type |
|---|---|---|---|
| GPTBot | GPTBot | OpenAI | AI Crawler |
| ChatGPT-User | ChatGPT-User | OpenAI | AI Search |
| OAI-SearchBot | OAI-SearchBot | OpenAI | AI Search |
| ClaudeBot | ClaudeBot | Anthropic | AI Crawler |
| anthropic-ai | anthropic-ai | Anthropic | AI Crawler |
| Google-Extended | Google-Extended | Google | AI Crawler |
| PerplexityBot | PerplexityBot | Perplexity | AI Search |
| CCBot | CCBot | Common Crawl | AI Crawler |
| Bytespider | Bytespider | ByteDance | AI Crawler |
| Diffbot | Diffbot | Diffbot | AI Crawler |
| cohere-ai | cohere-ai | Cohere | AI Crawler |
| Amazonbot | Amazonbot | Amazon | AI Crawler |
| Meta-ExternalAgent | Meta-ExternalAgent | Meta | AI Crawler |
| Meta-ExternalFetcher | Meta-ExternalFetcher | Meta | AI Crawler |
| Applebot-Extended | Applebot-Extended | Apple | AI Crawler |
| YouBot | YouBot | You.com | AI Search |
| Timpibot | Timpibot | Timpi | AI Crawler |
| img2dataset | img2dataset | Open Source | AI Crawler |
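
One way to hold this table in code is as a list of small records. The sketch below is illustrative only: the `AIBot` class, the `AI_BOTS` list, and the `user_agents_by_type` helper are hypothetical names, and only a few rows from the table are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AIBot:
    """One row of the AI bots table (hypothetical structure)."""
    name: str
    user_agent: str
    company: str
    kind: str  # "AI Crawler" or "AI Search", as in the Type column


# A few rows copied from the table; extend with the rest as needed.
AI_BOTS = [
    AIBot("GPTBot", "GPTBot", "OpenAI", "AI Crawler"),
    AIBot("ChatGPT-User", "ChatGPT-User", "OpenAI", "AI Search"),
    AIBot("ClaudeBot", "ClaudeBot", "Anthropic", "AI Crawler"),
    AIBot("PerplexityBot", "PerplexityBot", "Perplexity", "AI Search"),
    AIBot("CCBot", "CCBot", "Common Crawl", "AI Crawler"),
]


def user_agents_by_type(kind: str, bots=AI_BOTS) -> list[str]:
    """Return the user-agent tokens of every bot with the given type."""
    return [bot.user_agent for bot in bots if bot.kind == kind]
```

Keeping the type as a plain string mirrors the table; an enum would be a reasonable alternative if the set of types is fixed.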

Important Notes

  • Google-Extended controls Gemini training access but does NOT affect Google Search indexing
  • Blocking Googlebot stops Google Search from crawling the site, which effectively drops it from search results; never do this unless explicitly asked
  • CCBot feeds Common Crawl, which is used by many AI companies for training data
  • Bytespider (ByteDance) and Timpibot are commonly blocked by default due to aggressive crawling
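
The Google-Extended note can be checked with a small sketch. It assumes Python's standard-library `urllib.robotparser` as the rule evaluator (real crawlers may interpret edge cases differently), and the `RULES` text is a made-up example rather than any real site's file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block only the Google-Extended token.
RULES = """\
User-agent: *
Allow: /

User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Googlebot has no group of its own, so it falls back to the * group.
googlebot_allowed = parser.can_fetch("Googlebot", "/")
# Google-Extended matches its own group and is disallowed everywhere.
extended_allowed = parser.can_fetch("Google-Extended", "/")

print("Googlebot allowed:", googlebot_allowed)
print("Google-Extended allowed:", extended_allowed)
```

Under these rules the search crawler keeps access while the AI-training token is blocked, which is the separation the note describes.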

How to Analyze

When asked to analyze a robots.txt:

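  1. Fetch robots.txt from the root of the site's host (if the given URL does not already point at /robots.txt, replace its path with /robots.txt)
  2. Parse all User-agent groups and their Allow/Disallow rules
  3. Check each AI bot against its own group, falling back to the User-agent: * group when it has none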
  4. Report: which bots are blocked, which are allowed, and any issues found
  5. Suggest improvements if relevant
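
The steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: `robots_url` and `classify_ai_bots` are hypothetical helper names, the bot list is a subset of the table, and the check only tests a single path (the site root by default), so partial blocks such as `Disallow: /private/` are reported as "allowed". It relies on Python's standard-library `urllib.robotparser`, which applies rules in file order and may differ from crawlers that use longest-match precedence.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Subset of the user-agent tokens from the AI bots table.
AI_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot", "CCBot"]


def robots_url(site: str) -> str:
    """Step 1: derive the robots.txt URL at the root of the site's host."""
    if "://" not in site:
        site = "https://" + site  # assumption: default to HTTPS
    parts = urlsplit(site)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def classify_ai_bots(robots_txt: str, path: str = "/", bots=AI_BOTS) -> dict[str, str]:
    """Steps 2-4: parse the rules and report each bot as blocked or allowed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        bot: "allowed" if parser.can_fetch(bot, path) else "blocked"
        for bot in bots
    }


if __name__ == "__main__":
    sample = "User-agent: *\nAllow: /\n\nUser-agent: GPTBot\nDisallow: /\n"
    print(robots_url("example.com/blog"))
    for bot, status in classify_ai_bots(sample).items():
        print(f"{bot}: {status}")
```

Fetching is left out on purpose so the parsing logic can be exercised on any robots.txt text; the URL from `robots_url` can be passed to whatever HTTP client is available.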

How to Generate

When asked to generate a robots.txt:

  1. Ask which AI bots to block (or accept "block all AI" / "allow all AI")
  2. Ask for sitemap URL(s)
  3. Ask for any custom rules (e.g., Disallow: /admin/)
  4. Generate clean robots.txt with comments explaining each section
  5. Always include User-agent: * with Allow: / as the default
  6. Group blocked AI bots together with comments
  7. Add sitemap directives at the end
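
As an illustration of the generation steps, the sketch below assembles a robots.txt from the three inputs gathered in steps 1 to 3. The function name `build_robots_txt` and its parameters are hypothetical, and the comment wording in the output is only one possible choice.

```python
def build_robots_txt(blocked_bots, sitemaps=(), custom_disallow=()):
    """Build robots.txt text: default group, blocked AI bots, then sitemaps."""
    lines = ["# Allow all crawlers by default", "User-agent: *"]
    # Custom rules (e.g. /admin/) go in the default group, before Allow: /.
    lines += [f"Disallow: {path}" for path in custom_disallow]
    lines += ["Allow: /", ""]

    if blocked_bots:
        lines.append("# Block AI crawlers")
        for bot in blocked_bots:
            # One group per bot, each separated by a blank line.
            lines += [f"User-agent: {bot}", "Disallow: /", ""]

    if sitemaps:
        lines.append("# Sitemap")
        lines += [f"Sitemap: {url}" for url in sitemaps]

    # Drop any trailing blank line and end the file with a single newline.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(build_robots_txt(
        blocked_bots=["GPTBot", "ClaudeBot"],
        sitemaps=["https://example.com/sitemap.xml"],
        custom_disallow=["/admin/"],
    ))
```

Because a bot that matches its own group ignores the `User-agent: *` group, custom rules placed in the default group do not apply to the blocked bots; they are already denied everything by `Disallow: /`.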

Output Format

Always format the generated robots.txt in a code block with syntax highlighting. Add comments explaining what each section does. Example:

# Allow all crawlers by default
User-agent: *
Allow: /

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# Sitemap
Sitemap: https://example.com/sitemap.xml