Robots.txt Generator

Generate, validate, and analyze robots.txt files for websites. Use when creating robots.txt from scratch, validating existing robots.txt syntax, checking if a URL is allowed/blocked by robots.txt rules, or generating robots.txt for common platforms (WordPress, Next.js, Django, Rails). Also use when auditing crawl directives or debugging search engine indexing issues.

John Wang@johnnywang2001

Install

openclaw skills install @johnnywang2001/robots-txt-gen

robots-txt-gen

Generate, validate, and test robots.txt files from the command line.

Quick Start

```bash
# Generate a robots.txt for a platform
python3 scripts/robots_txt_gen.py generate --preset nextjs --sitemap https://example.com/sitemap.xml

# Validate an existing robots.txt
python3 scripts/robots_txt_gen.py validate --file robots.txt

# Validate a remote robots.txt
python3 scripts/robots_txt_gen.py validate --url https://example.com/robots.txt

# Test if a URL is allowed for a user-agent
python3 scripts/robots_txt_gen.py test --file robots.txt --url /admin/dashboard --agent Googlebot

# Generate with custom rules
python3 scripts/robots_txt_gen.py generate --allow "/" --disallow "/admin" --disallow "/api" --disallow "/private" --sitemap https://example.com/sitemap.xml --agent "*"
```

Commands

`generate`

Create a robots.txt file with custom rules or platform presets.

Options:

--preset <name> — Use a platform preset: wordpress, nextjs, django, rails, laravel, static, spa, ecommerce
--agent <name> — User-agent (default: *). Repeat for multiple agents.
--allow <path> — Allow path. Repeatable.
--disallow <path> — Disallow path. Repeatable.
--sitemap <url> — Sitemap URL. Repeatable.
--crawl-delay <seconds> — Crawl delay directive.
--block-ai — Add rules to block common AI crawlers (GPTBot, ChatGPT-User, CCBot, Google-Extended, anthropic-ai, etc.)
--output <file> — Write to file instead of stdout.
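
To make the effect of these options concrete, here is a minimal sketch of how such flags could be turned into robots.txt text. It is not the skill's actual implementation: the function name `build_robots_txt` and its parameters are hypothetical, and the layout (one group for all agents, followed by sitemap lines) is an assumption.

```python
from typing import Iterable, Optional


def build_robots_txt(
    agents: Iterable[str] = ("*",),
    allow: Iterable[str] = (),
    disallow: Iterable[str] = (),
    sitemaps: Iterable[str] = (),
    crawl_delay: Optional[int] = None,
) -> str:
    """Compose robots.txt text (hypothetical helper, not the skill's code)."""
    # All agents share one group: consecutive User-agent lines, then the rules.
    lines = [f"User-agent: {agent}" for agent in agents]
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    if crawl_delay is not None:
        lines.append(f"Crawl-delay: {crawl_delay}")
    sitemap_lines = [f"Sitemap: {url}" for url in sitemaps]
    if sitemap_lines:
        lines.append("")  # blank line separates the group from sitemap lines
        lines += sitemap_lines
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Mirrors the "custom rules" command from Quick Start.
    print(build_robots_txt(
        allow=["/"],
        disallow=["/admin", "/api", "/private"],
        sitemaps=["https://example.com/sitemap.xml"],
    ), end="")
```

Sitemap lines are written outside the group because the Sitemap directive is not tied to any particular user-agent.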

`validate`

Check a robots.txt file for syntax errors and best-practice warnings.

Options:

--file <path> — Local file to validate.
--url <url> — Remote robots.txt URL to fetch and validate.
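
The page does not list the exact checks `validate` performs. As an illustration only, a very small syntax linter might look like the sketch below; `lint_robots_txt`, the `KNOWN_DIRECTIVES` set, and the three checks it runs are assumptions, not the skill's rule set.

```python
from typing import List

# Directives this sketch recognizes (an assumption, not the skill's list).
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def lint_robots_txt(text: str) -> List[str]:
    """Return human-readable problems found in robots.txt text."""
    problems = []
    seen_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {number}: missing ':' separator")
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN_DIRECTIVES:
            problems.append(f"line {number}: unknown directive '{field}'")
        elif field == "user-agent":
            seen_agent = True
        elif field in {"allow", "disallow"}:
            if not seen_agent:
                problems.append(f"line {number}: rule appears before any User-agent")
            if value and not value.startswith(("/", "*")):
                problems.append(f"line {number}: path should start with '/' or '*'")
    return problems


if __name__ == "__main__":
    for problem in lint_robots_txt("Disalow: /x\nUser-agent: *\nAllow: admin\n"):
        print(problem)
```

An empty `Disallow:` value is accepted here on purpose, since it is the conventional way to allow everything for a group.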

`test`

Test whether a specific URL path is allowed or disallowed for a given user-agent.

Options:

--file <path> — robots.txt file to test against.
--url <path> — URL path to test (e.g., /admin/login).
--agent <name> — User-agent to test as (default: Googlebot).
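
For comparison, the same kind of allow/block check can be sketched with Python's standard-library `urllib.robotparser`. This is not necessarily how the skill evaluates rules; the helper name `is_allowed` and the sample rules are illustrative.

```python
from urllib.robotparser import RobotFileParser

# Sample rules for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /api/
"""


def is_allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "Googlebot", "/admin/dashboard"))  # False
    print(is_allowed(ROBOTS_TXT, "Googlebot", "/blog/post"))        # True
```

Note that the standard-library parser applies rules in file order and does not interpret `*` or `$` wildcards in paths, so its answers can differ from crawlers that use longest-match precedence.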

Platform Presets

| Preset | What it blocks | Notes |
|---|---|---|
| `wordpress` | `/wp-admin/`, `/wp-includes/`, query params | Allows `/wp-admin/admin-ajax.php` |
| `nextjs` | `/_next/static/`, `/api/`, `/.next/` | Standard Next.js paths |
| `django` | `/admin/`, `/static/admin/`, `/media/private/` | Django admin and private media |
| `rails` | `/admin/`, `/assets/`, `/tmp/` | Rails conventions |
| `laravel` | `/admin/`, `/storage/`, `/vendor/` | Laravel conventions |
| `static` | Nothing blocked | Simple allow-all with sitemap |
| `spa` | `/api/`, `/assets/` | Single-page app pattern |
| `ecommerce` | `/cart/`, `/checkout/`, `/account/`, `/search?` | Prevents crawling user sessions |

AI Crawler Blocking

The --block-ai flag adds disallow rules for known AI crawlers and AI-related user-agent tokens:

GPTBot, ChatGPT-User (OpenAI)
Google-Extended (Google AI)
CCBot (Common Crawl)
anthropic-ai, ClaudeBot (Anthropic)
Bytespider (ByteDance)
FacebookBot (Meta)
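
As a rough illustration of what such rules can look like, the sketch below builds one group that disallows everything for the agents listed above and checks the result with the standard-library parser. Whether the skill emits a single shared group or one group per agent is not stated here, so the layout, the `AI_CRAWLERS` tuple, and the helper names are assumptions.

```python
from urllib.robotparser import RobotFileParser

# Agent names taken from the list above; the grouping is an assumption.
AI_CRAWLERS = (
    "GPTBot", "ChatGPT-User", "Google-Extended", "CCBot",
    "anthropic-ai", "Bytespider", "ClaudeBot", "FacebookBot",
)


def ai_block_group(agents=AI_CRAWLERS) -> str:
    """Build one robots.txt group that disallows everything for `agents`."""
    lines = [f"User-agent: {agent}" for agent in agents]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


def can_fetch(robots_txt: str, agent: str, path: str) -> bool:
    """Check a path against robots.txt text with the stdlib parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    text = ai_block_group()
    print(text, end="")
    print(can_fetch(text, "GPTBot", "/"))     # False: blocked by the group
    print(can_fetch(text, "Googlebot", "/"))  # True: no group applies to it
```

Crawlers that are not named in the group are unaffected, which is why ordinary search crawlers can still fetch the site.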
