{"skill":{"slug":"robots-txt-gen","displayName":"Robots.txt Generator","summary":"Generate, validate, and analyze robots.txt files for websites. Use when creating robots.txt from scratch, validating existing robots.txt syntax, checking if...","description":"---\nname: robots-txt-gen\ndescription: Generate, validate, and analyze robots.txt files for websites. Use when creating robots.txt from scratch, validating existing robots.txt syntax, checking if a URL is allowed/blocked by robots.txt rules, or generating robots.txt for common platforms (WordPress, Next.js, Django, Rails). Also use when auditing crawl directives or debugging search engine indexing issues.\n---\n\n# robots-txt-gen\n\nGenerate, validate, and test robots.txt files from the command line.\n\n## Quick Start\n\n```bash\n# Generate a robots.txt for a platform\npython3 scripts/robots_txt_gen.py generate --preset nextjs --sitemap https://example.com/sitemap.xml\n\n# Validate an existing robots.txt\npython3 scripts/robots_txt_gen.py validate --file robots.txt\n\n# Validate a remote robots.txt\npython3 scripts/robots_txt_gen.py validate --url https://example.com/robots.txt\n\n# Test if a URL is allowed for a user-agent\npython3 scripts/robots_txt_gen.py test --file robots.txt --url /admin/dashboard --agent Googlebot\n\n# Generate with custom rules\npython3 scripts/robots_txt_gen.py generate --allow \"/\" --disallow \"/admin\" --disallow \"/api\" --disallow \"/private\" --sitemap https://example.com/sitemap.xml --agent \"*\"\n```\n\n## Commands\n\n### `generate`\nCreate a robots.txt file with custom rules or platform presets.\n\nOptions:\n- `--preset <name>` — Use a platform preset: `wordpress`, `nextjs`, `django`, `rails`, `laravel`, `static`, `spa`, `ecommerce`\n- `--agent <name>` — User-agent (default: `*`). Repeat for multiple agents.\n- `--allow <path>` — Allow path. Repeatable.\n- `--disallow <path>` — Disallow path. Repeatable.\n- `--sitemap <url>` — Sitemap URL. Repeatable.\n- `--crawl-delay <seconds>` — Crawl delay directive.\n- `--block-ai` — Add rules to block common AI crawlers (GPTBot, ChatGPT-User, CCBot, Google-Extended, anthropic-ai, etc.)\n- `--output <file>` — Write to file instead of stdout.\n\n### `validate`\nCheck a robots.txt file for syntax errors and best-practice warnings.\n\nOptions:\n- `--file <path>` — Local file to validate.\n- `--url <url>` — Remote robots.txt URL to fetch and validate.\n\n### `test`\nTest whether a specific URL path is allowed or disallowed for a given user-agent.\n\nOptions:\n- `--file <path>` — robots.txt file to test against.\n- `--url <path>` — URL path to test (e.g., `/admin/login`).\n- `--agent <name>` — User-agent to test as (default: `Googlebot`).\n\n## Platform Presets\n\n| Preset | What it blocks | Notes |\n|--------|---------------|-------|\n| `wordpress` | `/wp-admin/`, `/wp-includes/`, query params | Allows `/wp-admin/admin-ajax.php` |\n| `nextjs` | `/_next/static/`, `/api/`, `/.next/` | Standard Next.js paths |\n| `django` | `/admin/`, `/static/admin/`, `/media/private/` | Django admin and private media |\n| `rails` | `/admin/`, `/assets/`, `/tmp/` | Rails conventions |\n| `laravel` | `/admin/`, `/storage/`, `/vendor/` | Laravel conventions |\n| `static` | Nothing blocked | Simple allow-all with sitemap |\n| `spa` | `/api/`, `/assets/` | Single-page app pattern |\n| `ecommerce` | `/cart/`, `/checkout/`, `/account/`, `/search?` | Prevents crawling user sessions |\n\n## AI Crawler Blocking\n\nThe `--block-ai` flag adds disallow rules for known AI training crawlers:\n- GPTBot, ChatGPT-User (OpenAI)\n- Google-Extended (Google AI)\n- CCBot (Common Crawl)\n- anthropic-ai (Anthropic)\n- Bytespider (ByteDance)\n- ClaudeBot (Anthropic)\n- FacebookBot (Meta)\n","topics":["Debugging","Crawl"],"tags":{"crawler":"1.0.0","latest":"1.0.0","robots":"1.0.0","seo":"1.0.0","web":"1.0.0"},"stats":{"comments":0,"downloads":654,"installsAllTime":24,"installsCurrent":0,"stars":0,"versions":1},"createdAt":1773439900884,"updatedAt":1778999207534},"latestVersion":{"version":"1.0.0","createdAt":1773439900884,"changelog":"- Initial release of robots-txt-gen.\n- Generate, validate, and analyze robots.txt files via command line.\n- Supports platform presets for WordPress, Next.js, Django, Rails, Laravel, static sites, SPAs, and ecommerce.\n- Validate both local and remote robots.txt files for syntax and warnings.\n- Test if URLs are allowed or blocked for specific user-agents.\n- Optionally generate rules to block known AI crawlers with the --block-ai flag.","license":"MIT-0"},"metadata":null,"owner":{"handle":"johnnywang2001","userId":"s174ntk6h18bvsev091zn78ex583gr1h","displayName":"John Wang","image":"https://avatars.githubusercontent.com/u/20619402?v=4"},"moderation":null}