robots-txt

v1.1.1

When the user wants to configure, audit, or optimize robots.txt. Also use when the user mentions "robots.txt," "crawler rules," "block crawlers," "AI crawler...

by Kostja Zhang (@kostja94)
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The name and description (robots.txt auditing/configuration) match the instructions: generate recommended files, audit for accidental blocks, and give per-user-agent rules. The skill does not request unrelated credentials or tooling.
Instruction Scope
Instructions are narrowly scoped to robots.txt guidance and auditing. They do tell the agent to read .claude/project-context.md or .cursor/project-context.md (if present) to obtain the site URL and indexing goals. This is reasonable for context, but the agent will read local project files, so users should be aware that any content in those files will be consumed.
Install Mechanism
No install spec and no code files — instruction-only skill. Nothing is written to disk or downloaded during install.
Credentials
The skill requires no environment variables, secrets, or external credentials. The requested access (local project-context files) is proportionate to the stated purpose.
Persistence & Privilege
The "always" flag is false, and the skill does not request elevated or system-wide privileges. Autonomous invocation is allowed (the platform default) but is not itself a problem here.
Assessment
This skill is internally consistent and low-risk: it only contains prose instructions and needs no installs or credentials. Before installing, note that at runtime it may read project-context files (.claude/project-context.md or .cursor/project-context.md) if present to learn the site URL and indexing goals — remove or sanitize those files if they contain sensitive data you don't want shared. If you plan to audit a live site, the skill does not gain host credentials or make changes itself; provide robots.txt contents or site details explicitly when asked rather than granting broad access to production systems. If you prefer to avoid any automatic local-file reads, disable autonomous invocation for the skill or remove project-context files from the agent environment.


latest: vk970cym62qnzgtzn9npm8d6wqs84da7k
57 downloads
0 stars
1 version
Updated 1w ago
v1.1.1
MIT-0

SEO Technical: robots.txt

Guides configuration and auditing of robots.txt for search engine and AI crawler control.

When invoking: On first use, if helpful, open with 1–2 sentences on what this skill covers and why it matters, then provide the main output. On subsequent use or when the user asks to skip, go directly to the main output.

Scope (Technical SEO)

  • Robots.txt: Configure Disallow/Allow, Sitemap, Clean-param; audit for accidental blocks
  • Crawler access: Path-level crawl control; AI crawler allow/block strategy
  • Differentiation: robots.txt = crawl control (who accesses what paths); noindex = index control (what gets indexed). See indexing for page-level exclusions.

Initial Assessment

Check for project context first: If .claude/project-context.md or .cursor/project-context.md exists, read it for site URL and indexing goals.

Identify:

  1. Site URL: Base domain (e.g., https://example.com)
  2. Indexing scope: Full site, partial, or specific paths to exclude
  3. AI crawler strategy: Allow search/indexing vs. block training data crawlers
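Once those three inputs are known, the recommendation can start very small. A minimal sketch, assuming a full-site indexing goal and https://example.com as a placeholder base domain:

```txt
# Allow all crawlers everywhere (an empty Disallow blocks nothing)
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
```

Exclusions for specific paths or AI crawlers are then layered on top of this baseline per the sections below.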

Best Practices

Purpose and Limitations

| Point | Note |
| --- | --- |
| Purpose | Controls crawler access; does NOT prevent indexing (disallowed URLs may still appear in search without a snippet) |
| Advisory | Rules are advisory; malicious crawlers may ignore them |
| Public | robots.txt is publicly readable; use noindex or auth for sensitive content. See indexing |

Crawl vs Index vs Link Equity (Quick Reference)

| Tool | Controls | Prevents indexing? |
| --- | --- | --- |
| robots.txt | Crawl (path-level) | No; blocked URLs may still appear in the SERP |
| noindex (meta / X-Robots-Tag) | Index (page-level) | Yes. See indexing |
| nofollow | Link equity only | No; does not control indexing |

When to Use robots.txt vs noindex

| Use | Tool | Example |
| --- | --- | --- |
| Path-level (whole directory) | robots.txt | Disallow: /admin/, Disallow: /api/, Disallow: /staging/ |
| Page-level (specific pages) | noindex meta / X-Robots-Tag | Login, signup, thank-you, 404, legal. See indexing for the full list |
| Critical | Do NOT block in robots.txt | Pages that use noindex; crawlers must access the page to read the directive |

Paths to block in robots.txt: /admin/, /api/, /staging/, temp files. Paths to use noindex on (allow crawl): /login/, /signup/, /thank-you/, etc.; see indexing.
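The split above can be illustrated with a short fragment (the paths are placeholders): directories go in robots.txt, while individual pages carry a noindex directive and therefore must stay crawlable.

```txt
# robots.txt: directory-level crawl control only
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /staging/
# /thank-you/ is deliberately NOT listed here, so crawlers
# can fetch the page and read its noindex directive
```

The /thank-you/ page itself would then carry `<meta name="robots" content="noindex">` in its HTML head, or an equivalent X-Robots-Tag response header.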

Location and Format

| Item | Requirement |
| --- | --- |
| Path | Site root: https://example.com/robots.txt |
| Encoding | UTF-8 plain text |
| Standard | RFC 9309 (Robots Exclusion Protocol) |

Core Directives

| Directive | Purpose | Example |
| --- | --- | --- |
| User-agent: | Target crawler | User-agent: Googlebot, User-agent: * |
| Disallow: | Block path prefix | Disallow: /admin/ |
| Allow: | Allow a path (can override Disallow) | Allow: /public/ |
| Sitemap: | Declare sitemap absolute URL | Sitemap: https://example.com/sitemap.xml |
| Clean-param: | Strip query params (Yandex) | See below |
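Put together, the directives above compose into per-agent groups. A sketch, with example.com and the paths as placeholders:

```txt
# Group 1: rules for Googlebot only
User-agent: Googlebot
Disallow: /admin/
Allow: /admin/public-docs/    # Allow carves an exception out of the Disallow

# Group 2: default rules for all other crawlers
User-agent: *
Disallow: /admin/

# Sitemap is not tied to any group
Sitemap: https://example.com/sitemap.xml
```

Under RFC 9309, a crawler uses the most specific matching User-agent group and, within it, the longest matching rule, which is why the more specific Allow wins over the Disallow.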

Critical: Do Not Block

| Do not block | Reason |
| --- | --- |
| CSS, JS, images | Google needs them to render pages; blocking breaks indexing |
| /_next/ (Next.js) | Breaks CSS/JS loading; static assets showing in GSC as "Crawled - not indexed" is expected. See indexing |
| Pages that use noindex | Crawlers must access the page to read the noindex directive; blocking in robots.txt prevents that |

Only block paths that do not need crawling: /admin/, /api/, /staging/, temp files.
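A quick way to sanity-check these rules before shipping is Python's built-in urllib.robotparser. The robots.txt content below is a hypothetical example, not any real site's file:

```python
import urllib.robotparser

# Hypothetical robots.txt under audit (assumed content)
ROBOTS = """\
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /staging/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

# Render-critical assets must remain crawlable
for path in ("/styles/main.css", "/scripts/app.js", "/_next/static/chunk.js"):
    assert rp.can_fetch("Googlebot", path), f"render asset blocked: {path}"

# Intentionally blocked paths
for path in ("/admin/", "/api/v1/users", "/staging/index.html"):
    assert not rp.can_fetch("Googlebot", path), f"expected block: {path}"

print("audit passed")
```

Running this against a proposed file catches the most common failure mode named above: accidentally blocking CSS/JS that rendering depends on.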

AI Crawler Strategy

robots.txt is effective for all measured AI crawlers (Vercel/MERJ study, 2024). Set rules per user-agent; check each vendor's docs for current tokens.

| User-agent | Purpose | Typical |
| --- | --- | --- |
| OAI-SearchBot | ChatGPT search | Allow |
| GPTBot | OpenAI training | Disallow |
| Claude-SearchBot | Claude search | Allow |
| ClaudeBot | Anthropic training | Disallow |
| PerplexityBot | Perplexity search | Allow |
| Google-Extended | Gemini training | Disallow |
| CCBot | Common Crawl (LLM training) | Disallow |
| Bytespider | ByteDance | Disallow |
| Meta-ExternalAgent | Meta | Disallow |
| AppleBot | Apple (Siri, Spotlight); renders JS | Allow for indexing |

Allow vs Disallow: Allow search/indexing bots (OAI-SearchBot, Claude-SearchBot, PerplexityBot); Disallow training-only bots (GPTBot, ClaudeBot, CCBot) if you don't want content used for model training. See site-crawlability for AI crawler optimization (SSR, URL management).
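As a sketch, the strategy in the table above translates into two groups; the user-agent tokens are taken from that table and should be verified against each vendor's current docs before use:

```txt
# Search/answer bots: allow (they can cite and link back to the site)
User-agent: OAI-SearchBot
User-agent: Claude-SearchBot
User-agent: PerplexityBot
Allow: /

# Training-only crawlers: block
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
User-agent: CCBot
User-agent: Bytespider
User-agent: Meta-ExternalAgent
Disallow: /
```

Stacked User-agent lines form a single group under RFC 9309, so each set of rules applies to every agent listed directly above it.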

Clean-param (Yandex)

Clean-param: utm_source&utm_medium&utm_campaign&utm_term&utm_content&ref&fbclid&gclid
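Clean-param is Yandex-specific and ignored by other crawlers; Yandex documents it as a cross-group directive, though it is commonly written inside a Yandex group for clarity. In context (the parameter list mirrors the line above):

```txt
User-agent: Yandex
Clean-param: utm_source&utm_medium&utm_campaign&utm_term&utm_content&ref&fbclid&gclid
```

This tells Yandex to treat URLs that differ only in these tracking parameters as one URL, reducing duplicate crawling.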

Output Format

  • Current state (if auditing)
  • Recommended robots.txt (full file)
  • Compliance checklist
  • References: Google robots.txt

Related Skills

  • indexing: Full noindex page-type list; when to use noindex vs robots.txt; GSC indexing diagnosis
  • page-metadata: Meta robots (noindex, nofollow) implementation
  • xml-sitemap: Sitemap URL to reference in robots.txt
  • site-crawlability: Broader crawl and structure guidance; AI crawler optimization
  • rendering-strategies: SSR, SSG, CSR; content in initial HTML for crawlers
