Robots.txt for AI Crawlers: Complete Guide | CrawlReady AI

How to configure robots.txt for GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and traditional search bots.

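robots.txt is a public file, not an access control mechanism. It tells well-behaved crawlers which URLs they may fetch, and compliance is voluntary, so anything sensitive still needs real authentication. AI crawlers identify themselves with their own user-agent tokens, so controlling them separately from search bots takes explicit rules for each.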

Major AI crawler user-agents

  • GPTBot — OpenAI
  • OAI-SearchBot — OpenAI search
  • ClaudeBot / Claude-Web — Anthropic
  • PerplexityBot — Perplexity
  • Google-Extended — Google generative AI training opt-out token
  • CCBot — Common Crawl

Recommended pattern for public marketing sites

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /dashboard/

Sitemap: https://example.com/sitemap.xml

Generate from your URL

Enter your site in the Robots.txt Generator — we detect your sitemap and produce a starter file with AI crawler blocks.

Frequently Asked Questions

Can robots.txt block AI training crawlers only?

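Yes, for crawlers that honor robots.txt. Give GPTBot, ClaudeBot, Google-Extended, and CCBot their own User-agent blocks with Disallow: /, and leave Googlebot unrestricted so search crawling continues. Google-Extended is a control token rather than a separate crawler, so blocking it does not remove pages from Google Search.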
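One way to sanity-check such a file before deploying it is to parse it locally. The sketch below uses Python's standard-library urllib.robotparser; the file contents, the example.com URLs, and the /pricing path are illustrative assumptions, not output from any real site. Note that urllib.robotparser resolves overlapping rules in file order rather than by longest match, so the wildcard group here lists only Disallow lines to keep the check unambiguous.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: each AI training token gets its own block,
# and every other crawler falls through to the wildcard group.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /admin/
Disallow: /dashboard/

Sitemap: https://example.com/sitemap.xml
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text in memory, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Print what each user-agent token may do with a public page (assumed path).
for agent in ("GPTBot", "ClaudeBot", "Google-Extended", "CCBot", "Googlebot"):
    allowed = parser.can_fetch(agent, "https://example.com/pricing")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

This only shows how a compliant parser reads the file. It says nothing about whether a given crawler actually obeys it.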

Does Allow: / override Disallow rules?

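Not by position. Under Google's robots.txt interpretation, which matches RFC 9309, the most specific matching rule wins, meaning the rule with the longest path, regardless of the order of the lines. Disallow: /admin/ is longer than Allow: /, so /admin/ stays blocked while the rest of the site remains crawlable. If an Allow rule and a Disallow rule match with equal specificity, the less restrictive Allow rule applies. Some simpler parsers apply the first matching rule instead, so avoid relying on overlapping rules when you can.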
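The "most specific rule wins" behavior can be sketched in a few lines. This is a simplified illustration, not Google's parser: the is_allowed function is a hypothetical helper, it handles plain path prefixes only, and it ignores the * and $ wildcards that real robots.txt rules may contain.

```python
from typing import Iterable, Tuple

Rule = Tuple[str, str]  # (directive, path prefix), e.g. ("Disallow", "/admin/")


def is_allowed(rules: Iterable[Rule], path: str) -> bool:
    """Longest-match precedence for one user-agent group (prefix rules only)."""
    best_len = -1
    allowed = True  # a path that matches no rule is allowed by default
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty or non-matching rules have no effect
        is_allow = directive.lower() == "allow"
        # A longer prefix beats a shorter one; on a tie, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [("Allow", "/"), ("Disallow", "/admin/")]
reordered = list(reversed(rules))

print(is_allowed(rules, "/admin/users"))      # False: /admin/ is the longer match
print(is_allowed(reordered, "/admin/users"))  # False: line order does not matter
print(is_allowed(rules, "/pricing"))          # True: only Allow: / matches
```

Because the longest prefix decides the outcome, reordering the Allow and Disallow lines leaves the result unchanged under this interpretation.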

Important disclaimer

This guide is for educational purposes only. No tool or technique guarantees search rankings, AI inclusion, or specific traffic results. Refer to official documentation from search engines and AI providers for current policies.
