Free AI crawl tool

Perplexity Crawler Checker

Perplexity uses PerplexityBot to fetch pages for AI answers. Enter your URL and we show whether robots.txt allows PerplexityBot on your homepage and key paths.

Recommended
OAI-SearchBot: Allow / for public content
GPTBot (training): Allow or block based on your policy
llms.txt links: 10–30 curated pages



See whether PerplexityBot is allowed in robots.txt and can fetch your public pages for Perplexity AI answers.

In your free report

  • robots.txt rules for Googlebot, Bingbot, GPTBot, OAI-SearchBot, ClaudeBot, and PerplexityBot
  • Homepage HTTP status and whether the path is allowed for each user-agent
  • llms.txt presence and basic structure at your domain root
  • XML sitemap discovery via robots.txt and common locations
  • Meta robots, canonical, and indexability signals on scanned pages

Recommendations from official docs

Control AI bots independently

OpenAI documents GPTBot (training) and OAI-SearchBot (ChatGPT search) as separate robots.txt tokens. You can allow search indexing while opting out of training data collection.
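As a rough illustration of that separation, the sketch below parses a sample robots.txt with Python's standard `urllib.robotparser` and checks each token on its own. The robots.txt content, the `example.com` URL, and the helper name `allowed_agents` are assumptions made for the example, not output from this tool.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow ChatGPT search, opt out of training.
SAMPLE_ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /
"""


def allowed_agents(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Report, per user-agent token, whether robots.txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    SAMPLE_ROBOTS_TXT,
    "https://example.com/",  # placeholder URL
    ["OAI-SearchBot", "GPTBot"],
)
print(result)  # {'OAI-SearchBot': True, 'GPTBot': False}
```

Note that `urllib.robotparser` applies the first matching rule within a group, which can differ from crawlers that use longest-match precedence, so treat the result as an approximation for files that mix Allow and Disallow rules.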

Allow search bots on public pages

Google and Bing need crawl access to index your site. Block only paths that should never appear in search or AI answers (admin, cart, staging, private APIs).

Publish llms.txt for AI discovery

A curated llms.txt at your root helps LLMs understand your site faster than parsing your full sitemap. Keep it under ~2,000 words with 10–30 high-value links.
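A minimal sketch of how those size guidelines could be checked locally. The thresholds come from the guidance above; the function name `review_llms_txt`, the link regex, and the sample file are assumptions for illustration only.

```python
import re

# Thresholds from the guidance above; treat them as rules of thumb.
MAX_WORDS = 2000
MIN_LINKS, MAX_LINKS = 10, 30

# Matches Markdown links with absolute http(s) URLs: [label](https://...)
LINK_PATTERN = re.compile(r"\[[^\]]+\]\((https?://[^)\s]+)\)")


def review_llms_txt(text: str) -> list[str]:
    """Return human-readable warnings; an empty list means nothing was flagged."""
    warnings = []
    word_count = len(text.split())
    link_count = len(LINK_PATTERN.findall(text))
    if word_count > MAX_WORDS:
        warnings.append(f"{word_count} words exceeds ~{MAX_WORDS}")
    if not MIN_LINKS <= link_count <= MAX_LINKS:
        warnings.append(f"{link_count} links is outside {MIN_LINKS}-{MAX_LINKS}")
    return warnings


# Hypothetical llms.txt with 12 links on a placeholder domain.
SAMPLE = (
    "# Example Store\n\n> Refurbished laptops and repair guides.\n\n## Guides\n\n"
    + "\n".join(
        f"- [Guide {i}](https://example.com/guides/{i}): Short description"
        for i in range(1, 13)
    )
)
print(review_llms_txt(SAMPLE))  # []
```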

Verify WAF and CDN bot rules

robots.txt is advisory: it tells crawlers what they may request but does not control network access. Cloudflare Bot Fight Mode or aggressive WAF rules can block crawlers even when robots.txt allows them, so allowlist the official crawler IP ranges.
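One way to spot this mismatch is to compare what robots.txt permits with the HTTP status a request actually receives. The sketch below is a heuristic under stated assumptions: the helper names, the `PLACEHOLDER_UA` string, and the set of status codes treated as "blocked" are all hypothetical choices, and a request sent from your own machine with a crawler-like User-Agent is not equivalent to a real crawler request, since a WAF may also verify source IPs.

```python
import urllib.error
import urllib.request

# Placeholder User-Agent for testing; not the crawler's real header value.
PLACEHOLDER_UA = "PerplexityBot-test"

# Assumed heuristic: statuses that often indicate a firewall or rate-limit block.
BLOCK_STATUSES = (401, 403, 429, 503)


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code received for url with the given User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx; the status code is still what we want.
        return error.code


def diagnose(robots_allows: bool, http_status: int) -> str:
    """Combine the robots.txt verdict with the observed HTTP status."""
    if not robots_allows:
        return "blocked by robots.txt"
    if http_status in BLOCK_STATUSES:
        return "allowed by robots.txt but likely blocked at the WAF/CDN layer"
    if 200 <= http_status < 300:
        return "reachable"
    return f"unexpected status {http_status}"
```

For example, `diagnose(True, fetch_status("https://example.com/", PLACEHOLDER_UA))` would report whether a page that robots.txt allows is also reachable over HTTP.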

Step-by-step action plan

  1. Fetch your live robots.txt and list each User-agent block separately for AI and search bots.
  2. Allow OAI-SearchBot and Googlebot on public marketing and product pages you want discovered.
  3. Add Sitemap: https://yourdomain.com/sitemap.xml and ensure the sitemap returns HTTP 200.
  4. Publish llms.txt with one H1, a blockquote summary, and grouped absolute URLs.
  5. Re-run this checker after changes. OpenAI notes that robots.txt updates can take ~24 hours to apply.
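Steps 1 and 3 can be sketched as a small parser that lists the rules for each User-agent and collects Sitemap lines. This is a simplified reading of robots.txt under assumed sample content: the function name `group_robots_rules` is hypothetical, fields other than User-agent, Allow, Disallow, and Sitemap are ignored, and the sitemap's HTTP status is not checked.

```python
def group_robots_rules(robots_txt: str) -> tuple[dict[str, list[str]], list[str]]:
    """Return (rules per user-agent, sitemap URLs) from robots.txt text."""
    groups: dict[str, list[str]] = {}
    sitemaps: list[str] = []
    current_agents: list[str] = []
    collecting_agents = False  # True while reading consecutive User-agent lines

    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not collecting_agents:
                current_agents = []  # a new group starts here
                collecting_agents = True
            current_agents.append(value)
            groups.setdefault(value, [])
        elif field in ("allow", "disallow"):
            collecting_agents = False
            for agent in current_agents:
                groups[agent].append(f"{field.capitalize()}: {value}")
        elif field == "sitemap":
            sitemaps.append(value)
    return groups, sitemaps


# Hypothetical robots.txt for illustration.
SAMPLE = """\
User-agent: Googlebot
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/

Sitemap: https://yourdomain.com/sitemap.xml
"""

groups, sitemaps = group_robots_rules(SAMPLE)
for agent, rules in groups.items():
    print(agent, rules)
print(sitemaps)
```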

Need help fixing AI SEO issues?

Optimize your website for AI SEO, ChatGPT Search, Google AI Overviews, GPTBot, llms.txt, robots.txt, schema, and crawlability.

Hire Mubashir to fix crawl issues

Questions about this tool

What is PerplexityBot?

PerplexityBot is the robots.txt user-agent token for Perplexity's crawler, which fetches pages used in Perplexity AI search answers.

How do I allow PerplexityBot?

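Make sure no robots.txt rule that applies to PerplexityBot, whether in its own User-agent group or in the User-agent: * group, disallows the public pages you want cited. In particular, avoid Disallow: / for PerplexityBot.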
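A blanket rule can block PerplexityBot even when the file never names it, because a crawler without its own group falls back to the `User-agent: *` group. The sketch below shows that fallback with Python's standard `urllib.robotparser`; both robots.txt samples and the helper name `perplexitybot_allowed` are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def perplexitybot_allowed(robots_txt: str, path: str = "/") -> bool:
    """Check whether the given robots.txt text lets PerplexityBot fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("PerplexityBot", path)


# No PerplexityBot group: the wildcard group applies and blocks everything.
WILDCARD_BLOCK = "User-agent: *\nDisallow: /\n"

# A dedicated group takes precedence over the wildcard group.
EXPLICIT_ALLOW = (
    "User-agent: PerplexityBot\nAllow: /\n\n"
    "User-agent: *\nDisallow: /\n"
)

print(perplexitybot_allowed(WILDCARD_BLOCK))   # False
print(perplexitybot_allowed(EXPLICIT_ALLOW))   # True
```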