Free AI crawl tool

Perplexity Crawler Checker

Perplexity uses PerplexityBot to fetch pages for AI answers. Enter your URL and we show whether robots.txt allows PerplexityBot on your homepage and key paths.

Recommended settings:
OAI-SearchBot: Allow / for public content
GPTBot (training): Allow or block based on your policy
llms.txt links: 10–30 curated pages

About this tool

See whether PerplexityBot is allowed in robots.txt and can fetch your public pages for Perplexity AI answers.

What we check

In your free report

robots.txt rules for Googlebot, Bingbot, GPTBot, OAI-SearchBot, ClaudeBot, and PerplexityBot
Homepage HTTP status and whether the path is allowed for each user-agent
llms.txt presence and basic structure at your domain root
XML sitemap discovery via robots.txt and common locations
Meta robots, canonical, and indexability signals on scanned pages
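The robots.txt part of these checks can be approximated with Python's standard-library urllib.robotparser. This is a minimal sketch, not this tool's actual implementation: the robots.txt body, example.com and the path list are placeholders. Note that robotparser applies the first matching rule inside a group rather than the longest-match rule from RFC 9309, so results can differ on files that mix overlapping Allow and Disallow lines.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens from the "What we check" list.
BOTS = ["Googlebot", "Bingbot", "GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"]

# Placeholder robots.txt; replace with the text of your live file.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def crawl_report(robots_txt: str, site: str, paths: list[str]) -> dict[str, dict[str, bool]]:
    """Return {bot: {path: allowed}} for every user-agent and path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network fetch
    return {
        bot: {path: parser.can_fetch(bot, site + path) for path in paths}
        for bot in BOTS
    }


if __name__ == "__main__":
    report = crawl_report(ROBOTS_TXT, "https://example.com", ["/", "/blog/", "/admin/"])
    for bot, results in report.items():
        print(bot, results)
```

With this placeholder file, GPTBot is blocked everywhere because it has its own group, while every other bot falls back to the * group and is blocked only under /admin/.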

Best practices

Recommendations from official docs

Control AI bots independently

OpenAI documents GPTBot (training) and OAI-SearchBot (ChatGPT search) as separate robots.txt tokens. You can allow search indexing while opting out of training data collection.
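Python's urllib.robotparser can confirm that a split policy behaves as intended before you deploy it. A minimal sketch; the robots.txt body and the example.com URL are placeholders:

```python
from urllib.robotparser import RobotFileParser

# Placeholder policy: allow ChatGPT search indexing, opt out of training crawls.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for bot in ("OAI-SearchBot", "GPTBot"):
    allowed = parser.can_fetch(bot, "https://example.com/pricing")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
# OAI-SearchBot: allowed
# GPTBot: blocked
```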

Allow search bots on public pages

Google and Bing need crawl access to index your site. Block only paths that should never appear in search or AI answers (admin, cart, staging, private APIs).
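One way to keep this rule honest is to list the paths you expect to be public and private and flag any mismatch. A minimal sketch using urllib.robotparser; the robots.txt body, PUBLIC_PATHS, PRIVATE_PATHS and the policy_mismatches helper are hypothetical placeholders:

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt: block only private areas for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /staging/
Disallow: /api/private/
"""

# Hypothetical path lists; substitute URLs from your own site.
PUBLIC_PATHS = ["/", "/products/widget", "/blog/launch"]
PRIVATE_PATHS = ["/admin/settings", "/cart/checkout", "/staging/", "/api/private/orders"]


def policy_mismatches(robots_txt: str, bot: str, site: str) -> list[str]:
    """List paths whose robots.txt status contradicts the intended policy."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked_public = [p for p in PUBLIC_PATHS if not parser.can_fetch(bot, site + p)]
    open_private = [p for p in PRIVATE_PATHS if parser.can_fetch(bot, site + p)]
    return blocked_public + open_private


for bot in ("Googlebot", "Bingbot"):
    print(bot, policy_mismatches(ROBOTS_TXT, bot, "https://example.com"))
```

An empty list means the file matches the intended policy. robots.txt only asks well-behaved crawlers to stay out, so the private paths still need real access control.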

Publish llms.txt for AI discovery

A curated llms.txt at your domain root gives LLMs a short, hand-picked map of your site that is quicker to read than your full sitemap. Keep it under ~2,000 words with 10–30 high-value links.
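A quick lint can check a draft against the shape recommended on this page: one H1, a blockquote summary, 10 to 30 absolute links, and under ~2,000 words. A minimal sketch; the check_llms_txt helper and the sample file are hypothetical, and these thresholds come from this page's recommendations rather than a formal spec:

```python
import re

# Markdown links of the form [title](url); captures the URL.
LINK_RE = re.compile(r"\[[^\]]+\]\(([^)\s]+)\)")


def check_llms_txt(text: str) -> list[str]:
    """Return a list of problems found in an llms.txt body (empty list = passes)."""
    lines = text.splitlines()
    problems = []
    h1s = [line for line in lines if line.startswith("# ")]
    if len(h1s) != 1:
        problems.append(f"expected exactly one H1, found {len(h1s)}")
    if not any(line.startswith(">") for line in lines):
        problems.append("missing blockquote summary")
    links = LINK_RE.findall(text)
    if not 10 <= len(links) <= 30:
        problems.append(f"expected 10-30 links, found {len(links)}")
    relative = [url for url in links if not url.startswith(("http://", "https://"))]
    if relative:
        problems.append(f"{len(relative)} link(s) are not absolute URLs")
    words = len(text.split())
    if words > 2000:
        problems.append(f"{words} words; aim for under ~2,000")
    return problems


# Hypothetical sample: one H1, a summary, and 12 absolute links in one group.
sample = "\n".join(
    ["# Example Co", "", "> Example Co makes widgets for small teams.", "", "## Products"]
    + [f"- [Widget {i}](https://example.com/widgets/{i})" for i in range(12)]
)
print(check_llms_txt(sample))  # → []
```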

Verify WAF and CDN bot rules

robots.txt is advisory, and your CDN or firewall acts before a crawler ever reads it. Cloudflare Bot Fight Mode or aggressive WAF rules can block crawlers even when robots.txt allows them, so allowlist the official crawler IP ranges.
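An allowlist rule typically trusts a request only when its user-agent claims a crawler and its source IP falls inside that crawler's published ranges, since user-agent strings are easy to spoof. A minimal sketch of that logic with Python's ipaddress module; CRAWLER_RANGES and verified_crawler are hypothetical, the CIDRs are RFC 5737 documentation ranges used as placeholders, and real CDN or WAF configuration is done in the provider's own rule language:

```python
import ipaddress
from typing import Optional

# Hypothetical allowlist using RFC 5737 documentation ranges as placeholders.
# Replace with the ranges each crawler operator actually publishes.
CRAWLER_RANGES = {
    "PerplexityBot": ["192.0.2.0/24"],
    "OAI-SearchBot": ["198.51.100.0/24"],
}


def verified_crawler(user_agent: str, ip: str) -> Optional[str]:
    """Return the bot name only if the UA names a bot AND the IP is in its ranges."""
    addr = ipaddress.ip_address(ip)
    for bot, cidrs in CRAWLER_RANGES.items():
        if bot.lower() in user_agent.lower():
            if any(addr in ipaddress.ip_network(cidr) for cidr in cidrs):
                return bot
            return None  # claims to be the bot but comes from an unlisted IP
    return None  # not a known crawler user-agent


print(verified_crawler("Mozilla/5.0 (compatible; PerplexityBot/1.0)", "192.0.2.10"))
print(verified_crawler("Mozilla/5.0 (compatible; PerplexityBot/1.0)", "203.0.113.5"))
```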

How to fix

Step-by-step action plan

1. Fetch your live robots.txt and list each User-agent group separately for AI and search bots.
2. Allow PerplexityBot, OAI-SearchBot, and Googlebot on the public marketing and product pages you want discovered.
3. Add Sitemap: https://yourdomain.com/sitemap.xml and make sure the sitemap returns HTTP 200.
4. Publish llms.txt with one H1, a blockquote summary, and grouped absolute URLs.
5. Re-run this checker after changes. OpenAI notes that robots.txt updates can take about 24 hours to apply.
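Steps 1 and 3 can be partly scripted once you have downloaded robots.txt as text. A minimal sketch; the summarize_robots helper and the sample file are hypothetical. It groups consecutive User-agent lines with the Allow/Disallow rules that follow them, as RFC 9309 describes, and collects Sitemap lines. It does not fetch the sitemap, so the HTTP 200 check in step 3 still has to be done separately.

```python
def summarize_robots(robots_txt: str) -> tuple[list[tuple[list[str], list[str]]], list[str]]:
    """Group User-agent lines with their rules and collect Sitemap URLs."""
    groups: list[tuple[list[str], list[str]]] = []
    sitemaps: list[str] = []
    agents: list[str] = []
    rules: list[str] = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value)
        elif field in ("allow", "disallow") and agents:
            rules.append(f"{field.capitalize()}: {value}")
        elif field == "sitemap":
            sitemaps.append(value)  # Sitemap lines are not tied to any group
    if agents:
        groups.append((agents, rules))
    return groups, sitemaps


# Hypothetical robots.txt for illustration.
SAMPLE = """\
User-agent: PerplexityBot
User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

groups, sitemaps = summarize_robots(SAMPLE)
for agents, rules in groups:
    print(", ".join(agents), "->", rules)
print("Sitemaps:", sitemaps)
```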

Need help fixing AI SEO issues?

Optimize your website for AI SEO, ChatGPT Search, Google AI Overviews, GPTBot, llms.txt, robots.txt, schema, and crawlability.

Hire Mubashir to fix crawl issues

Questions about this tool

What is PerplexityBot?

PerplexityBot is Perplexity's robots.txt user-agent token for the crawler that fetches pages used in Perplexity AI search answers.

How do I allow PerplexityBot?

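Make sure robots.txt has no Disallow: / rule (or a Disallow covering the public pages you want cited) that applies to PerplexityBot. Such a rule can sit in a User-agent: PerplexityBot group, or in the User-agent: * group when no PerplexityBot group exists.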
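The fallback behaviour in this answer can be checked with Python's urllib.robotparser. A minimal sketch; the two robots.txt bodies, example.com and the perplexity_allowed helper are placeholders. A crawler that has its own User-agent group ignores the * group entirely, which is why the second file lets PerplexityBot in while still blocking other bots.

```python
from urllib.robotparser import RobotFileParser


def perplexity_allowed(robots_txt: str, url: str) -> bool:
    """Return True if PerplexityBot may fetch url under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("PerplexityBot", url)


# No PerplexityBot group: the catch-all group applies, so PerplexityBot is blocked.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# A dedicated group takes precedence over "*" for PerplexityBot.
ALLOW_PERPLEXITY = "User-agent: PerplexityBot\nAllow: /\n\nUser-agent: *\nDisallow: /\n"

print(perplexity_allowed(BLOCK_ALL, "https://example.com/"))         # False
print(perplexity_allowed(ALLOW_PERPLEXITY, "https://example.com/"))  # True
```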