AI Bot Checker: Which AI Bots Have Access to Your Website?
GPTBot, ClaudeBot, Bytespider – AI systems crawl millions of websites daily to train their models or answer user queries. The question is: Have you consciously allowed this access, or is it happening without your knowledge?
This tool checks your robots.txt against 30 known AI bots and additionally detects whether your website signals an opt-out via meta robots tags (noai, noimageai). The result: a clear overview with charts showing which AI providers can access your content – and which cannot.
AI Bot Access Checker
Enter a domain to fetch and analyze its robots.txt for AI bot access.
Path to test (e.g., /, /blog/, /api/). Defaults to /.
Test a custom user-agent string alongside known AI bots.
Paste robots.txt content to test AI bot access.
How the AI Bot Checker Works
- Enter your domain: The tool automatically fetches your robots.txt and scans your homepage for meta robots tags.
- Test custom rules: Alternatively, paste robots.txt content directly to test changes before deploying them.
- Review results: For each of the 30 AI bots, you’ll see the status (allowed/blocked), user agent string, description, and category (search, training, image recognition).
- Analyze the chart: The donut chart and vendor overview show the ratio of allowed to blocked bots at a glance – broken down by provider (OpenAI, Anthropic, Google, Meta, Apple, and more).
- Check meta robots signals: The tool detects noai and noimageai meta tags as well as X-Robots-Tag headers on your homepage.
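At its core, the check described above boils down to parsing a robots.txt file and asking, per bot, whether a given path may be fetched. A minimal sketch of that logic using Python's standard urllib.robotparser (the bot list is shortened to two entries for illustration; the actual tool checks 30):

```python
from urllib import robotparser

# Sample robots.txt that blocks one training bot and allows everything else
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Bots to test as (display name, user-agent token) pairs -- illustrative subset
AI_BOTS = [("GPTBot", "GPTBot"), ("PerplexityBot", "PerplexityBot")]

parser = robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

for name, token in AI_BOTS:
    status = "allowed" if parser.can_fetch(token, "/blog/") else "blocked"
    print(f"{name}: {status}")
```

Running this prints "GPTBot: blocked" and "PerplexityBot: allowed", matching the per-bot status column the tool shows.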
Changelog
- Expanded bot database from 16 to 30 known AI bots (including Apple, Brave, Perplexity, Meta)
- Added donut chart and per-vendor stacked bar chart for visual analysis
- Implemented meta robots detection: scans <meta name="robots">, bot-specific meta tags, and X-Robots-Tag HTTP headers for noai/noimageai
- Added bot categorization by purpose (search, training, image recognition)
- Custom user agent input for individual bot testing
Background: Why You Should Control AI Bot Access
AI companies use web crawlers to collect training data or provide real-time information. This typically happens without explicit consent from website operators. The robots.txt is currently the primary mechanism to control this access.
Two Types of AI Bots
- Search and answer bots: These bots (e.g., ChatGPT-User, PerplexityBot) fetch content to answer user queries in real time. Blocking them means your content won’t appear in AI-powered search results.
- Training bots: These bots (e.g., GPTBot, ClaudeBot) collect data for training language models. This raises the fundamental question of whether your content should be used to train third-party AI systems.
robots.txt vs. Meta Robots: What’s the Difference?
The robots.txt blocks crawling at the directory level. Meta robots tags like noai signal at the page level that the content should not be used for AI training. Both mechanisms complement each other but are not legally binding – they rely on voluntary compliance by AI providers.
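Concretely, the two signals live in different places. A path-level block in robots.txt and a page-level opt-out in HTML might look like this (the blocked path is illustrative):

```
# robots.txt: keeps GPTBot from crawling anything under /private/
User-agent: GPTBot
Disallow: /private/
```

```html
<!-- Page-level signal in <head>: page may be crawled,
     but should not be used for AI training -->
<meta name="robots" content="noai, noimageai">
```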
What Should You Do With the Results?
- Make a conscious decision: Consider which AI bots you want to allow. Search bots drive traffic; training bots use your content without direct return.
- Update your robots.txt: Add Disallow: / rules under the relevant user agent for bots you want to block.
- Add meta tags: If you want to signal a general opt-out, add <meta name="robots" content="noai, noimageai"> to the <head> section of your pages.
- Check regularly: The AI landscape changes quickly and new bots appear regularly – review your settings at regular intervals.
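Putting the first two recommendations together, a robots.txt that blocks common training bots while leaving search and answer bots untouched could look like this (the bot selection is illustrative, not exhaustive):

```
# Block training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Everything else (including search/answer bots) stays allowed
User-agent: *
Allow: /
```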
Frequently Asked Questions
Which AI bots does the tool check?
The tool checks 30 known AI bots, including GPTBot and ChatGPT-User (OpenAI), ClaudeBot (Anthropic), Bytespider (ByteDance), Google-Extended (Google), FacebookBot (Meta), Applebot-Extended (Apple), PerplexityBot, and more.
Can I block all AI bots at once?
There is no single user agent for “all AI bots.” You need to block each bot individually in your robots.txt. This tool shows you all relevant user agent strings so you can create the appropriate rules.
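Since there is no wildcard covering AI bots specifically, each one needs its own rule. The robots.txt format does, however, let several User-agent lines share a single rule group, which keeps the file compact:

```
# One rule group applied to three bots at once
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Bytespider
Disallow: /
```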
Do all AI bots respect robots.txt?
Most major providers (OpenAI, Anthropic, Google) respect robots.txt. However, there is no legal guarantee. The robots.txt is a signal, not a technical barrier. For additional protection, you can implement server-side measures (IP blocking, rate limiting).
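For crawlers that ignore robots.txt, blocking can be enforced at the server instead. A hypothetical nginx snippet that returns 403 to selected user-agent strings (the bot names in the regex are illustrative):

```nginx
# Inside a server { } block: hard-block requests whose User-Agent
# matches known AI crawler tokens (case-insensitive match)
if ($http_user_agent ~* "(GPTBot|ClaudeBot|Bytespider)") {
    return 403;
}
```

Unlike robots.txt, this does not rely on voluntary compliance, but it only catches bots that identify themselves honestly in the User-Agent header.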
What does “noai” mean in the meta robots results?
noai signals to AI crawlers that the page content should not be used for training language models. noimageai applies specifically to images. These tags are voluntarily respected by some providers.
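The same opt-out can also be delivered as an HTTP response header rather than a meta tag, which additionally covers non-HTML resources such as images and PDFs:

```
X-Robots-Tag: noai, noimageai
```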
Should I block or allow AI bots?
That depends on your strategy. If you want your content to appear in AI-powered answers (e.g., in ChatGPT or Perplexity), keep the search bots allowed. If you don’t want your content used for training, block the training bots. Many website operators choose a middle ground.