AI Bot Checker: Which AI Bots Have Access to Your Website?
GPTBot, ClaudeBot, Bytespider – AI systems crawl millions of websites daily to train their models or answer user queries. The question is: Have you consciously allowed this access, or is it happening without your knowledge?
This tool checks your robots.txt against 30 known AI bots and additionally detects whether your website signals an opt-out via meta robots tags (noai, noimageai). The result: a clear overview with charts showing which AI providers can access your content – and which cannot.
AI Bot Access Checker
Enter a domain to fetch and analyze its robots.txt for AI bot access.
Path to test (e.g., /, /blog/, /api/). Defaults to /.
Test a custom user-agent string alongside known AI bots.
Paste robots.txt content to test AI bot access.
How the AI Bot Checker Works
- Enter your domain: The tool automatically fetches your robots.txt and scans your homepage for meta robots tags.
- Test custom rules: Alternatively, paste robots.txt content directly to test changes before deploying them.
- Review results: For each of the 30 AI bots, you’ll see the status (allowed/blocked), user agent string, description, and category (search, training, image recognition).
- Analyze the chart: The donut chart and vendor overview show the ratio of allowed to blocked bots at a glance – broken down by provider (OpenAI, Anthropic, Google, Meta, Apple, and more).
- Check meta robots signals: The tool detects noai and noimageai meta tags as well as X-Robots-Tag headers on your homepage.
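At its core, the check described above boils down to parsing a robots.txt file and asking, per bot, whether a given path may be fetched. A minimal sketch of that logic using Python's standard urllib.robotparser (the bot list is shortened to two entries for illustration; the actual tool checks 30):

```python
from urllib import robotparser

# Sample robots.txt that blocks one training bot and allows everything else
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Bots to test as (display name, user-agent token) pairs -- illustrative subset
AI_BOTS = [("GPTBot", "GPTBot"), ("PerplexityBot", "PerplexityBot")]

parser = robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

for name, token in AI_BOTS:
    status = "allowed" if parser.can_fetch(token, "/blog/") else "blocked"
    print(f"{name}: {status}")
```

Running this prints "GPTBot: blocked" and "PerplexityBot: allowed", matching the per-bot status column the tool shows.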
Changelog
- Expanded bot database from 16 to 30 known AI bots (including Apple, Brave, Perplexity, Meta)
- Added donut chart and per-vendor stacked bar chart for visual analysis
- Implemented meta robots detection: scans <meta name="robots">, bot-specific meta tags, and X-Robots-Tag HTTP headers for noai/noimageai
- Added bot categorization by purpose (search, training, image recognition)
- Custom user agent input for individual bot testing
Background: Why You Should Control AI Bot Access
AI companies use web crawlers to collect training data or provide real-time information. This typically happens without explicit consent from website operators. The robots.txt is currently the primary mechanism to control this access.
Two Types of AI Bots
- Search and answer bots: These bots (e.g., ChatGPT-User, PerplexityBot) fetch content to answer user queries in real time. Blocking them means your content won’t appear in AI-powered search results.
- Training bots: These bots (e.g., GPTBot, ClaudeBot) collect data for training language models. This raises the fundamental question of whether your content should be used to train third-party AI systems.
robots.txt vs. Meta Robots: What’s the Difference?
The robots.txt blocks crawling at the directory level. Meta robots tags like noai signal at the page level that the content should not be used for AI training. Both mechanisms complement each other but are not legally binding – they rely on voluntary compliance by AI providers.
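Concretely, the two signals live in different places. A path-level block in robots.txt and a page-level opt-out in HTML might look like this (the blocked path is illustrative):

```
# robots.txt: keeps GPTBot from crawling anything under /private/
User-agent: GPTBot
Disallow: /private/
```

```html
<!-- Page-level signal in <head>: page may be crawled,
     but should not be used for AI training -->
<meta name="robots" content="noai, noimageai">
```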
What Should You Do With the Results?
- Make a conscious decision: Consider which AI bots you want to allow. Search bots drive traffic; training bots use your content without direct return.
- Update your robots.txt: Add Disallow: / rules under the relevant user agent for bots you want to block.
- Add meta tags: If you want to signal a general opt-out, add <meta name="robots" content="noai, noimageai"> to the <head> section of your pages.
- Check regularly: The AI landscape changes quickly and new bots appear regularly – review your settings at regular intervals.
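Putting the first two recommendations together, a robots.txt that blocks common training bots while leaving search and answer bots untouched could look like this (the bot selection is illustrative, not exhaustive):

```
# Block training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Everything else (including search/answer bots) stays allowed
User-agent: *
Allow: /
```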
Frequently Asked Questions
Which AI bots does the tool check?
The tool checks 30 known AI bots, including GPTBot and ChatGPT-User (OpenAI), ClaudeBot (Anthropic), Bytespider (ByteDance), Google-Extended (Google), FacebookBot (Meta), Applebot-Extended (Apple), PerplexityBot, and more.
Can I block all AI bots at once?
There is no single user agent for “all AI bots.” You need to block each bot individually in your robots.txt. This tool shows you all relevant user agent strings so you can create the appropriate rules.
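Since there is no wildcard covering AI bots specifically, each one needs its own rule. The robots.txt format does, however, let several User-agent lines share a single rule group, which keeps the file compact:

```
# One rule group applied to three bots at once
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Bytespider
Disallow: /
```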
Do all AI bots respect robots.txt?
Most major providers (OpenAI, Anthropic, Google) respect robots.txt. However, there is no legal guarantee. The robots.txt is a signal, not a technical barrier. For additional protection, you can implement server-side measures (IP blocking, rate limiting).
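For crawlers that ignore robots.txt, blocking can be enforced at the server instead. A hypothetical nginx snippet that returns 403 to selected user-agent strings (the bot names in the regex are illustrative):

```nginx
# Inside a server { } block: hard-block requests whose User-Agent
# matches known AI crawler tokens (case-insensitive match)
if ($http_user_agent ~* "(GPTBot|ClaudeBot|Bytespider)") {
    return 403;
}
```

Unlike robots.txt, this does not rely on voluntary compliance, but it only catches bots that identify themselves honestly in the User-Agent header.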
What does “noai” mean in the meta robots results?
noai signals to AI crawlers that the page content should not be used for training language models. noimageai applies specifically to images. These tags are voluntarily respected by some providers.
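The same opt-out can also be delivered as an HTTP response header rather than a meta tag, which additionally covers non-HTML resources such as images and PDFs:

```
X-Robots-Tag: noai, noimageai
```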
Should I block or allow AI bots?
That depends on your strategy. If you want your content to appear in AI-powered answers (e.g., in ChatGPT or Perplexity), keep the search bots allowed. If you don’t want your content used for training, block the training bots. Many website operators choose a middle ground.