ChatGPT, Claude, Perplexity, Gemini... Identify which AI bots are crawling your site and control their access
Explore our comprehensive database of 50 AI bots. Filter by category or behavior to find exactly what you need.
Missing a bot? Contact us to suggest new bots for our database.
Allen Institute for AI bot for academic research and model training
Amazon bot to improve Alexa and AWS AI services
Andi AI search engine bot, competitor to Perplexity
Training bot for Anthropic's Claude models, collects data to improve models
Updated Anthropic Claude bot for real-time web access and citations
Claude's web bot for exploration and indexing of web content
Bot used by Claude to fetch citations and references in real-time during conversations
Bot for training Apple AI models (Apple Intelligence)
New emerging AI bot, details on usage still limited
Bright Data analysis bot to collect data for AI
ByteDance (TikTok) bot for training their Chinese AI models
Character.AI bot for training conversational AI characters
Devin AI code assistant bot to analyze and understand online code
Cohere bot for training their language models and NLP
Cohere Command model bot for real-time information retrieval
Common Crawl bot, widely used for training open source AI models
Crawling service specialized for AI and data extraction
DeepSeek AI bot for training their advanced reasoning models and data collection
Diffbot bot for structured data extraction and creating knowledge graphs for AI
DuckDuckGo bot for their privacy-respecting AI assistant
New scraping service specialized for AI and LLMs
Google Bard AI assistant bot for web content retrieval
Google Gemini AI model bot for training and web content analysis
Bot for Gemini Deep Research in-depth searches
Token to control access to content for Gemini/Bard and Vertex AI
Groq inference engine bot for high-speed AI model data collection
Hugging Face bot for training open-source AI models and datasets
Ibou.io bot for web content indexing and analysis, particularly active on French websites
Traditional Facebook bot extended for AI and machine learning
Meta bot for training their AI models (Llama, etc.)
Mistral AI bot to retrieve citations in Le Chat
ChatGPT web browsing bot for real-time web access during conversations
Bot used for real-time searches when a user asks ChatGPT a question
Updated version of ChatGPT-User bot for real-time searches (since February 2025)
Bot used by OpenAI to collect training data for ChatGPT and future GPT models
Specific indexing bot for ChatGPT Search, competitor to Google Search
Perplexity uses headless browsers with Chrome user agents to bypass blocking
Bot triggered when a user clicks on a link in a Perplexity response
Perplexity indexing bot to feed their AI search engine
Replicate platform bot for AI model training and data collection
RunPod cloud platform bot for GPU-based AI training data collection
Bot for reverse image search and training image generation models
Timpi bot for training their Large Language Models
Together AI platform bot for decentralized AI model training
Chinese AI bot, origin and exact usage unknown
Another Chinese AI bot, possibly linked to Pangu models
Japanese AI bot, specific usage unknown
Webz.io bot that collects data to sell to AI companies for training
Elon Musk's xAI bot for training Grok and other AI models
You.com AI search engine bot for indexing and answering questions
Generate custom robots.txt rules to control AI bot access to your website
💡 How to use:
AI user agents are specialized web crawlers used by artificial intelligence companies to collect data for training their models or providing real-time information to users.
Crawl websites to gather text data for AI model training
Fetch current information when users ask questions
Build searchable databases for AI-powered answers
AI crawlers identify themselves through user-agent strings. Keeping those strings current in your robots.txt lets you guide how language models interact with your work.
Most crawlers for LLM-based AI search engines identify themselves with a user-agent string: a short bit of text that tells your server "who" is making the request. When you spot GPTBot, ClaudeBot, PerplexityBot, or any of the newer strings below in your server access logs, you know an AI model is indexing, scraping, or quoting your page.
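As a minimal sketch of that check, the function below looks for a known token inside a User-Agent header. The token list is a small illustrative subset of the names in this guide, and the function name `detect_ai_bot` and the sample header values are hypothetical.

```python
# Illustrative subset of tokens named in this guide; extend it from your own logs.
AI_BOT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot", "ChatGPT-User", "CCBot")


def detect_ai_bot(user_agent):
    """Return the first known AI bot token found in a User-Agent header, or None."""
    lowered = user_agent.lower()
    for token in AI_BOT_TOKENS:
        if token.lower() in lowered:
            return token
    return None


# Hypothetical header values, shaped like what crawlers typically send.
print(detect_ai_bot("Mozilla/5.0 (compatible; GPTBot/1.0)"))           # GPTBot
print(detect_ai_bot("Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"))  # None
```

The match is a case-insensitive substring test, so it only recognizes bots that announce themselves honestly.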
A bot that copies public web pages so a large language model can learn from them.
The string that identifies that crawler in HTTP requests. You use it in robots.txt rules.
A plain-text file at the root of your site that tells crawlers what they may fetch. Add one group per user agent you want to allow or block: a User-agent line followed by its Allow or Disallow rules.
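To make the file layout concrete, here is a small sketch that assembles robots.txt text from two lists of user-agent tokens. The helper name `build_robots_txt` is hypothetical, and which bots land in which list is only an example, not a recommendation.

```python
def build_robots_txt(blocked, allowed=()):
    """Build robots.txt text: one group per user agent, then a wildcard catch-all."""
    groups = []
    for agent in blocked:
        groups.append(f"User-agent: {agent}\nDisallow: /")
    for agent in allowed:
        groups.append(f"User-agent: {agent}\nAllow: /")
    # Every crawler not named above keeps normal access.
    groups.append("User-agent: *\nAllow: /")
    return "\n\n".join(groups) + "\n"


# Example policy (an assumption): block two training bots, allow one assistant bot.
print(build_robots_txt(blocked=["GPTBot", "CCBot"], allowed=["ChatGPT-User"]))
```

Save the printed text as robots.txt at the root of your site.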
Server logs show AI search bots now account for a growing share of referral visits. Understanding which agents they use helps you encourage that traffic responsibly.
Research shows:
ChatGPT sends 1.4 visits per unique visitor to external domains. Google Search sends only 0.6.
Everything you need to know about AI crawlers and robots.txt
Any bot that requests your pages for model training or instant answers. You tell it what to do with User-agent: lines in your robots.txt file.
No. A wildcard line should be a catch-all. Still list named AI crawlers you care about; some ignore the star (*) directive and only respond to their specific user agent.
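The sketch below shows how a named group and the wildcard group interact, using the robots.txt parser in Python's standard library. It models a well-behaved parser only; it says nothing about how any particular AI crawler actually behaves. The rules, the `example.com` URLs, and the `SomeOtherBot` name are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one named group plus a wildcard catch-all.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/blog/post"
print(parser.can_fetch("GPTBot", url))        # False: the named group applies
print(parser.can_fetch("SomeOtherBot", url))  # True: falls back to the wildcard group
print(parser.can_fetch("SomeOtherBot", "https://example.com/private/x"))  # False
```

A crawler that matches a named group follows that group, and everything else falls through to the wildcard rules, which is why listing the bots you care about by name removes any ambiguity.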
Common Crawl (CCBot) is still the leader because it releases monthly snapshots anyone can download. It's transparent and provides public access to its data.
The tokens in this guide account for 95% of AI crawler traffic according to log data we have access to. These are the most commonly seen AI bots in server logs.
Nope! Most do though. Anthropic was criticized in 2024 for ignoring robots.txt directives, and Perplexity has been known to bypass these rules.
Important: Think of a robots.txt file as a list of preferences or suggestions on how to access a website. Block bad actors at the firewall/server level or add password authentication to content you don't want bots to access.
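One way to enforce a block at the server level is to refuse requests by user agent before they reach your application. The following is a minimal sketch as WSGI middleware, assuming your site runs as a Python WSGI app; `block_ai_bots`, `hello_app`, and the blocked token list are hypothetical names and choices for illustration.

```python
BLOCKED_TOKENS = ("GPTBot", "CCBot")  # illustrative choice, not a recommendation


def block_ai_bots(app, blocked=BLOCKED_TOKENS):
    """Wrap a WSGI app and answer 403 when the User-Agent contains a blocked token."""
    lowered = tuple(token.lower() for token in blocked)

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in lowered):
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Not a blocked bot: hand the request to the wrapped application.
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Hypothetical stand-in for your real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]


application = block_ai_bots(hello_app)
```

This only stops bots that send an identifying user agent. Crawlers that present a generic browser user agent need other measures, such as firewall rules or authentication.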
Check your server access logs for the user-agent strings listed above. Most web analytics tools and server log analyzers can break out bot traffic for you. Search your logs for tokens such as "GPTBot", "ClaudeBot", and "PerplexityBot".
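If you prefer to scan the logs yourself, the sketch below counts requests per bot. It assumes the common "combined" access-log format, where the user agent is the last double-quoted field on each line; the sample lines are invented and `count_ai_bot_hits` is a hypothetical helper name.

```python
import re
from collections import Counter

AI_BOT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")
# In the combined log format the user agent is the last double-quoted field.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_ai_bot_hits(log_lines):
    """Count requests per AI bot token across access-log lines."""
    hits = Counter()
    for line in log_lines:
        match = LAST_QUOTED_FIELD.search(line)
        if not match:
            continue  # line does not end with a quoted field; skip it
        user_agent = match.group(1).lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in user_agent:
                hits[token] += 1
                break
    return hits


# Made-up sample lines for illustration only.
sample = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [01/Jan/2025:00:00:02 +0000] "GET /a HTTP/1.1" 200 128 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '198.51.100.7 - - [01/Jan/2025:00:00:03 +0000] "GET /b HTTP/1.1" 200 256 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"',
    '203.0.113.5 - - [01/Jan/2025:00:00:04 +0000] "GET /c HTTP/1.1" 200 64 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
]
print(count_ai_bot_hits(sample).most_common())  # [('GPTBot', 2), ('ClaudeBot', 1)]
```

To run it on a real file, pass an open file object instead of the sample list, for example `count_ai_bot_hits(open("access.log"))`, where the path is whatever your server uses.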
It depends on your content strategy:
Training bots (like GPTBot, CCBot) crawl websites to collect data for training AI models. Assistant bots (like ChatGPT-User, Claude-User) fetch content in real-time when users ask questions, potentially driving referral traffic to your site.
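The training/assistant split can drive your blocking decision. As a rough sketch, the mapping below tags a few bots with the category this guide gives them and picks the ones to block for a chosen strategy. The mapping is a small illustrative subset and `bots_to_block` is a hypothetical helper.

```python
# Categories as described in this guide; the mapping is a small illustrative subset.
BOT_CATEGORIES = {
    "GPTBot": "training",
    "CCBot": "training",
    "ChatGPT-User": "assistant",
}


def bots_to_block(block_categories):
    """Return the bot tokens whose category is in block_categories, sorted by name."""
    return sorted(
        bot for bot, category in BOT_CATEGORIES.items()
        if category in block_categories
    )


# Example strategy (an assumption): opt out of training, stay visible to assistants.
print(bots_to_block({"training"}))  # ['CCBot', 'GPTBot']
```

The returned tokens are the ones to give a Disallow rule in robots.txt; bots in the other categories stay allowed.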
Review your robots.txt monthly, as new AI bots emerge regularly. Subscribe to our newsletter or bookmark this page - we update it as new AI crawlers are discovered.
Now that your site is optimized for AI, keep track of performance, affiliate links, status codes and more!