(almost) Complete AI User Agents List 2025

ChatGPT, Claude, Perplexity, Gemini... Identify which AI bots are crawling your site and control their access

🔍 Browse AI Bots

Explore our database of 50 AI bots. Each entry gives the operator and category, a short description, the full user-agent string, the bot's identifier, tags, and (where known) whether it respects robots.txt.

Missing a bot? Contact us to suggest new bots for our database.

AI2Bot

Allen Institute for AI AI Training

Allen Institute for AI bot for academic research and model training

Mozilla/5.0 (compatible; AI2Bot/1.0; +https://allenai.org/)
AI2Bot
#ai2 #academic #research #training
✅ Respectful

Amazonbot

Amazon AI Assistant

Amazon bot to improve Alexa and AWS AI services

Amazonbot/0.1 (+https://developer.amazon.com/support/amazonbot)
Amazonbot
#amazon #alexa #aws #assistant
✅ Respectful

Andibot

Andi AI Search

Andi AI search engine bot, competitor to Perplexity

Mozilla/5.0 (compatible; Andibot/1.0)
Andibot
#andi #search #answer-engine #competitor
✅ Respectful

anthropic-ai

Anthropic AI Training

Training bot for Anthropic's Claude models, collects data to improve models

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Claude-Web/1.0; +https://www.anthropic.com)
anthropic-ai
#claude #anthropic #training #bulk-data
✅ Respectful

Anthropic-Claude

Anthropic AI Assistant

Updated Anthropic Claude bot for real-time web access and citations

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Anthropic-Claude/1.0; +https://www.anthropic.com)
Anthropic-Claude
#anthropic #claude #realtime #citations
✅ Respectful

Claude-Web

Anthropic AI Search

Claude's web bot for exploration and indexing of web content

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Claude-Web/1.0; +https://www.anthropic.com)
claude-web
#claude #anthropic #web #crawling
✅ Respectful

ClaudeBot

Anthropic AI Assistant

Bot used by Claude to fetch citations and references in real-time during conversations

ClaudeBot/1.0; +https://www.anthropic.com
ClaudeBot
#claude #anthropic #citations #assistant
✅ Respectful

Applebot-Extended

Apple AI Training

Bot for training Apple AI models (Apple Intelligence)

Mozilla/5.0 (compatible; Applebot-Extended/1.0)
Applebot-Extended
#apple #apple-intelligence #training #siri
✅ Respectful

bigsur.ai

BigSur AI AI Training

New emerging AI bot, details on usage still limited

Mozilla/5.0 (compatible; bigsur.ai/1.0)
bigsur.ai
#bigsur #emerging #new #training
✅ Respectful

Brightbot

Bright Data AI Training

Bright Data analysis bot to collect data for AI

Mozilla/5.0 (compatible; Brightbot/1.0)
Brightbot
#bright-data #analysis #data-collection #training
✅ Respectful

Bytespider

ByteDance AI Training

ByteDance (TikTok) bot for training their Chinese AI models

Mozilla/5.0 (compatible; Bytespider; [email protected])
Bytespider
#bytedance #tiktok #chinese #training
✅ Respectful

Character-AI

Character.AI AI Assistant

Character.AI bot for training conversational AI characters

Mozilla/5.0 (compatible; Character-AI/1.0; +https://character.ai/)
Character-AI
#character-ai #conversational #characters #training
✅ Respectful

Devin

Cognition AI AI Assistant

Devin AI code assistant bot to analyze and understand online code

Mozilla/5.0 (compatible; Devin/1.0)
Devin
#devin #code-assistant #programming #cognition-ai
✅ Respectful

Cohere-Ai

Cohere AI Training

Cohere bot for training their language models and NLP

Mozilla/5.0 (compatible; Cohere-AI/1.0; +https://cohere.com/)
Cohere-Ai
#cohere #nlp #training #enterprise
✅ Respectful

Cohere-Command

Cohere AI Assistant

Cohere Command model bot for real-time information retrieval

Mozilla/5.0 (compatible; Cohere-Command/1.0; +https://cohere.com/)
Cohere-Command
#cohere #command #assistant #enterprise
✅ Respectful

CCBot

Common Crawl AI Training

Common Crawl bot, widely used for training open source AI models

CCBot/2.0 (https://commoncrawl.org/faq/)
CCBot
#common-crawl #open-data #training #dataset
✅ Respectful

Crawlspace

Crawlspace AI Training

Crawling service specialized for AI and data extraction

Mozilla/5.0 (compatible; Crawlspace/1.0)
Crawlspace
#crawling-service #data-extraction #ai #training
✅ Respectful

DeepseekBot

DeepSeek AI Training

DeepSeek AI bot for training their advanced reasoning models and data collection

Mozilla/5.0 (compatible; DeepseekBot/1.0; +https://www.deepseek.com/bot)
DeepseekBot
#deepseek #reasoning #training #chinese
✅ Respectful

Diffbot

Diffbot AI Training

Diffbot bot for structured data extraction and creating knowledge graphs for AI

Mozilla/5.0 (compatible; Diffbot/0.1; +http://www.diffbot.com/our-apis/crawler/)
Diffbot
#diffbot #knowledge-graph #extraction #structured-data
✅ Respectful

DuckAssistBot

DuckDuckGo AI Assistant

DuckDuckGo bot for their privacy-respecting AI assistant

Mozilla/5.0 (compatible; DuckAssistBot/1.0; +https://duckduckgo.com/duckassist)
DuckAssistBot
#duckduckgo #privacy #assistant #search
✅ Respectful

FirecrawlAgent

Firecrawl AI Training

New scraping service specialized for AI and LLMs

Mozilla/5.0 (compatible; FirecrawlAgent/1.0)
FirecrawlAgent
#firecrawl #scraping #llm #training
✅ Respectful

Bard-Ai

Google AI Assistant

Google Bard AI assistant bot for web content retrieval

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Bard-AI/1.0; +https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers)
Bard-Ai
#google #bard #assistant #search
✅ Respectful

Gemini-Ai

Google AI Assistant

Google Gemini AI model bot for training and web content analysis

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Gemini-AI/1.0; +https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers)
Gemini-Ai
#google #gemini #training #analysis
✅ Respectful

Gemini-Deep-Research

Google AI Assistant

Bot for Gemini Deep Research in-depth searches

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Gemini-Deep-Research/1.0)
Gemini-Deep-Research
#google #gemini #deep-research #assistant
✅ Respectful

Google-Extended

Google AI Training

Token to control access to content for Gemini/Bard and Vertex AI

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Google-Extended/1.0; +https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers)
Google-Extended
#google #gemini #bard #vertex-ai
✅ Respectful

Groq-Bot

Groq AI Training

Groq inference engine bot for high-speed AI model data collection

Mozilla/5.0 (compatible; Groq-Bot/1.0; +https://groq.com/)
Groq-Bot
#groq #inference #high-speed #training
✅ Respectful

HuggingFace-Bot

Hugging Face AI Training

Hugging Face bot for training open-source AI models and datasets

Mozilla/5.0 (compatible; HuggingFace-Bot/1.0; +https://huggingface.co/)
HuggingFace-Bot
#huggingface #open-source #training #datasets
✅ Respectful

IbouBot

Ibou.io AI Training

Ibou.io bot for web content indexing and analysis, particularly active on French websites

Mozilla/5.0 (compatible; IbouBot/1.0; [email protected]; +https://ibou.io/iboubot.html)
IbouBot
217.113.196.0/24
#ibou #french #crawling #indexation
✅ Respectful

FacebookBot

Meta AI Training

Traditional Facebook bot extended for AI and machine learning

facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
FacebookBot
#meta #facebook #social #ai
✅ Respectful

Meta-ExternalAgent

Meta AI Training

Meta bot for training their AI models (Llama, etc.)

Meta-ExternalAgent/1.0 (+https://developers.facebook.com/docs/sharing/bot)
Meta-ExternalAgent
#meta #facebook #llama #training
✅ Respectful

MistralAI-User

Mistral AI AI Assistant

Mistral AI bot to retrieve citations in Le Chat

MistralAI-User/1.0
MistralAI-User
#mistral #le-chat #french #citations
✅ Respectful

ChatGPT-Browser

OpenAI AI Assistant

ChatGPT web browsing bot for real-time web access during conversations

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ChatGPT-Browser/1.0; +https://openai.com/bot)
ChatGPT-Browser
#openai #chatgpt #browsing #realtime
✅ Respectful

ChatGPT-User

OpenAI AI Assistant

Bot used for real-time searches when a user asks a question to ChatGPT

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ChatGPT-User/1.0; +https://openai.com/bot)
ChatGPT-User
#chatgpt #realtime #search #user-triggered
✅ Respectful

ChatGPT-User v2.0

OpenAI AI Assistant

Updated version of ChatGPT-User bot for real-time searches (since February 2025)

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ChatGPT-User/2.0; +https://openai.com/bot)
ChatGPT-User-v2
#chatgpt #realtime #search #user-triggered #v2
✅ Respectful

GPTBot

OpenAI AI Training

Bot used by OpenAI to collect training data for ChatGPT and future GPT models

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
GPTBot
#chatgpt #training #openai #gpt
✅ Respectful

OAI-SearchBot

OpenAI AI Search

Specific indexing bot for ChatGPT Search, competitor to Google Search

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)
OAI-SearchBot
#openai #search #indexation #chatgpt-search
✅ Respectful

Perplexity Stealth

Perplexity AI AI Assistant
⚠️ STEALTH

Perplexity uses headless browsers with Chrome user agents to bypass blocking

Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36
Perplexity-Stealth
#perplexity #stealth #headless #chrome
❌ Ignores robots.txt

Perplexity-User

Perplexity AI AI Assistant

Bot triggered when a user clicks on a link in a Perplexity response

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Perplexity-User/1.0; +https://perplexity.ai/bot)
Perplexity-User
#perplexity #user-triggered #realtime

PerplexityBot

Perplexity AI AI Search

Perplexity indexing bot to feed their AI search engine

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/bot)
PerplexityBot
#perplexity #search #answer-engine #indexation

Replicate-Bot

Replicate AI Training

Replicate platform bot for AI model training and data collection

Mozilla/5.0 (compatible; Replicate-Bot/1.0; +https://replicate.com/)
Replicate-Bot
#replicate #platform #training #models
✅ Respectful

RunPod-Bot

RunPod AI Training

RunPod cloud platform bot for GPU-based AI training data collection

Mozilla/5.0 (compatible; RunPod-Bot/1.0; +https://runpod.io/)
RunPod-Bot
#runpod #gpu #cloud #training
✅ Respectful

ImagesiftBot

The Hive AI Training

Bot for reverse image search and training image generation models

Mozilla/5.0 (compatible; ImagesiftBot/1.0)
ImagesiftBot
#image-search #reverse-search #image-generation #training
✅ Respectful

TimpiBot

Timpi AI Training

Timpi bot for training their Large Language Models

Mozilla/5.0 (compatible; TimpiBot/1.0)
TimpiBot
#timpi #llm #training #search
✅ Respectful

Together-Bot

Together AI AI Training

Together AI platform bot for decentralized AI model training

Mozilla/5.0 (compatible; Together-Bot/1.0; +https://together.ai/)
Together-Bot
#together-ai #decentralized #training #platform
✅ Respectful

Kangaroo Bot

Unknown (China) AI Training

Chinese AI bot, origin and exact usage unknown

Mozilla/5.0 (compatible; Kangaroo Bot/1.0)
Kangaroo Bot
#chinese #unknown #training #suspicious
❌ Ignores robots.txt

PanguBot

Unknown (China) AI Training

Another Chinese AI bot, possibly linked to Pangu models

Mozilla/5.0 (compatible; PanguBot/1.0)
PanguBot
#chinese #pangu #training #unknown
❌ Ignores robots.txt

Cotoyogi

Unknown (Japan) AI Training

Japanese AI bot, specific usage unknown

Mozilla/5.0 (compatible; Cotoyogi/1.0)
Cotoyogi
#japanese #unknown #training #asia
✅ Respectful

Webzio-Extended

Webz.io AI Training

Webz.io bot that collects data to sell to AI companies for training

Mozilla/5.0 (compatible; Webzio-Extended/1.0)
Webzio-Extended
#webzio #data-broker #training #commercial
✅ Respectful

xAI-Bot

xAI AI Training

Elon Musk's xAI bot for training Grok and other AI models

Mozilla/5.0 (compatible; xAI-Bot/1.0; +https://x.ai/)
xAI-Bot
#xai #grok #elon-musk #training
✅ Respectful

YouBot

You.com AI Search

You.com AI search engine bot for indexing and answering questions

Mozilla/5.0 (compatible; YouBot/1.0; +https://you.com/bot)
YouBot
#you-com #search #answer-engine #ai
✅ Respectful

AI robots.txt Generator

Generate custom robots.txt rules to control AI bot access to your website

💡 How to use:

  1. Copy the generated content above
  2. Save it as "robots.txt" in your website's root directory
  3. Test it at yoursite.com/robots.txt
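
For readers who would rather script this than click through a form, here is a minimal sketch of what such a generator might do. The function name `build_robots_txt` and its two parameters are hypothetical, and the bot tokens in the usage line are just examples taken from the list above.

```python
def build_robots_txt(allow, block):
    """Return robots.txt text with one group per bot token.

    `allow` and `block` are lists of robots.txt tokens such as "GPTBot".
    Both parameter names are illustrative assumptions.
    """
    groups = []
    for token in allow:
        # "Allow: /" lets this agent fetch the whole site.
        groups.append(f"User-agent: {token}\nAllow: /")
    for token in block:
        # "Disallow: /" asks this agent to stay out of the whole site.
        groups.append(f"User-agent: {token}\nDisallow: /")
    # Groups are separated by a blank line; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(allow=["ChatGPT-User", "ClaudeBot"],
                           block=["GPTBot", "CCBot"]))
```

Save the printed text as robots.txt in your site's root directory, as in step 2.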

What are AI User Agents?

AI user agents are specialized web crawlers used by artificial intelligence companies to collect data for training their models or to provide real-time information to users.

🤖 Training Data Collection

Crawl websites to gather text data for AI model training

🔍 Real-time Research

Fetch current information when users ask questions

📚 Knowledge Indexing

Build searchable databases for AI-powered answers

Popular AI Bots by Usage

GPTBot (OpenAI): Training & Real-time
ClaudeBot (Anthropic): Citations
PerplexityBot: Search Engine
Google-Extended: Gemini/Bard

🔑 Key Takeaway

AI crawlers identify themselves through user-agent strings. Keeping those strings current in your robots.txt lets you guide how language models interact with your work.

Most crawlers behind LLM-based AI search engines identify themselves with a user-agent string: a short bit of text that tells your server "who" is making the request. When you spot GPTBot, ClaudeBot, PerplexityBot, or any of the newer strings in this guide in your server access logs, you know an AI model is indexing, scraping, or quoting your page.
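
As a rough illustration of that matching, the sketch below checks a raw user-agent string against a handful of tokens from this guide. The helper name `identify_ai_bot` and the shortened token list are assumptions for the example, and a plain substring match is only a starting point.

```python
# A few tokens from this guide; extend the list with whichever bots you track.
KNOWN_AI_TOKENS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    "ClaudeBot", "PerplexityBot", "Perplexity-User", "CCBot",
]


def identify_ai_bot(user_agent):
    """Return the first known token found in the user-agent string, or None."""
    ua = user_agent.lower()
    for token in KNOWN_AI_TOKENS:
        # Case-insensitive substring match against the full header value.
        if token.lower() in ua:
            return token
    return None


if __name__ == "__main__":
    sample = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; "
              "compatible; GPTBot/1.0; +https://openai.com/gptbot)")
    print(identify_ai_bot(sample))
```

A request that arrives with an ordinary browser string returns None here, which is why stealth crawlers can't be caught this way.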

📖 Quick Definitions

AI crawler

A bot that copies public web pages so a large language model can learn from them.

AI user-agent

The string that identifies that crawler in HTTP requests. You use it in robots.txt rules.

Robots.txt

A plain-text file at the root of your site that tells crawlers what they may fetch. Add one group per crawler you want to allow or block: a User-agent line followed by its Allow or Disallow rules.

💡 Why You Should Care

Server logs show AI search bots now account for a growing share of referral visits. Understanding which agents they use helps you encourage that traffic responsibly.

  • AI search bots (ChatGPT, Claude, Bing Copilot, and Perplexity) send measurable referral traffic to websites.
  • Clear robots.txt rules let helpful agents in and keep abusive scrapers out.
  • If you have access to server log files, you can see how often AI/LLM bots are hitting your website.

Research shows:

ChatGPT sends 1.4 visits per unique visitor to external domains. Google Search sends only 0.6.

❓ Frequently Asked Questions

Everything you need to know about AI crawlers and robots.txt

What is an AI crawler in robots.txt?

Any bot that requests your pages for model training or instant answers. You tell it what to do with User-agent: lines in your robots.txt file.

Is User-agent: * enough?

No. Treat the wildcard line as a catch-all, and still list the named AI crawlers you care about; some ignore the star (*) directive and only respond to their specific user agent.
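
To see how a named group sits next to the wildcard, here is a small sketch using Python's standard `urllib.robotparser`. The rules and the example.com URLs are made up for illustration, and this only shows how one standards-following parser reads the file; it says nothing about bots that skip robots.txt altogether.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: everyone stays out of /private/, GPTBot stays out entirely.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Ask the parser whether `agent` may fetch `path` on the example site."""
    return parser.can_fetch(agent, "https://example.com" + path)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot"):
        for path in ("/blog/", "/private/report"):
            print(agent, path, allowed(agent, path))
```

In this parser a bot with its own group follows that group, and every other bot falls back to the wildcard group.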

What is the best AI web crawler for open data?

Common Crawl (CCBot) is still the leader because it releases monthly snapshots anyone can download. It's transparent and provides public access to its data.

What do you mean by "top user agents"?

The tokens in this guide account for 95% of AI crawler traffic according to log data we have access to. These are the most commonly seen AI bots in server logs.

Are bots required to follow directives in robots.txt files?

Nope! Most do though. Anthropic was criticized in 2024 for ignoring robots.txt directives, and Perplexity has been known to bypass these rules.

Important: Think of a robots.txt file as a list of preferences or suggestions on how to access a website. Block bad actors at the firewall/server level or add password authentication to content you don't want bots to access.
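
One way to block at the server level, sketched here as a small WSGI wrapper for a Python app. The name `block_ai_bots` and the default token list are assumptions for the example; most sites would put the same check in their web server, CDN, or firewall rules instead.

```python
# Example tokens from the list above; pick the bots you actually want to refuse.
BLOCKED_TOKENS = ("Kangaroo Bot", "PanguBot")


def block_ai_bots(app, blocked_tokens=BLOCKED_TOKENS):
    """Wrap a WSGI app so requests from the listed user agents get a 403."""
    lowered = [token.lower() for token in blocked_tokens]

    def middleware(environ, start_response):
        # WSGI exposes the User-Agent header as HTTP_USER_AGENT.
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in ua for token in lowered):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everything else goes through to the wrapped application.
        return app(environ, start_response)

    return middleware
```

This only stops bots that announce themselves. A crawler sending a regular Chrome user agent passes straight through, so IP rules or authentication are still needed for those.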

How do I know which AI bots are visiting my website?

Check your server access logs for the user-agent strings listed above. Most web analytics tools and server log analyzers can show you bot traffic patterns. Look for patterns like "GPTBot", "ClaudeBot", "PerplexityBot" in your logs.
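
If you don't have a log analyzer handy, a short script can do a first pass. The sketch below assumes your server writes the common "combined" access log format, where the user agent is the last quoted field on each line; the function name `count_ai_hits` and the sample line are invented for the example.

```python
import re
from collections import Counter

# In the combined log format the user agent is the last quoted field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')
AI_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def count_ai_hits(log_lines, tokens=AI_TOKENS):
    """Count requests per AI bot token across an iterable of log lines."""
    hits = Counter()
    for line in log_lines:
        match = UA_FIELD.search(line)
        if not match:
            continue  # line has no trailing quoted field
        ua = match.group(1).lower()
        for token in tokens:
            if token.lower() in ua:
                hits[token] += 1
                break  # count each request once
    return hits


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [10/Jan/2025:10:00:00 +0000] "GET / HTTP/1.1" '
        '200 512 "-" "ClaudeBot/1.0; +https://www.anthropic.com"',
    ]
    print(count_ai_hits(sample))
```

To run it on a real file, pass an open file handle in place of the sample list; the path depends on your server setup.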

Should I block or allow AI crawlers?

It depends on your content strategy:

  • Allow if you want your content to appear in AI search results and get referral traffic
  • Block if you're concerned about content being used for training without compensation
  • Selective approach: Allow assistant bots (ChatGPT-User, ClaudeBot) but block training bots (GPTBot, CCBot)

What's the difference between training bots and assistant bots?

Training bots (like GPTBot, CCBot) crawl websites to collect data for training AI models. Assistant bots (like ChatGPT-User, ClaudeBot) fetch content in real-time when users ask questions, potentially driving referral traffic to your site.

How often should I update my robots.txt for AI bots?

Review your robots.txt monthly, as new AI bots emerge regularly. Subscribe to our newsletter or bookmark this page - we update it as new AI crawlers are discovered.
