GPTBot is OpenAI's web crawler that fetches and indexes public web pages to train ChatGPT models and surface real-time results in ChatGPT search. It identifies itself with the user-agent string 'GPTBot' and respects robots.txt directives.
GPTBot in plain English
GPTBot is the automated crawler OpenAI uses to collect web content for training ChatGPT and for powering its live search responses. When a shopper asks ChatGPT 'what's the best merino wool base layer under $100', the answer draws from pages GPTBot previously fetched, including product pages, buying guides, and review content from ecommerce sites it was allowed to access.
The bot operates like other major search crawlers. It sends HTTP requests from documented IP ranges and identifies itself with the user-agent token 'GPTBot' ('OAI-SearchBot' is the search-specific variant, and 'ChatGPT-User' covers on-demand fetches triggered by user prompts). Before crawling, it checks the robots.txt file at the root of the site's domain. Site owners control access by adding a 'User-agent: GPTBot' line followed by 'Allow:' or 'Disallow:' rules. Blocked pages are excluded from training data and, in the case of OAI-SearchBot, from ChatGPT's search index.
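The robots.txt rules described above can be tested locally before they go live. The following is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the shop.example domain, and the paths are hypothetical, and the standard-library parser only approximates how any particular crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example store; the paths are illustrative.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /products/
Disallow: /checkout/

User-agent: *
Disallow: /admin/
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("GPTBot", "https://shop.example/products/merino-base-layer"),
        ("GPTBot", "https://shop.example/checkout/cart"),
        ("OAI-SearchBot", "https://shop.example/admin/"),
    ]:
        print(agent, url, can_fetch(agent, url))
```

In this sketch, a user agent with no group of its own (here 'OAI-SearchBot') falls back to the 'User-agent: *' rules, so each OpenAI agent that needs different treatment should get its own group.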
A site handling GPTBot well serves clean, fast-loading HTML with structured product data, descriptive titles, and crawlable category and product URLs. These are the same fundamentals that win on Google. A site handling it poorly hides content behind JavaScript that the crawler does not execute fully, blocks GPTBot in robots.txt by default, or serves bloated pages that time out. The first store gets cited in ChatGPT answers; the second is invisible.
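One rough way to check whether product data survives without JavaScript is to inspect the raw HTML the server returns. The sketch below assumes that "structured product data" means a schema.org Product object embedded as JSON-LD, which is one common convention and not something the crawler is known to require. Both sample pages are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect JSON-LD objects found in <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Malformed JSON-LD is ignored in this sketch.


def has_product_markup(raw_html: str) -> bool:
    """Return True if the raw HTML (no JavaScript executed) has Product JSON-LD."""
    parser = JsonLdExtractor()
    parser.feed(raw_html)
    parser.close()
    return any(
        isinstance(item, dict) and item.get("@type") == "Product"
        for item in parser.items
    )


# Hypothetical server responses: one rendered on the server, one JS-only shell.
STATIC_HTML = (
    '<html><head><title>Merino Base Layer</title>'
    '<script type="application/ld+json">'
    '{"@type": "Product", "name": "Merino Base Layer"}'
    '</script></head><body><h1>Merino Base Layer</h1></body></html>'
)
JS_ONLY_HTML = (
    '<html><body><div id="app"></div>'
    '<script src="/bundle.js"></script></body></html>'
)
```

Running has_product_markup on the response body of a product URL gives a quick signal of whether the page depends on client-side rendering to expose its product details.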
OpenAI publishes the current GPTBot IP ranges in a JSON file at openai.com/gptbot.json, which can be used to verify legitimate traffic and separate it from spoofed user agents in server logs.
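A log check along these lines could combine the user-agent token with the published ranges. The sketch below assumes the JSON file contains a "prefixes" list whose entries carry an "ipv4Prefix" or "ipv6Prefix" CIDR string; that shape should be confirmed against the live file. The sample ranges are reserved documentation addresses, not real OpenAI IPs, and the function names are illustrative.

```python
import ipaddress
import json

# Assumed shape of the published file. The prefixes are reserved
# documentation ranges (RFC 5737), NOT real OpenAI addresses.
SAMPLE_RANGES_JSON = """
{
  "prefixes": [
    {"ipv4Prefix": "192.0.2.0/24"},
    {"ipv4Prefix": "198.51.100.0/28"}
  ]
}
"""


def load_networks(raw_json: str) -> list:
    """Parse the ranges file into a list of ip_network objects."""
    data = json.loads(raw_json)
    networks = []
    for entry in data.get("prefixes", []):
        for key in ("ipv4Prefix", "ipv6Prefix"):
            if key in entry:
                networks.append(ipaddress.ip_network(entry[key]))
    return networks


def is_listed_ip(ip: str, networks) -> bool:
    """Return True if ip falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


def is_verified_gptbot(ip: str, user_agent: str, networks) -> bool:
    """True only when the UA claims GPTBot AND the IP is in a listed range."""
    return "gptbot" in user_agent.lower() and is_listed_ip(ip, networks)


if __name__ == "__main__":
    nets = load_networks(SAMPLE_RANGES_JSON)
    # A request claiming to be GPTBot from an unlisted IP is treated as spoofed.
    print(is_verified_gptbot("192.0.2.44", "compatible; GPTBot/1.0", nets))
    print(is_verified_gptbot("203.0.113.9", "compatible; GPTBot/1.0", nets))
```

In practice the ranges would be fetched from the published file and refreshed periodically, since the list can change.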
Why GPTBot matters for ecommerce
ChatGPT now drives product discovery for millions of buyers who never touch Google. When a shopper asks ChatGPT to recommend a stand mixer, a running shoe, or a skincare brand, the model pulls from pages GPTBot was permitted to crawl. Stores that block GPTBot in robots.txt (sometimes by default through Cloudflare's bot-blocking settings or a CDN preset) are excluded from those recommendations entirely. Stores that allow GPTBot, publish detailed product content, and maintain clean technical SEO get named in answers, linked in citations, and pulled into comparison tables. The decision is binary: be in the answer set or not.