Robots.txt is a plain-text file at the root of a domain (/robots.txt) that tells search engine and AI crawlers which URLs they may and may not crawl, following the Robots Exclusion Protocol.
robots.txt in plain English
Robots.txt is typically the first file a well-behaved crawler requests when visiting a site. It sits at the root (example.com/robots.txt) and uses a simple syntax to tell bots which paths they can crawl and which to skip. A Shopify store, for instance, ships with a default robots.txt that blocks /admin, /cart, and internal search result URLs from being crawled.
Each rule block starts with a User-agent line naming the crawler (Googlebot, GPTBot, ClaudeBot, or * for all), followed by Disallow and Allow directives listing URL paths. Crawlers fetch the file, parse the rules that apply to their user-agent string, and exclude matching URLs from their crawl queue. The file also supports Sitemap declarations and Crawl-delay hints, though Crawl-delay is a nonstandard extension that some crawlers, including Googlebot, ignore. Robots.txt controls crawling, not indexing: a disallowed URL can still appear in search results if other sites link to it.
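The rule-block structure described above can be tried out with Python's standard-library urllib.robotparser. This is a minimal sketch, not a reference implementation: the robots.txt content and the example.com URLs are made up for illustration, and the standard-library parser uses simple prefix matching, so it does not reproduce every detail of how Googlebot evaluates rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /search

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot has no dedicated block here, so it falls back to the "*" rules.
print(parser.can_fetch("Googlebot", "https://example.com/products/shirt"))  # True
print(parser.can_fetch("Googlebot", "https://example.com/cart"))            # False

# GPTBot matches its own block, which disallows the whole site.
print(parser.can_fetch("GPTBot", "https://example.com/products/shirt"))     # False

# Sitemap declarations are collected separately from the rule blocks
# (site_maps() requires Python 3.8 or later).
print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```

A crawler that matches a named block uses only that block, which is why GPTBot is blocked everywhere in this sketch even though the * block allows /products.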
Done well, robots.txt blocks low-value URLs (faceted navigation, cart pages, internal search, checkout, account pages) while leaving product, collection, blog, and sitemap URLs fully open. Done poorly, it blocks /wp-content, CSS, or JS files needed for rendering, accidentally disallows the entire site with a stray 'Disallow: /', or blocks AI crawlers that a brand actually wants to cite its content. A single misplaced slash can cut a store off from Googlebot, and its search visibility can start to suffer within days.
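One way to catch these mistakes before they ship is to keep a short list of paths that must stay crawlable and paths that must stay blocked, then check a draft robots.txt against both. The sketch below assumes a hypothetical audit_robots helper and made-up paths. It relies on Python's urllib.robotparser, which only does prefix matching, so treat it as a smoke test and not as a full emulation of Googlebot.

```python
from urllib.robotparser import RobotFileParser


def audit_robots(robots_txt, must_allow, must_block, agent="Googlebot"):
    """Return a list of human-readable problems found in a robots.txt draft.

    audit_robots is a hypothetical helper written for this example.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    problems = []
    for path in must_allow:
        if not parser.can_fetch(agent, path):
            problems.append(f"blocked but should be crawlable: {path}")
    for path in must_block:
        if parser.can_fetch(agent, path):
            problems.append(f"crawlable but should be blocked: {path}")
    return problems


# Illustrative paths; replace them with the store's real URL structure.
MUST_ALLOW = ["/products/shirt", "/assets/theme.css"]
MUST_BLOCK = ["/cart", "/checkout"]

# A stray "Disallow: /" blocks everything, including products and CSS.
broken = "User-agent: *\nDisallow: /\n"
for problem in audit_robots(broken, MUST_ALLOW, MUST_BLOCK):
    print(problem)

# A narrower file passes the same checks and prints nothing.
healthy = "User-agent: *\nDisallow: /cart\nDisallow: /checkout\n"
for problem in audit_robots(healthy, MUST_ALLOW, MUST_BLOCK):
    print(problem)
```

Running a check like this in a deploy pipeline is one possible design. The point is only that the "must allow" list fails loudly when a broad Disallow slips in.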
Ecommerce stores with faceted navigation generate thousands of parameter URLs (?color=red&size=m&sort=price). Blocking these parameter patterns in robots.txt keeps crawl budget focused on canonical product pages. Sites with more than 10,000 SKUs tend to see the biggest impact, because Googlebot makes only a limited number of crawl requests to a given site each day.
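Parameter blocking usually relies on the two special characters that the Robots Exclusion Protocol (RFC 9309) defines for paths: * matches any run of characters and a trailing $ anchors the end of the URL. The sketch below is a simplified matcher for that syntax, shown only to make the pattern behavior concrete. The FACET_RULES patterns and both function names are assumptions for this example, and the code ignores Allow rules and the longest-match precedence that real crawlers apply.

```python
import re


def pattern_to_regex(pattern):
    """Compile a robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Without '$', the pattern is a prefix match, as in robots.txt.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path_and_query, disallow_patterns):
    """Return True if any Disallow pattern matches from the start of the URL path."""
    return any(pattern_to_regex(p).match(path_and_query) for p in disallow_patterns)


# Hypothetical Disallow patterns for the facet parameters named above.
FACET_RULES = ["/*?*color=", "/*?*size=", "/*?*sort="]

print(is_blocked("/collections/shirts?color=red&size=m&sort=price", FACET_RULES))  # True
print(is_blocked("/products/blue-shirt", FACET_RULES))                             # False
print(is_blocked("/collections/shirts?page=2", FACET_RULES))                       # False
```

Patterns this broad also match any parameter whose name merely ends in color, size, or sort, so it is worth testing them against real URLs from server logs before deploying.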
Why robots.txt matters for ecommerce
For ecommerce operators, robots.txt is the main lever over crawl budget, the finite attention Googlebot gives a domain each day. A store with 50,000 product variants and uncontrolled faceted URLs burns that budget on duplicate parameter pages while new products can sit undiscovered for weeks. Configure robots.txt correctly and Google spends its crawls on revenue-driving URLs. Get it wrong by blocking /products or rendering assets, and organic traffic can collapse. Robots.txt also determines whether the crawlers behind AI search products like ChatGPT and Perplexity can read product pages and cite the store in answers. That is a growing source of high-intent traffic, and it is invisible to operators who block GPTBot, ClaudeBot, or PerplexityBot by default.