robots.txt: Definition & Why It Matters for Ecommerce SEO

Quick definition

Robots.txt is a plain-text file at the root of a domain (/robots.txt) that instructs search engine and AI crawlers which URLs they are allowed or disallowed to access, following the Robots Exclusion Protocol.

robots.txt in plain English

Well-behaved crawlers request robots.txt before they crawl anything else on a site. It sits at the root (example.com/robots.txt) and uses a simple syntax to tell bots which paths they can crawl and which to skip. A Shopify store, for instance, ships with a default robots.txt that blocks /admin, /cart, and internal search result URLs from being crawled.

Each rule block starts with a User-agent line naming the crawler (Googlebot, GPTBot, ClaudeBot, or * for all), followed by Disallow and Allow directives listing URL paths. Crawlers fetch the file, parse the rules that apply to their user-agent string, and exclude matching URLs from their crawl queue. The file also supports Sitemap declarations and Crawl-delay hints, although Crawl-delay is a non-standard directive that Googlebot ignores. Robots.txt controls crawling, not indexing: a disallowed URL can still appear in search results if other sites link to it.
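The rule-block structure described above can be sketched with Python's standard-library parser. The robots.txt content, paths, and URLs below are hypothetical, chosen only for illustration. One caveat: urllib.robotparser applies the first matching rule in file order and does not understand wildcards, so its answers can differ from Googlebot's longest-match behavior on more complex files.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; the paths are assumptions.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart
Disallow: /search
Allow: /products

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network request)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Googlebot has no dedicated block here, so it falls back to the * rules.
print(parser.can_fetch("Googlebot", "https://example.com/products/blue-shirt"))  # True
print(parser.can_fetch("Googlebot", "https://example.com/cart"))  # False

# GPTBot has its own block, which disallows everything.
print(parser.can_fetch("GPTBot", "https://example.com/products/blue-shirt"))  # False

# Sitemap declarations are collected separately from the crawl rules.
print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```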

Done well, robots.txt blocks low-value URLs (faceted navigation, cart pages, internal search, checkout, account pages) while leaving product, collection, blog, and sitemap URLs fully open. Done poorly, it blocks /wp-content, CSS, or JS files needed for rendering, accidentally disallows the entire site with a stray 'Disallow: /', or blocks AI crawlers a brand actually wants citing its content. A single misplaced slash can drop a store out of Google's search results within days.
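The failure modes above lend themselves to an automated check. The following is a minimal sketch of such a lint, not a complete validator: the function name find_risky_rules and the marker list are hypothetical, and the asset check is a plain substring heuristic that can produce false positives (for example, ".js" also matches ".json").

```python
# Substrings that suggest a Disallow rule touches rendering assets.
# This list is an assumption for illustration; adjust it per platform.
RENDER_MARKERS = (".css", ".js", "/wp-content")


def find_risky_rules(robots_txt: str) -> list[str]:
    """Return human-readable warnings for rules that commonly cause damage."""
    warnings: list[str] = []
    agents: list[str] = []  # user-agents of the group being read
    in_rules = False        # True once the current group has a rule line

    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field != "disallow":
                continue
            if value == "/" and "*" in agents:
                warnings.append(f"line {number}: 'Disallow: /' blocks the whole site for all crawlers")
            elif any(marker in value.lower() for marker in RENDER_MARKERS):
                warnings.append(f"line {number}: '{value}' may block assets needed for rendering")
    return warnings


sample = "User-agent: *\nDisallow: /wp-content/\nDisallow: /cart\n"
for warning in find_risky_rules(sample):
    print(warning)
```

Running a check like this before deploying a changed file is cheaper than discovering the problem in traffic reports.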

Ecommerce stores with faceted navigation generate thousands of parameter URLs (?color=red&size=m&sort=price). Blocking these parameter patterns in robots.txt keeps crawl budget focused on canonical product pages. Sites with over 10,000 SKUs tend to see the biggest impact, since Googlebot limits how much it crawls on any one site.
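To see how parameter patterns behave, here is a small sketch of Google-style matching, where * matches any run of characters, a trailing $ anchors the end of the URL, and the longest matching pattern wins (Allow wins a tie). The patterns and paths are hypothetical, and this is a simplified model of the documented matching rules, not Googlebot's actual implementation.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into an anchored-at-start regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape literal pieces and let each * match any run of characters.
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Apply longest-match precedence; paths with no matching rule are allowed."""
    best_length, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_length or (len(pattern) == best_length and is_allow):
                best_length, allowed = len(pattern), is_allow
    return allowed


# Hypothetical faceted-navigation rules for illustration.
RULES = [
    ("Disallow", "/*?*color="),
    ("Disallow", "/*?*size="),
    ("Disallow", "/*?*sort="),
]

print(is_allowed("/collections/shirts?color=red&size=m&sort=price", RULES))  # False
print(is_allowed("/products/blue-shirt", RULES))  # True
```

Because precedence depends on pattern length, adding a broad Allow rule such as Allow: /products/ next to these wildcards can silently re-open parameter URLs under that path, so it is worth testing real URLs against any rule set before publishing it.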

Why robots.txt matters for ecommerce

For ecommerce operators, robots.txt directly controls crawl budget: the finite attention Googlebot gives a domain each day. A store with 50,000 product variants and uncontrolled faceted URLs burns that budget on duplicate parameter pages while new products sit undiscovered for weeks. Configure robots.txt correctly and Google spends its crawls on revenue-driving URLs. Get it wrong by blocking /products or rendering assets, and organic traffic can collapse. Robots.txt also decides whether AI search engines like ChatGPT and Perplexity can read product pages and cite the store in answers. That is a growing source of high-intent traffic, and it stays invisible to operators who block GPTBot, ClaudeBot, or PerplexityBot by default.

Frequently asked questions

What is robots.txt?

Robots.txt is a plain-text file at the root of a website that tells web crawlers which URLs they are allowed to access. It follows the Robots Exclusion Protocol and is the first file most bots request when visiting a domain. The file uses User-agent, Disallow, and Allow directives to control crawler behavior across search engines and AI training bots.

How large can a robots.txt file be?

Google enforces a 500 kibibyte (KiB) limit on robots.txt files. Content beyond that size is ignored. Most ecommerce robots.txt files stay under 10 KB even with extensive parameter blocking. If a file approaches the limit, the fix is consolidating wildcard patterns rather than listing individual URLs, since one wildcard rule replaces hundreds of explicit Disallow lines.
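A quick way to check a file against the limit is to measure its encoded size in bytes. This is a minimal sketch; the function name robots_size_report and the report fields are hypothetical, and UTF-8 encoding is assumed.

```python
GOOGLE_LIMIT_BYTES = 500 * 1024  # 500 KiB, the limit stated above


def robots_size_report(robots_txt: str) -> dict:
    """Report the UTF-8 size of robots.txt content against the 500 KiB limit."""
    size = len(robots_txt.encode("utf-8"))
    return {
        "bytes": size,
        "within_limit": size <= GOOGLE_LIMIT_BYTES,
        "percent_of_limit": round(100 * size / GOOGLE_LIMIT_BYTES, 2),
    }


# Ten 16-byte lines, far below the limit.
print(robots_size_report("Disallow: /cart\n" * 10))
# {'bytes': 160, 'within_limit': True, 'percent_of_limit': 0.03}
```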

What is the difference between robots.txt and the noindex meta tag?

Robots.txt controls crawling: whether a bot fetches a URL at all. The noindex meta tag controls indexing: whether a fetched page appears in search results. A page blocked by robots.txt can still be indexed if external sites link to it, because Google sees the URL without reading the page. To remove a URL from search results entirely, use noindex and leave the URL crawlable so Google reads the directive.
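The interaction between the two controls can be written out as a small decision function. This is a simplified model of the behavior described above, with hypothetical names and outcome labels, not a description of any search engine's internals.

```python
def indexing_outcome(crawlable: bool, has_noindex: bool) -> str:
    """Model what happens to a URL given its robots.txt and noindex state."""
    if not crawlable:
        # The crawler never fetches the page, so a noindex tag goes unread.
        return "can be indexed as a bare URL if other sites link to it"
    if has_noindex:
        # The page is fetched, the directive is read, and the URL is dropped.
        return "removed from search results"
    return "eligible for indexing"


# Blocking a page in robots.txt AND adding noindex does not remove it:
print(indexing_outcome(crawlable=False, has_noindex=True))
# The reliable removal path keeps the URL crawlable:
print(indexing_outcome(crawlable=True, has_noindex=True))
```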

How do I set up robots.txt for a Shopify or WooCommerce store?

Shopify generates a default robots.txt automatically and allows edits through the robots.txt.liquid template file in the theme code. WooCommerce sites edit robots.txt through an SEO plugin like Yoast or Rank Math, or by uploading a physical robots.txt file to the WordPress root directory. Standard ecommerce rules block /cart, /checkout, /account, internal search, and faceted parameter URLs while allowing /products and /collections.
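For a store that uploads a physical file, the standard rule set can be generated rather than typed by hand. The sketch below is one possible approach under stated assumptions: the function name build_robots_txt, the /search path, the sort parameter pattern, and the sitemap URL are all hypothetical. It leaves /products and /collections open by not disallowing them, which avoids Allow rules competing with the wildcard patterns. Shopify stores would make equivalent changes in robots.txt.liquid instead.

```python
from typing import Optional


def build_robots_txt(
    disallow: list[str],
    sitemap_url: Optional[str] = None,
    user_agent: str = "*",
) -> str:
    """Render one rule group plus an optional Sitemap line as robots.txt text."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallow]
    if sitemap_url:
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


# Hypothetical paths; real stores should confirm their own URL structure.
robots_txt = build_robots_txt(
    disallow=["/cart", "/checkout", "/account", "/search", "/*?*sort="],
    sitemap_url="https://example.com/sitemap.xml",
)
print(robots_txt)
```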

Does robots.txt actually matter for SEO?

Yes. Robots.txt shapes how search engines spend crawl budget across a domain. Stores with thousands of URLs need it to prevent Googlebot from wasting requests on duplicate parameter pages, internal search results, and checkout flows. It also governs access for AI crawlers like GPTBot and ClaudeBot, which influences whether a brand gets cited in ChatGPT and Claude answers. Misconfigured robots.txt files can cause traffic losses within days.

Deeper dives on this term

robots.txt vs Canonical URL: What's the Difference?

robots.txt vs Crawl Budget: What's the Difference?

robots.txt vs GPTBot: What's the Difference?

robots.txt vs llms.txt: What's the Difference?

robots.txt vs Sitemap.xml: What's the Difference?

robots.txt for Shopify Stores

robots.txt for Wix Stores

robots.txt for WooCommerce Stores

How to implement robots.txt for an Ecommerce Store

robots.txt Checklist: 12 Items Every Ecommerce Store Should Audit