
robots.txt

Quick definition

Robots.txt is a plain-text file at the root of a domain (/robots.txt) that instructs search engine and AI crawlers which URLs they are allowed or disallowed to access, following the Robots Exclusion Protocol.

robots.txt in plain English

Well-behaved crawlers request robots.txt before fetching other URLs on a site. It sits at the root (example.com/robots.txt) and uses a simple syntax to tell bots which paths they can crawl and which to skip. A Shopify store, for instance, ships with a default robots.txt that blocks /admin, /cart, and internal search result URLs from being crawled.

Each rule block starts with a User-agent line naming the crawler (Googlebot, GPTBot, ClaudeBot, or * for all), followed by Disallow and Allow directives listing URL paths. Crawlers fetch the file, parse the rules that apply to their user-agent string, and exclude matching URLs from their crawl queue. The file also supports Sitemap declarations, and some crawlers honor Crawl-delay hints, although Googlebot ignores that directive. Compliance is voluntary: the file is a request to crawlers, not an access control. Robots.txt controls crawling, not indexing: a disallowed URL can still appear in search results if other sites link to it.
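
As a rough illustration of how a crawler reads those rule blocks, the sketch below feeds a small sample file to Python's standard-library urllib.robotparser. The rules, the example.com domain, and the helper name allowed are assumptions made for the example, not a recommended configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /cart
Disallow: /search

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if the sample rules let `agent` crawl `path`."""
    return parser.can_fetch(agent, "https://example.com" + path)


if __name__ == "__main__":
    for agent, path in [
        ("Googlebot", "/products/blue-widget"),
        ("Googlebot", "/cart"),
        ("GPTBot", "/products/blue-widget"),
    ]:
        print(agent, path, allowed(agent, path))
    print(parser.site_maps())
```

In this sample a crawler identifying as Googlebot has no group of its own, so it falls under the * group and is kept out of /cart, while GPTBot matches its own group and is disallowed everywhere. Note that urllib.robotparser does plain prefix matching and does not implement the * and $ path wildcards that Google supports.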

Done well, robots.txt blocks low-value URLs (faceted navigation, cart pages, internal search, checkout, account pages) while leaving product, collection, blog, and sitemap URLs fully open. Done poorly, it blocks /wp-content, CSS, or JS files needed for rendering, accidentally disallows the entire site with a stray 'Disallow: /', or blocks AI crawlers that a brand actually wants citing its content. A single misplaced slash can cut a store off from Googlebot, and its pages can start dropping out of Google's results within days.
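
One way to catch these failure modes before they ship is to test a draft file against a list of URLs that must stay crawlable. The following is a minimal sketch using urllib.robotparser; the URL list, the function name blocked_urls, and both sample files are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, must_crawl: list[str],
                 agent: str = "Googlebot") -> list[str]:
    """Return the URLs in must_crawl that the given rules disallow for agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in must_crawl if not parser.can_fetch(agent, url)]


# Hypothetical URLs a store would want crawlable (pages plus a rendering asset).
MUST_CRAWL = [
    "https://example.com/products/blue-widget",
    "https://example.com/collections/widgets",
    "https://example.com/assets/theme.css",
]

# A stray "Disallow: /" blocks everything; the second file blocks only low-value paths.
BROKEN = "User-agent: *\nDisallow: /\n"
HEALTHY = "User-agent: *\nDisallow: /cart\nDisallow: /checkout\n"

if __name__ == "__main__":
    print("broken file blocks:", blocked_urls(BROKEN, MUST_CRAWL))
    print("healthy file blocks:", blocked_urls(HEALTHY, MUST_CRAWL))
```

A check like this could run in a deploy pipeline and fail the build whenever the returned list is non-empty. Because the standard-library parser only does prefix matching, rules that rely on wildcards need a matcher that understands them.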

Ecommerce stores with faceted navigation generate thousands of parameter URLs (?color=red&size=m&sort=price). Blocking these parameter patterns in robots.txt keeps crawl budget focused on canonical product pages. Large catalogs, such as sites with over 10,000 SKUs, tend to see the biggest impact, since Googlebot limits how many requests it makes to a given host over a period of time.
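
Parameter blocking usually relies on wildcard patterns. The sketch below is a simplified matcher for that style of rule, assuming the common convention that * matches any run of characters, a trailing $ anchors the end of the URL, the longest matching pattern wins, and Allow wins a tie. The rule set and function names are hypothetical, and the code skips user-agent group selection and percent-encoding normalization.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape literal pieces and join them with ".*" wherever the pattern had "*".
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """rules holds (directive, pattern) pairs; path includes any query string."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, pattern in rules:
        if pattern and rule_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            longer = len(pattern) > best_len
            tie_for_allow = len(pattern) == best_len and is_allow
            if longer or tie_for_allow:
                best_len, allowed = len(pattern), is_allow
    return allowed


# Hypothetical rules that block faceted-navigation parameters.
RULES = [
    ("Disallow", "/*?*color="),
    ("Disallow", "/*?*size="),
    ("Disallow", "/*?*sort="),
]

if __name__ == "__main__":
    print(is_allowed("/collections/shirts?color=red&size=m&sort=price", RULES))
    print(is_allowed("/collections/shirts", RULES))
```

Three wildcard rules cover every combination of the color, size, and sort parameters, which is why pattern rules scale better than listing individual URLs.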

Why robots.txt matters for ecommerce

For ecommerce operators, robots.txt shapes how crawl budget is spent, meaning the limited attention Googlebot gives a domain over a given period. A store with 50,000 product variants and uncontrolled faceted URLs can burn that budget on duplicate parameter pages while new products sit undiscovered for weeks. Configure robots.txt correctly and Google spends more of its crawls on revenue-driving URLs. Get it wrong by blocking /products or rendering assets, and organic traffic can collapse. Robots.txt also affects whether AI search engines like ChatGPT and Perplexity can read product pages and cite the store in answers. That is a growing source of high-intent traffic, and it is invisible to operators who block GPTBot, ClaudeBot, or PerplexityBot by default. Vendors often use separate user-agent tokens for training crawlers and search crawlers, so check each vendor's documentation before blocking or allowing a bot.

Deeper dives on this term

Focused pages that go deeper than the definition — comparisons, platform-specific guides, operational walkthroughs.

Compare

robots.txt vs Canonical URL: What's the Difference?

robots.txt blocks crawler access; canonical URLs guide duplicate content signals. Learn when to use each, how they interact, and w

Compare

robots.txt vs Crawl Budget: What's the Difference?

robots.txt blocks crawlers from pages. Crawl budget controls how many pages Google crawls in a given period. Learn the exact diffe

Compare

robots.txt vs GPTBot: What's the Difference?

robots.txt controls crawler access; GPTBot is OpenAI's crawler. Learn how they interact, where they overlap, and which rules apply

Compare

robots.txt vs llms.txt: What's the Difference?

robots.txt blocks crawlers from fetching pages. llms.txt guides AI models on what content to use. Learn when each applies and how

Compare

robots.txt vs Sitemap.xml: What's the Difference?

robots.txt tells crawlers what to skip; sitemap.xml tells them what to index. Learn the exact difference and when each file does t

Platform

robots.txt for Shopify Stores

How robots.txt works on Shopify stores, including platform limits, editable sections, Liquid customization, and SEO implications f

Platform

robots.txt for Wix Stores

How robots.txt works on Wix stores: platform limits, the built-in SEO settings, workarounds, and what ecommerce operators must con

Platform

robots.txt for WooCommerce Stores

How to configure robots.txt for WooCommerce stores: platform-specific paths, plugin tools, and crawl-budget fixes for 6-8 figure s

How-to

How to implement robots.txt for an Ecommerce Store

A step-by-step guide to implementing robots.txt for ecommerce stores—block crawl waste, protect checkout pages, and direct Googleb

Checklist

robots.txt Checklist: 12 Items Every Ecommerce Store Should Audit

Audit your ecommerce robots.txt with this 12-item checklist. Each item includes a specific pass/fail criterion to protect crawl bu


Frequently asked questions

What is robots.txt?

Robots.txt is a plain-text file at the root of a website that tells web crawlers which URLs they are allowed to access. It follows the Robots Exclusion Protocol and is the first file most bots request when visiting a domain. The file uses User-agent, Disallow, and Allow directives to control crawler behavior across search engines and AI training bots.

How large can a robots.txt file be?

Google enforces a 500 kibibyte (KiB) limit on robots.txt files. Content beyond that size is ignored. Most ecommerce robots.txt files stay under 10 KB even with extensive parameter blocking. If a file approaches the limit, the fix is consolidating wildcard patterns rather than listing individual URLs, since one wildcard rule replaces hundreds of explicit Disallow lines.
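
As a small worked check of that limit, the sketch below measures a file's size in bytes and compares it with 500 KiB (500 × 1024 = 512,000 bytes). The function and constant names are made up for illustration, and UTF-8 encoding is assumed.

```python
GOOGLE_LIMIT_BYTES = 500 * 1024  # 500 KiB, the limit described in the answer above


def size_report(robots_txt: str) -> tuple[int, bool]:
    """Return (size in bytes, True if the file fits within the limit)."""
    size = len(robots_txt.encode("utf-8"))
    return size, size <= GOOGLE_LIMIT_BYTES


if __name__ == "__main__":
    small = "User-agent: *\nDisallow: /cart\n"
    print(size_report(small))
    print(size_report("x" * (GOOGLE_LIMIT_BYTES + 1)))
```

Measuring the encoded bytes rather than the character count matters when the file contains non-ASCII paths, since those characters take more than one byte each.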

What is the difference between robots.txt and the noindex meta tag?

Robots.txt controls crawling — whether a bot fetches a URL at all. The noindex meta tag controls indexing — whether a fetched page appears in search results. A page blocked by robots.txt can still be indexed if external sites link to it, because Google sees the URL without reading the page. To remove a URL from search results entirely, use noindex and leave the URL crawlable so Google reads the directive.

How do I set up robots.txt for a Shopify or WooCommerce store?

Shopify generates a default robots.txt automatically and allows edits through the robots.txt.liquid template file in the theme code. WooCommerce sites edit robots.txt through an SEO plugin like Yoast or Rank Math, or by uploading a physical robots.txt file to the root directory of the WordPress install. Standard ecommerce rules block /cart, /checkout, /account, internal search, and faceted parameter URLs while allowing /products and /collections.

Does robots.txt actually matter for SEO?

Yes. Robots.txt influences how search engines spend crawl budget across a domain. Stores with thousands of URLs need it to prevent Googlebot from wasting requests on duplicate parameter pages, internal search results, and checkout flows. It also governs access for AI crawlers like GPTBot and ClaudeBot, which can affect whether a brand gets cited in ChatGPT and Claude answers. A misconfigured robots.txt file can cause traffic losses within days.

Written by

Matt is the founder of RunOctopus. He built All Angles Creatures from zero to page-1 rankings in reptile feeder insects in under 60 days using exactly this method — turning a hard, entrenched niche into RunOctopus's proof store for programmatic SEO and AI search citation.

