robots.txt for WooCommerce Stores

How WooCommerce Generates and Serves robots.txt

WooCommerce runs on WordPress, which generates a virtual robots.txt file dynamically whenever no physical file exists in the site root. WordPress assembles this output at request time in its do_robots() function and passes it through the robots_txt filter, so there is no static robots.txt file on disk by default. (The similarly named wp_robots API controls the robots meta tag, not this file.) WooCommerce adds no rules for storefront paths such as the cart or checkout out of the box, so the base WordPress rules apply.

The virtual file is accessible at yourdomain.com/robots.txt and is served with a 200 status. If a physical robots.txt file exists in the server root (uploaded via FTP or cPanel), the web server returns that file directly and the virtual generation is bypassed entirely. This distinction matters: many store owners edit the wrong file or expect plugin changes to appear when a physical file is overriding everything.
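On a server you can reach over SSH or a file manager, the physical-versus-virtual question comes down to whether a file sits in the web root. A minimal sketch, assuming you know the web root directory; the function name robots_source is hypothetical:

```python
from pathlib import Path


def robots_source(web_root: str) -> str:
    """Report which robots.txt a WordPress site is likely serving.

    Assumption: `web_root` is the directory the web server serves as "/".
    """
    physical = Path(web_root) / "robots.txt"
    # A file on disk is normally returned by the web server itself,
    # so the request never reaches WordPress.
    return "physical" if physical.is_file() else "virtual"
```

Calling robots_source("/var/www/html") with your own path (the one shown is only a placeholder) returns "physical" when a file would shadow the virtual output.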

WooCommerce-Specific Paths That Require Disallow Rules

WooCommerce creates several URL patterns that carry no indexable value and consume crawl budget. The cart page (/cart/), checkout page (/checkout/), and account pages (/my-account/ and its sub-paths like /my-account/orders/ and /my-account/edit-account/) should be blocked from crawlers. These pages render user-session-specific content and return thin or duplicate output for any unauthenticated bot.

Query strings are the larger issue. WooCommerce appends parameters like ?add-to-cart=, ?variation_id=, and ?wc-ajax= to product URLs, creating thousands of crawlable URL variants that duplicate the canonical product page. Adding Disallow: /*?add-to-cart= and Disallow: /*?wc-ajax= blocks these variants without affecting organic product page indexing.
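To see what a wildcard rule actually catches, the matching can be approximated in a few lines. This is a sketch of the wildcard conventions Google documents ('*' for any run of characters, a trailing '$' for end of URL), not a full robots.txt parser; the product URLs are made-up examples:

```python
import re


def rule_matches(pattern: str, url_path: str) -> bool:
    """Return True if a robots.txt path pattern matches a path plus query."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal chunk, then join the chunks with '.*' for every '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # Rules are matched from the start of the path, like a prefix.
    return re.match(regex, url_path) is not None


print(rule_matches("/*?add-to-cart=", "/product/hoodie/?add-to-cart=42"))             # True
print(rule_matches("/*?add-to-cart=", "/product/hoodie/"))                            # False
print(rule_matches("/*?add-to-cart=", "/product/hoodie/?quantity=2&add-to-cart=42"))  # False
print(rule_matches("/*add-to-cart=", "/product/hoodie/?quantity=2&add-to-cart=42"))   # True
```

The third call shows a gap worth knowing about: a rule that starts with ? only matches when the parameter comes first in the query string. A looser pattern such as /*add-to-cart= covers both positions, at the cost of matching any URL that contains that text.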

WooCommerce also exposes REST API endpoints under /wp-json/wc/ and admin-facing paths under /wp-admin/. The WordPress virtual robots.txt already blocks /wp-admin/ for all bots, while allowing /wp-admin/admin-ajax.php. It does not include a Disallow: /wp-json/ rule by default, so the REST API is only covered if you, a plugin, or a physical file adds that rule. Check your active robots.txt output before assuming coverage.
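Checking coverage of plain path prefixes like these can be scripted with Python's standard urllib.robotparser. The sketch below assumes the text in DEFAULT_WP_ROBOTS is what an unmodified WordPress site serves (example.com is a placeholder), so substitute your own live output. Note that this parser does simple prefix matching and does not implement Google's wildcard rules.

```python
from urllib.robotparser import RobotFileParser

# Assumed output of an unmodified WordPress virtual robots.txt.
DEFAULT_WP_ROBOTS = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/wp-sitemap.xml
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Check a URL against robots.txt text using prefix matching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


api_url = "https://example.com/wp-json/wc/v3/products"
print(is_crawlable(DEFAULT_WP_ROBOTS, api_url))  # True: no /wp-json/ rule here

# Add the REST API rule and check again.
hardened = DEFAULT_WP_ROBOTS.replace(
    "Disallow: /wp-admin/\n", "Disallow: /wp-admin/\nDisallow: /wp-json/\n"
)
print(is_crawlable(hardened, api_url))  # False
```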

Plugins That Manage robots.txt in WooCommerce Environments

Yoast SEO and Rank Math are the two dominant plugins that expose a robots.txt editor inside the WordPress admin, at Yoast SEO > Tools > File editor or Rank Math > General Settings > Edit robots.txt. Depending on the plugin and version, the editor either creates a physical robots.txt file when none exists or overrides the virtual file's output. Both surface the current active robots.txt output in the editor, removing the guesswork about whether a physical or virtual file is serving.

All in One SEO (AIOSEO) provides a similar in-dashboard editor under All in One SEO > Tools > Robots.txt Editor. It also includes a syntax validator that flags malformed directives before saving. For headless WooCommerce deployments using a React or Next.js frontend, none of these WordPress plugins control the frontend robots.txt: that file must be managed at the CDN or Next.js config level, separately from any WordPress plugin setting.

WP Rocket, a popular caching plugin used on WooCommerce stores, does not directly edit robots.txt but adds its cache directory to the file to prevent crawlers from indexing cached HTML files. If both WP Rocket and Yoast are active, check that WP Rocket's cache path entries survive any robots.txt saves made through the Yoast editor, as overwriting the file through Yoast can strip WP Rocket's additions.

Crawl Budget Problems Specific to Large WooCommerce Catalogs

Stores with more than a few hundred SKUs generate substantial crawl noise through WooCommerce's faceted filtering. Plugins like WooCommerce Product Filters, FiboSearch, and YITH WooCommerce Ajax Product Filter create URL structures such as /shop/?filter_color=red&filter_size=medium. Each combination is a crawlable URL. Blocking these at robots.txt using Disallow: /shop/?filter_ (or equivalent parameter patterns) prevents Googlebot from spending its crawl allocation on filtered duplicates.
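A rule written as /shop/?filter_ only matches when the filter is the first query parameter on /shop/. One common workaround is to emit a ? form and an & form for each parameter prefix. A minimal sketch; the function name and the parameter prefixes passed in are illustrative assumptions, so use the ones your filter plugin actually generates:

```python
def filter_disallow_rules(param_prefixes: list[str]) -> list[str]:
    """Build Disallow lines that catch a query parameter in any position."""
    rules = []
    for prefix in param_prefixes:
        # '?' form: the parameter is first in the query string.
        rules.append(f"Disallow: /*?{prefix}")
        # '&' form: the parameter follows another one.
        rules.append(f"Disallow: /*&{prefix}")
    return rules


print("\n".join(filter_disallow_rules(["filter_", "orderby="])))
```

These patterns rely on the * wildcard, which Google and most major crawlers support but which was not part of the original robots.txt convention.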

The /product-category/ and /product-tag/ taxonomy archives compound this problem. A store with 50 categories and 200 tags produces 250 archive pages, many with overlapping product listings. Blocking these in robots.txt is not always the right answer: some category pages carry real ranking value, but tag archives frequently do not. Evaluate tag archive traffic in Google Search Console before deciding, then add Disallow: /product-tag/ if those pages show zero impressions.

Limitations and Edge Cases on the WooCommerce Platform

The WordPress multisite configuration complicates robots.txt management for WooCommerce stores running multiple storefronts under one installation. Each subsite in a subdirectory multisite (domain.com/store-uk/, domain.com/store-us/) does not get its own robots.txt. The single robots.txt at the root domain covers all subsites, so Disallow paths must account for all subsite prefixes. Subdomain multisites (uk.domain.com) each serve their own robots.txt, which simplifies per-region control.
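For a subdirectory multisite, the per-subsite prefixes can be expanded mechanically. The sketch below assumes every subsite keeps the default WooCommerce page slugs (/cart/, /checkout/, /my-account/); the function name is hypothetical and the prefixes are the example ones from the paragraph above.

```python
# Assumption: each subsite uses the default WooCommerce page slugs.
STORE_PATHS = ["/cart/", "/checkout/", "/my-account/"]


def multisite_rules(subsite_prefixes: list[str]) -> list[str]:
    """Expand store paths into Disallow lines for each subsite prefix."""
    rules = []
    for prefix in subsite_prefixes:
        # Normalize "store-uk", "/store-uk" and "/store-uk/" to "/store-uk".
        base = "/" + prefix.strip("/")
        for path in STORE_PATHS:
            rules.append(f"Disallow: {base}{path}")
    return rules


print("\n".join(multisite_rules(["store-uk", "store-us"])))
```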

WooCommerce Subscriptions and WooCommerce Memberships add URL paths for subscriber dashboards and member-only content. These paths are typically behind login walls and return 302 redirects for unauthenticated bots, so crawlers cannot index them regardless. However, including explicit Disallow rules for those paths is a defensive measure that prevents unnecessary redirect chains in crawl logs.

One hard platform limit: the WordPress virtual robots.txt emits only a single Sitemap: line, pointing to the core wp-sitemap.xml. WooCommerce has no separate sitemap of its own; products appear in the WordPress core sitemap or in one generated by an SEO plugin. If multiple XML sitemaps exist (for example, one for products and one for posts), every additional Sitemap: line must be added through a plugin editor or a physical file. Sitemap: lines are not tied to a User-agent group, so they apply to every crawler rather than per bot.

Actionable Configuration Checklist for WooCommerce Stores

Start by confirming whether a physical or virtual robots.txt is active: fetch yourdomain.com/robots.txt and compare it to what your SEO plugin shows in its editor. If they differ, a physical file most likely exists and is overriding the plugin (a cached copy at the CDN is the other common cause). Delete or update the physical file via FTP, then manage everything through the plugin editor going forward.
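The comparison is easier once cosmetic differences such as comments, blank lines, and line endings are out of the way. A small sketch, assuming you have already copied both texts into strings; the function names are hypothetical, and dropping blank lines is a simplification for comparison only:

```python
def normalize(robots_txt: str) -> list[str]:
    """Strip comments, blank lines, and surrounding whitespace."""
    lines = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            lines.append(line)
    return lines


def same_rules(live: str, editor: str) -> bool:
    """True when the live file and the editor text carry the same directives."""
    return normalize(live) == normalize(editor)


live = "User-agent: *\r\nDisallow: /cart/\r\n"
editor = "# edited in the plugin\nUser-agent: *\n\nDisallow: /cart/\n"
print(same_rules(live, editor))  # True
```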

Add Disallow rules for /cart/, /checkout/, /my-account/, /*?add-to-cart=, /*?wc-ajax=, and any faceted filter query strings your store generates. Confirm /wp-admin/ is already blocked (it should be). Add Sitemap: lines for every XML sitemap your store produces, such as a product sitemap, a category sitemap, and the WordPress core sitemap if used. Validate the final output in Google Search Console's robots.txt report (under Settings), which replaced the legacy robots.txt tester, then request a recrawl there so Googlebot reads the updated file.
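Put together, the checklist produces a file along these lines. This is one possible assembly, not output from any plugin: the helper name is hypothetical, example.com is a placeholder, and the paths assume default WooCommerce slugs.

```python
def build_robots_txt(disallow: list[str], sitemaps: list[str]) -> str:
    """Assemble a single-group robots.txt from paths and sitemap URLs."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallow]
    # Keep admin-ajax.php reachable, as the WordPress default does.
    lines.append("Allow: /wp-admin/admin-ajax.php")
    lines.append("")
    lines += [f"Sitemap: {url}" for url in sitemaps]
    return "\n".join(lines) + "\n"


print(build_robots_txt(
    ["/wp-admin/", "/cart/", "/checkout/", "/my-account/",
     "/*?add-to-cart=", "/*?wc-ajax="],
    ["https://example.com/wp-sitemap.xml"],
))
```

Paste the result into the plugin editor or a physical file, not both, so that only one source of truth remains.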

Frequently asked questions

Does WooCommerce create a robots.txt file automatically?

WooCommerce itself adds no robots.txt rules for its storefront paths. WordPress generates a virtual robots.txt dynamically that blocks /wp-admin/ (while allowing /wp-admin/admin-ajax.php) and lists the core sitemap. WooCommerce-specific paths like /cart/, /checkout/, and query parameters such as ?add-to-cart= are not blocked by default and must be added manually through an SEO plugin or a physical robots.txt file.

Which plugin is best for editing robots.txt on a WooCommerce store?

Yoast SEO and Rank Math are the most widely used options, both providing in-dashboard robots.txt editors. All in One SEO (AIOSEO) is a capable third option and includes a syntax validator. The choice between them is largely determined by which plugin already handles the store's other on-page SEO needs, since installing a second SEO plugin to manage robots.txt alone causes conflicts.

Should the /shop/ page be blocked in robots.txt for WooCommerce?

No. The /shop/ page is the main catalog entry point and typically earns organic search traffic. Blocking it wastes ranking potential. What should be blocked are filtered variants of /shop/ generated by query strings like ?filter_color= or ?orderby=, which duplicate content without adding indexable value. Block the parameter patterns, not the base /shop/ URL.

How does a WooCommerce multisite installation affect robots.txt?

In a subdirectory multisite (domain.com/store-a/), all subsites share one robots.txt at the root domain. Disallow rules must include path prefixes for every subsite. In a subdomain multisite (storea.domain.com), each subdomain serves its own robots.txt independently, allowing per-subsite configuration. With subdirectories, one robots.txt file must cover all storefronts simultaneously.

Can robots.txt block WooCommerce's REST API endpoints from being crawled?

Yes, but not by default. The WordPress default virtual robots.txt does not include a Disallow: /wp-json/ rule, so you need to add one through an SEO plugin editor or a physical file to cover WooCommerce's REST API at /wp-json/wc/. Verify the rule is present in your active robots.txt output, as a later plugin change or physical file override can remove it. Blocking the REST API keeps bots from crawling raw JSON responses that appear as thin or duplicate content.

Written by

Matt is the founder of RunOctopus. He built All Angles Creatures from zero to page-1 rankings in reptile feeder insects in under 60 days using exactly this method, turning a hard, entrenched niche into RunOctopus's proof store for programmatic SEO and AI search citation.