
Crawl Budget for Shopify Stores

By · Updated · 7 min read

Why Crawl Budget Behaves Differently on Shopify

Crawl budget is the number of URLs Googlebot can and wants to crawl on a site within a given timeframe; indexing is a separate step that happens afterward. On Shopify, the platform's architecture creates URL patterns that consume crawl budget in ways custom-built stores do not face. The most documented example is the duplicate product URL problem: Shopify exposes every product at at least two URLs, the canonical /products/[product-handle] and a collection-scoped /collections/[collection-handle]/products/[product-handle]. Googlebot discovers both, crawls both, and spends budget on pages that resolve to the same content.
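
To see how quickly those duplicates collapse, here is a minimal Python sketch. It assumes a single-locale store (no market prefixes such as /en-ca/) and uses made-up handles. It maps any collection-scoped product URL to the /products/ path that Shopify's canonical tag points at, which is useful when deduplicating a crawl export. The function and URLs are illustrative, not part of any Shopify API.

```python
import re
from urllib.parse import urlsplit

# /collections/<collection-handle>/products/<product-handle>, optional trailing slash
COLLECTION_SCOPED = re.compile(r"^/collections/[^/]+/products/([^/]+)/?$")


def canonical_product_path(url: str) -> str:
    """Map a storefront URL to the /products/<handle> path it duplicates.

    Query strings (such as ?variant=) and fragments are dropped. Paths that are
    not collection-scoped product URLs come back unchanged apart from that.
    """
    path = urlsplit(url).path
    match = COLLECTION_SCOPED.match(path)
    if match:
        return f"/products/{match.group(1)}"
    return path


# Hypothetical URLs for one product reached three different ways.
urls = [
    "https://example-store.com/products/blue-widget",
    "https://example-store.com/collections/widgets/products/blue-widget",
    "https://example-store.com/collections/sale/products/blue-widget?variant=123456789",
]
print({canonical_product_path(u) for u in urls})  # {'/products/blue-widget'}
```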

Shopify also generates paginated collection pages (/collections/all?page=2), sorted and faceted navigation URLs through sort and filter parameters (?sort_by=, ?filter.v.price.gte=, ?filter.p.m.*), and variant-specific URLs (?variant=123456789). On a store with 5,000 SKUs and active filtering, the crawlable URL space can reach tens of thousands of addresses, most of which carry no unique indexable value. Eight-figure stores with deep catalogs feel this pressure most acutely.
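
A crawl export is easier to reason about once each URL is bucketed by the pattern that produced it. The sketch below is a rough classifier built only from the patterns named above. The bucket names, the check order, and the example URLs are assumptions for illustration, and real stores may have other parameters (UTM tags, locale prefixes) that land in "other".

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def classify_shopify_url(url: str) -> str:
    """Bucket a storefront URL by the Shopify pattern that produced it.

    Checks run top to bottom, so a URL that is both sorted and filtered is
    reported as "sorted collection".
    """
    parts = urlsplit(url)
    path = parts.path
    # keep_blank_values so that "?sort_by=" with no value still counts.
    params = parse_qs(parts.query, keep_blank_values=True)

    if path.startswith("/collections/") and "/products/" in path:
        return "collection-scoped product"
    if path.startswith("/products/"):
        return "product variant" if "variant" in params else "product"
    if path.startswith("/collections/"):
        if "sort_by" in params:
            return "sorted collection"
        if any(key.startswith("filter.") for key in params):
            return "filtered collection"
        if "page" in params:
            return "paginated collection"
        return "collection"
    return "other"


# Hypothetical crawl sample; in practice, feed in a crawler export.
sample = [
    "https://example-store.com/products/blue-widget?variant=123456789",
    "https://example-store.com/collections/all?page=2",
    "https://example-store.com/collections/widgets?sort_by=price-ascending",
    "https://example-store.com/collections/widgets?filter.v.price.gte=10",
]
print(Counter(classify_shopify_url(u) for u in sample))
```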

Shopify's Built-In Crawl Budget Drains

Shopify's /collections/all page is auto-generated and lists every product in the store. On large catalogs, this page paginates into hundreds of sub-pages (/collections/all?page=2, ?page=3, and so on), all discoverable through internal and pagination links. Because /collections/all is rarely a high-converting landing page, the crawl budget it consumes returns minimal SEO value. The canonical tag on each paginated page points to itself rather than to the root collection, so Google treats every page as a separate indexable URL instead of consolidating signals.
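
The arithmetic behind "hundreds of sub-pages" is simple: the page count is the product count divided by the theme's page size, rounded up. The page size comes from the theme's `{% paginate collection.products by N %}` tag, so the 24 used below is an assumption. Check your own theme.

```python
import math


def paginated_collection_urls(handle: str, product_count: int, per_page: int) -> list[str]:
    """List the URLs a paginated collection exposes; page 1 is the bare URL."""
    pages = max(1, math.ceil(product_count / per_page))
    urls = [f"/collections/{handle}"]
    urls += [f"/collections/{handle}?page={n}" for n in range(2, pages + 1)]
    return urls


# Assumption: the theme paginates at 24 products per page.
all_pages = paginated_collection_urls("all", product_count=5_000, per_page=24)
print(len(all_pages), all_pages[-1])  # 209 /collections/all?page=209
```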

Theme-generated duplicate navigation paths compound this. Many Shopify themes build product links in collection grids, breadcrumbs, and menus that use the collection-scoped URL (/collections/[handle]/products/[handle]) alongside links to the canonical /products/[handle], so Googlebot crawls both. Shopify's hosted sitemap.xml automatically includes all published products and collections, but it does not distinguish between high-value and low-value URLs: every published page appears with equal weight.
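
In many themes these links come from the Liquid `within` filter (`{{ product.url | within: collection }}`). Dropping that filter from the product-card snippet makes grid links point at /products/ directly. Before editing the theme, you can gauge the problem by scanning rendered HTML for collection-scoped hrefs. The standard-library sketch below does that, with invented markup standing in for a real product grid.

```python
import re
from html.parser import HTMLParser

SCOPED_PRODUCT = re.compile(r"/collections/[^/?#]+/products/[^/?#]+")


class ScopedLinkFinder(HTMLParser):
    """Collect <a href> values that point at collection-scoped product URLs."""

    def __init__(self) -> None:
        super().__init__()
        self.scoped_links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if SCOPED_PRODUCT.search(href):
            self.scoped_links.append(href)


def find_scoped_links(html: str) -> list[str]:
    finder = ScopedLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.scoped_links


# Invented product-grid markup for illustration.
page = """
<a href="/collections/widgets/products/blue-widget">Blue widget</a>
<a href="/products/red-widget">Red widget</a>
"""
print(find_scoped_links(page))  # ['/collections/widgets/products/blue-widget']
```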

Apps installed on the store add further pressure. Review apps, loyalty programs, and page-builder tools routinely create their own routes or inject JavaScript that adds links Googlebot discovers when it renders the page. A store running 15 to 20 apps can add hundreds of app-proxy URLs (e.g., /apps/[app-name]/...) that consume crawl budget without contributing to rankings.
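
To size the app-route problem from a crawl export, a sketch like this groups URLs by their proxy route. The prefix list (/apps/, /a/, /community/, /tools/) is an assumption based on the subpath prefixes Shopify offers app proxies. Adjust it to whatever routes your crawl actually shows. The URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

# Assumed app proxy prefixes; adjust to what your crawl actually shows.
PROXY_PREFIXES = ("apps", "a", "community", "tools")


def app_route_counts(urls: list[str]) -> Counter:
    """Count crawled URLs per /<prefix>/<app-name> route."""
    counts: Counter = Counter()
    for url in urls:
        segments = urlsplit(url).path.strip("/").split("/")
        if len(segments) >= 2 and segments[0] in PROXY_PREFIXES:
            counts[f"/{segments[0]}/{segments[1]}"] += 1
    return counts


# Hypothetical crawl export rows.
crawled = [
    "https://example-store.com/apps/reviews/product/123",
    "https://example-store.com/apps/reviews/product/456",
    "https://example-store.com/apps/loyalty/rewards",
    "https://example-store.com/products/blue-widget",
]
print(app_route_counts(crawled).most_common())
# [('/apps/reviews', 2), ('/apps/loyalty', 1)]
```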

What Shopify Restricts You From Doing

Unlike a self-hosted platform, Shopify does not let merchants upload or overwrite a plain robots.txt file. Since 2021, Shopify has exposed a robots.txt.liquid template that merchants on all plans can customize, but only through the theme code editor. The template controls the Allow, Disallow, and Sitemap rules served to each user agent. It can also output Crawl-delay lines, though Googlebot ignores Crawl-delay. Without edits, Shopify's default robots.txt disallows a limited set of paths (for example /cart, /checkout, /search, and collection URLs containing sort_by). It leaves the rest of the /collections/ tree, pagination, and many filter-parameter URLs open to crawlers.

Shopify does not support server-side 301 redirect logic for URL parameter normalization the way Apache or Nginx configurations allow, so canonical tags are the primary tool available. Shopify automatically points the canonical tag on collection-scoped product URLs to the /products/ URL, which is the correct behavior. However, Googlebot still has to crawl the duplicate before it can read that canonical, and Google treats canonicals as a hint rather than a directive. On very large stores, canonicals alone do not recover the crawl budget those duplicates consume.
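
Because the canonical is only read after the duplicate is fetched, it is worth confirming that the theme has not overridden Shopify's default on collection-scoped URLs. Below is a minimal sketch using Python's html.parser that pulls the first rel="canonical" href out of a fetched page. The sample response body is invented.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def canonical_of(html: str) -> Optional[str]:
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


# Invented response body for /collections/widgets/products/blue-widget.
html = '<head><link rel="canonical" href="https://example-store.com/products/blue-widget"></head>'
print(canonical_of(html))  # https://example-store.com/products/blue-widget
```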

Tools and Techniques Specific to Shopify

The robots.txt.liquid template is the highest-leverage tool available. Adding Disallow directives for /collections/*?sort_by=, /collections/*?filter*, and /search? blocks Googlebot from crawling the long tail of filtered and sorted pages. These URLs rarely deserve independent indexing; the canonical collection page carries the value. Editing robots.txt.liquid requires navigating to Online Store > Themes > Actions > Edit Code in the Shopify admin.
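
Python's standard urllib.robotparser does plain prefix matching and does not understand the `*` and `$` wildcards Google supports. The sketch below therefore hand-rolls the matching rules from RFC 9309 (the longest matching pattern wins; on a tie, Allow wins) so you can test the patterns above before publishing them. It simplifies by ignoring percent-encoding and user-agent grouping. One thing it makes visible: /collections/*?sort_by= matches only when sort_by is the first query parameter, so a URL like ?page=2&sort_by=price-ascending stays crawlable. A looser pattern such as /collections/*sort_by* catches every position.

```python
import re


def _pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern: '*' is any run of characters, '$' ends the URL."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(url_path: str, rules: list[tuple[str, str]]) -> bool:
    """Decide whether a path (with query string) is crawlable under one group's rules.

    rules holds ("allow" | "disallow", pattern) pairs. The longest matching
    pattern wins; when an Allow and a Disallow match with equal length, Allow wins.
    """
    best_length, allowed = -1, True
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty Disallow line blocks nothing
        if _pattern_to_regex(pattern).match(url_path):  # match() anchors at the start
            is_allow = kind == "allow"
            if len(pattern) > best_length or (len(pattern) == best_length and is_allow):
                best_length, allowed = len(pattern), is_allow
    return allowed


proposed = [
    ("disallow", "/collections/*?sort_by="),
    ("disallow", "/collections/*?filter*"),
    ("disallow", "/search?"),
]
for path in (
    "/collections/widgets?sort_by=price-ascending",
    "/collections/widgets?page=2&sort_by=price-ascending",
    "/collections/widgets?filter.v.price.gte=10",
    "/search?q=widget",
    "/collections/widgets",
):
    print(path, "allowed" if is_allowed(path, proposed) else "blocked")
```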

Google Search Console's URL Inspection tool and the Page indexing report (formerly the Coverage report) are essential for auditing which Shopify-generated URLs Google has crawled and indexed. The Crawl Stats report (found under Settings in Search Console) shows daily crawl requests and response codes. A spike in 4xx responses often traces back to deleted products whose URLs Googlebot had already discovered and keeps requesting, a common Shopify pattern when inventory is cleared seasonally.

Third-party crawl tools like Screaming Frog or Sitebulb, when pointed at a Shopify store, reveal the full scope of duplicate URL generation. Running a crawl with JavaScript rendering enabled exposes app-proxy routes and dynamically injected links that a standard HTML crawl misses. Comparing the crawl output against the sitemap.xml shows which URLs Shopify is actively promoting to crawlers versus what exists in the rendered DOM.
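
Once both lists exist, the comparison is a pair of set differences. The sketch assumes you have already fetched the sitemap files and normalized URLs to one form. Shopify's /sitemap.xml is an index that points to child sitemaps, so run `sitemap_locs` on each child and union the results. The sample sitemap and crawl set are invented.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text: str) -> set[str]:
    """Return every <loc> value in a sitemap (or sitemap index) document."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    }


def compare_crawl_to_sitemap(sitemap_urls: set[str], crawled_urls: set[str]) -> dict[str, list[str]]:
    """Split the two URL sets into what only the crawl found and what only the sitemap lists."""
    return {
        "crawled_not_in_sitemap": sorted(crawled_urls - sitemap_urls),
        "in_sitemap_not_crawled": sorted(sitemap_urls - crawled_urls),
    }


sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example-store.com/products/blue-widget</loc></url>
  <url><loc>https://example-store.com/products/red-widget</loc></url>
</urlset>"""
crawled = {
    "https://example-store.com/products/blue-widget",
    "https://example-store.com/collections/widgets/products/blue-widget",
}
report = compare_crawl_to_sitemap(sitemap_locs(sample_sitemap), crawled)
print(report["crawled_not_in_sitemap"])
```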

App Ecosystem Considerations

Several Shopify apps address crawl budget directly or indirectly. SEO-focused apps such as Plug In SEO and SEO Manager provide bulk canonical tag management and meta robots noindex controls, which are the primary levers for keeping low-value pages out of the index without blocking them in robots.txt. Noindex does not stop crawling: Googlebot has to fetch a page to see the tag, although Google tends to recrawl noindexed pages less often over time. Setting noindex on /collections/all and its paginated variants, for example, removes those pages from the indexable URL pool without disallowing the crawl.
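
In a live store this decision sits in the theme's `<head>` or in an SEO app. Writing the policy down as a function makes it easy to check a crawl export against what you intended. Below is a minimal Python sketch of a hypothetical policy that noindexes only /collections/all. Because the paginated pages share that path, they inherit the same decision.

```python
from urllib.parse import urlsplit

# Hypothetical policy: paths this store has decided to keep out of the index.
NOINDEX_PATHS = {"/collections/all"}


def robots_meta_for(url: str) -> str:
    """Return the meta robots content a URL should carry under NOINDEX_PATHS.

    The query string is ignored, so /collections/all?page=2 gets the same
    answer as /collections/all.
    """
    path = urlsplit(url).path.rstrip("/") or "/"
    return "noindex, follow" if path in NOINDEX_PATHS else "index, follow"


for url in (
    "https://example-store.com/collections/all",
    "https://example-store.com/collections/all?page=7",
    "https://example-store.com/collections/widgets",
):
    print(url, "->", robots_meta_for(url))
```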

Page-speed and caching apps affect crawl budget indirectly. Google raises a site's crawl capacity limit when the server responds quickly and consistently. Shopify's CDN infrastructure is fast by default, but stores loading dozens of app scripts on every page add requests that Googlebot must fetch when it renders each page, and apps that inject Liquid into the theme can slow server response. Reducing app bloat through the theme code or switching to lighter alternatives improves both user experience and the crawl rate Googlebot applies.
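
To see how much script weight apps add, a rough first pass is to tally the `<script>` tags in a rendered page by the host they load from. The sketch below does that with the standard library. The app hostnames are placeholders. Counting tags says nothing about their size or execution cost, so treat the result as a starting point for an app audit.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ScriptHostCounter(HTMLParser):
    """Tally <script> tags by the host they load from."""

    def __init__(self, store_host: str) -> None:
        super().__init__()
        self.store_host = store_host
        self.hosts: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if not src:
            self.hosts["(inline)"] += 1
        else:
            # Relative src values load from the store's own host.
            self.hosts[urlsplit(src).netloc or self.store_host] += 1


def script_hosts(html: str, store_host: str) -> Counter:
    counter = ScriptHostCounter(store_host)
    counter.feed(html)
    counter.close()
    return counter.hosts


# Placeholder markup; real app hosts will differ.
page = """
<script src="//cdn.shopify.com/s/files/theme.js"></script>
<script src="https://reviews-app.example/widget.js"></script>
<script src="https://loyalty-app.example/embed.js"></script>
<script>window.dataLayer = [];</script>
"""
print(script_hosts(page, "example-store.com").most_common())
```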

Actionable Crawl Budget Priorities for Shopify Operators

Start with the robots.txt.liquid file and add Disallow rules for URL parameters that generate duplicate or low-value pages (sort_by, filter parameters, and the /search? path), skipping any pattern the default file already covers. Next, audit the sitemap.xml Shopify generates and confirm it excludes pages already set to noindex. Shopify automatically removes noindexed pages from its sitemap, but only for pages managed through the platform's native settings; app-generated pages may not follow this rule.
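
For the sitemap check, a sketch like the following flags sitemap URLs whose fetched HTML carries a noindex directive. It assumes you have already fetched each page into a `url -> html` dictionary. The HTTP X-Robots-Tag header is supported by `is_noindexed` but not wired into the conflict check. Names and sample pages are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect content values of <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if tag == "meta" and name in ("robots", "googlebot"):
            self.directives.append((attributes.get("content") or "").lower())


def is_noindexed(html: str, x_robots_tag: str = "") -> bool:
    """True if a meta robots tag or the X-Robots-Tag header value contains noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return any("noindex" in d for d in finder.directives + [x_robots_tag.lower()])


def sitemap_noindex_conflicts(sitemap_urls: set[str], pages: dict[str, str]) -> list[str]:
    """Sitemap URLs whose fetched HTML (pages: url -> html) says noindex."""
    return sorted(u for u in sitemap_urls if u in pages and is_noindexed(pages[u]))


pages = {
    "https://example-store.com/collections/all": '<meta name="robots" content="noindex, follow">',
    "https://example-store.com/products/blue-widget": '<meta name="robots" content="index, follow">',
}
sitemap = set(pages)
print(sitemap_noindex_conflicts(sitemap, pages))  # ['https://example-store.com/collections/all']
```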

Run a crawl audit quarterly. Compare the number of URLs Googlebot crawls (from Search Console Crawl Stats) against the number of URLs that actually rank or convert. A store where 80 percent of crawled URLs produce zero organic sessions is wasting budget that could be reallocated to new product pages or editorial content. Consolidate thin collection pages through redirects, improve internal linking to priority pages, and remove app routes that serve no indexable purpose. These three actions produce measurable crawl efficiency gains within one to two crawl cycles.
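
The 80 percent figure is a simple ratio: distinct crawled URLs with zero organic sessions divided by all distinct crawled URLs. A minimal sketch follows. It assumes both lists use the same URL form and that the crawled list comes from whatever export you have (Crawl Stats shows request counts and sample URLs rather than a complete list). The ten-URL quarter below is made up to reproduce that 80 percent.

```python
from typing import Iterable


def crawl_waste_ratio(crawled_urls: Iterable[str], urls_with_sessions: Iterable[str]) -> float:
    """Share of distinct crawled URLs that earned zero organic sessions."""
    crawled = set(crawled_urls)
    if not crawled:
        return 0.0
    wasted = crawled - set(urls_with_sessions)
    return len(wasted) / len(crawled)


# Hypothetical quarter: 10 crawled URLs, 2 of which earned organic sessions.
crawled = [f"/collections/all?page={n}" for n in range(2, 10)] + ["/products/a", "/products/b"]
with_sessions = ["/products/a", "/products/b"]
print(f"{crawl_waste_ratio(crawled, with_sessions):.0%}")  # 80%
```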

Frequently asked questions

Does Shopify automatically handle crawl budget optimization?

No. Shopify sets canonical tags on duplicate product URLs and generates a sitemap.xml, and its default robots.txt disallows a few patterns such as /search and sort_by collection URLs. It leaves many filter-parameter URLs, pagination, and app-proxy URLs open by default. Merchants must edit the robots.txt.liquid template and set noindex tags on low-value pages manually. The default Shopify configuration leaves a large volume of duplicate and thin URLs open to Googlebot.

What is the biggest crawl budget problem unique to Shopify?

The collection-scoped product URL duplicate is the most documented Shopify-specific issue. Every product accessible through a collection gets a second URL at /collections/[handle]/products/[product-handle] in addition to the canonical /products/[product-handle]. On stores with hundreds of collections and thousands of products, this can double the crawlable URL count. Shopify sets the correct canonical, but Googlebot still spends crawl budget discovering and processing both URLs.

Can Shopify merchants edit robots.txt to fix crawl budget issues?

Yes. Since Shopify introduced the robots.txt.liquid template, merchants on all plans can edit it through the theme code editor. This allows adding Disallow directives for URL parameter patterns like ?sort_by= and ?filter. The file is accessible under Online Store > Themes > Actions > Edit Code. Changes take effect once Google refetches robots.txt. Google generally caches the file for up to 24 hours, so new rules usually apply within a day.

How do Shopify apps affect crawl budget?

Apps create crawl budget pressure in two ways: they generate app-proxy URLs under /apps/[app-name]/ that Googlebot discovers and crawls, and they inject JavaScript links into rendered pages that crawlers follow. Apps with customer-facing pages (reviews, loyalty portals, quiz tools) are the most common sources of unintended crawlable routes. Auditing app-generated URLs with a JavaScript-rendering crawl tool identifies which routes to block in robots.txt.

How do I know if crawl budget is actually a problem for my Shopify store?

Check Google Search Console's Crawl Stats report for total daily crawl requests, then compare that figure against the number of URLs in your sitemap and the number of pages earning organic traffic. Crawl Stats counts requests (including recrawls and page resources such as scripts and images) rather than unique URLs, so treat the comparison as a rough signal. If Googlebot crawls three to five times more URLs than are indexed or driving sessions, budget is being wasted. Stores with fewer than 1,000 products rarely have a critical crawl budget problem; stores with 10,000-plus SKUs and active faceted filtering do.

Written by

Matt is the founder of RunOctopus. He built All Angles Creatures from zero to page-1 rankings in reptile feeder insects in under 60 days using exactly this method, turning a hard, entrenched niche into RunOctopus's proof store for programmatic SEO and AI search citation.

