Crawl budget is the number of URLs a search engine bot will crawl on a website within a given timeframe, determined by the site's crawl capacity limit and crawl demand.
Crawl budget in plain English
Crawl budget is the ceiling on how many pages Googlebot or another search engine crawler will fetch from a domain over a set period. For example, a Shopify store with 50,000 product, collection, and filter URLs may only have 8,000 pages crawled per day, meaning the full catalog takes just over six days to refresh in the index, and longer if some URLs are fetched more than once.
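The refresh arithmetic in the example above can be sketched as a short calculation. The function name `days_to_full_recrawl` is hypothetical, and the model assumes a constant daily crawl rate with no URL fetched twice, which real crawlers do not guarantee.

```python
import math


def days_to_full_recrawl(total_urls: int, crawled_per_day: int) -> float:
    """Idealized days needed to fetch every URL once at a constant daily rate."""
    if crawled_per_day <= 0:
        raise ValueError("crawled_per_day must be positive")
    return total_urls / crawled_per_day


# Figures from the Shopify example: 50,000 URLs at 8,000 fetches per day.
days = days_to_full_recrawl(50_000, 8_000)
print(f"{days:.2f} days, i.e. {math.ceil(days)} calendar days")
```

Treat the result as a lower bound: any repeat fetches of popular URLs push the real refresh time higher.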
Crawl budget is set by two inputs: crawl capacity limit and crawl demand. Capacity is governed by server response time and error rates: fast, stable servers earn more simultaneous connections, while slow or 5xx-heavy responses force the crawler to back off. Demand is driven by URL popularity, freshness signals, and the size of the known URL set. The crawler pulls from a scheduling queue, prioritizing URLs with higher PageRank, recent updates, and inbound internal links.
Done well, crawl budget is concentrated on canonical, indexable, revenue-driving URLs: product pages, category pages, and editorial content. Logs show Googlebot hitting these URLs within hours of publication or a price change. Done poorly, the budget is burned on faceted navigation parameters, internal search results, session IDs, paginated duplicates, and soft-404 product variants, leaving real products stale in the index for weeks.
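One way to see where the budget goes is to classify the URLs Googlebot requested and compute the share spent on each class. The sketch below is a minimal illustration, not a complete log parser: it assumes the request paths have already been extracted from server logs and filtered to Googlebot, and the parameter names and the `/search` prefix are hypothetical stand-ins for a store's real URL scheme.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Hypothetical patterns; adjust to the store's actual URL scheme.
WASTE_PARAMS = {"color", "size", "sort", "sessionid", "page"}
WASTE_PATH_PREFIXES = ("/search",)


def classify(url: str) -> str:
    """Label a crawled URL as 'waste' (facets, search, sessions) or 'useful'."""
    parts = urlsplit(url)
    if parts.path.startswith(WASTE_PATH_PREFIXES):
        return "waste"
    params = parse_qs(parts.query, keep_blank_values=True)
    if any(name in WASTE_PARAMS for name in params):
        return "waste"
    return "useful"


def crawl_share(urls):
    """Return the fraction of crawler hits that fall into each class."""
    counts = Counter(classify(u) for u in urls)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


# Made-up sample of paths requested by Googlebot.
googlebot_hits = [
    "/products/blue-shirt",
    "/collections/shirts?color=blue&size=m",
    "/search?q=shirt",
    "/collections/shirts?page=7",
]
print(crawl_share(googlebot_hits))
```

A rising "waste" share over successive log windows is a signal that facet or search URLs are leaking into the crawl queue.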
Crawl budget becomes a material concern at roughly 10,000+ unique URLs. Google's own guidance names sites of that size whose content changes daily, along with sites of 1 million+ pages whose content changes about weekly. Below that threshold, most sites are crawled completely without intervention. Above it (typical for any catalog with faceted filters or large variant counts), crawl waste compounds quickly and requires active management through robots.txt, canonical tags, and internal linking discipline.
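Before deploying robots.txt rules, it can help to check which paths they block. The sketch below uses Python's standard `urllib.robotparser`. The rules and paths are hypothetical examples, and the helper name `is_crawlable` is an assumption. The standard-library parser matches on path prefixes and may not evaluate the `*` wildcards that Googlebot supports, so the example sticks to prefix rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep crawlers out of internal search and cart pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /cart
"""


def is_crawlable(path: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the sample rules allow user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


for path in ("/products/blue-shirt", "/search?q=shirt", "/cart"):
    print(path, is_crawlable(path))
```

A check like this only confirms what the rules block; whether a given path should be blocked is still a decision about which URLs deserve crawl budget.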
Why crawl budget matters for ecommerce
For ecommerce operators, crawl budget directly controls how fast price changes, new arrivals, restocks, and out-of-stock signals reach Google. A store with 200,000 URLs from color and size filter combinations may see Googlebot spend roughly 70% of its visits on parameter junk while flagship products go un-recrawled for a month, so seasonal launches miss the index window and discontinued SKUs keep ranking. Stores that prune URL bloat, block facets in robots.txt, and consolidate variants under canonical parents get faster indexation of new products, more accurate inventory status in SERPs, and quicker recovery after site migrations or replatforms.