How to Use This Crawl Error Checklist
Crawl errors occur when a search engine bot requests a URL on your store and receives a response that prevents proper indexing: a 404, a redirect loop, a blocked resource, or a server-side failure. For ecommerce stores with thousands of SKUs, category pages, and filtered URLs, these errors accumulate silently and erode crawl budget and rankings.
Work through each item below using Google Search Console, your server logs, and a crawler tool such as Screaming Frog or Sitebulb. Mark each item PASS or FAIL. Any FAIL requires a fix before your next scheduled crawl audit.
Checklist Items 1-4: Foundational URL Health
1. SOFT 404 COUNT: Open Google Search Console > Indexing > Pages and review the URLs listed under the 'Soft 404' reason. PASS: Zero URLs are flagged as soft 404s. FAIL: Any URL returns a 200 status with 'page not found' body content instead of a true 404 or a 301 redirect to a relevant page.
2. HARD 404 ON DELETED PRODUCTS: Export all 4xx URLs from your crawler. PASS: Every deleted product URL either 301-redirects to the closest matching category or returns a true 404 with no internal links pointing to it. FAIL: Deleted product URLs still appear in your internal link graph or XML sitemap.
3. REDIRECT CHAINS OF TWO OR MORE HOPS: Run a crawl and filter for redirect chains. PASS: Every redirect resolves in one hop (A → B). FAIL: Any chain takes two or more hops (A → B → C), which wastes crawl budget and may dilute link equity.
4. REDIRECT LOOPS: Check for circular redirects in your crawler's redirect report. PASS: No URL redirects back to itself or into a cycle. FAIL: Any URL is part of a loop (A → B → A), which causes bots to abandon crawling that URL entirely.
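Items 3 and 4 can be checked offline once you have exported source-to-target redirect pairs from your crawler. The sketch below is one possible approach, not a feature of any particular tool: the `REDIRECTS` mapping, the sample paths, and the function names are all hypothetical, and it treats any chain of two or more hops as a FAIL, in line with the A → B → C example.

```python
def trace_redirects(start_url, redirect_map, max_hops=10):
    """Follow start_url through redirect_map.

    Returns (chain, is_loop). chain lists every URL visited, beginning
    with start_url; is_loop is True when a URL is visited twice.
    """
    chain = [start_url]
    seen = {start_url}
    current = start_url
    # Stop when the URL no longer redirects or the hop limit is reached.
    while current in redirect_map and len(chain) - 1 < max_hops:
        current = redirect_map[current]
        chain.append(current)
        if current in seen:
            return chain, True
        seen.add(current)
    return chain, False


def classify(start_url, redirect_map):
    """Return a PASS/FAIL label for checklist items 3 and 4."""
    chain, is_loop = trace_redirects(start_url, redirect_map)
    hops = len(chain) - 1
    if is_loop:
        return "FAIL: loop"
    if hops >= 2:
        return "FAIL: chain"
    return "PASS"


# Hypothetical export: each key redirects to its value.
REDIRECTS = {
    "/old-shirt": "/shirt",
    "/shirt": "/products/shirt",
    "/a": "/b",
    "/b": "/a",
    "/sale-2023": "/sale",
}

if __name__ == "__main__":
    for url in REDIRECTS:
        print(url, classify(url, REDIRECTS))
```

Working from an exported mapping rather than live requests keeps the check fast and repeatable; the `max_hops` cap is a safety limit chosen for illustration.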
Checklist Items 5-8: Robots, Sitemaps, and Server Responses
5. ROBOTS.TXT BLOCKING CRITICAL PATHS: Fetch your robots.txt and cross-reference disallowed paths against your top-revenue category and product URLs. PASS: No revenue-generating or indexable URLs are disallowed. FAIL: A disallow rule blocks /collections/, /products/, or /category/ paths that should be indexed.
6. XML SITEMAP CONTAINS ONLY INDEXABLE URLs: Download your XML sitemap and run each URL through a status-code checker. PASS: Every URL in the sitemap returns a 200 status and is not tagged noindex. FAIL: The sitemap includes redirected URLs, 4xx URLs, or noindex pages, which forces bots to waste crawl budget on dead ends.
7. SERVER ERRORS (5XX) IN SEARCH CONSOLE: Open Google Search Console > Settings > Crawl Stats and check the 'By response' breakdown for server error (5xx) entries. PASS: Zero 5xx errors recorded in the past 30 days. FAIL: Any 500, 502, 503, or 504 responses appear, indicating server instability that can cause Googlebot to reduce its crawl rate.
8. CRAWL RATE THROTTLING SIGNAL: In Google Search Console > Settings > Crawl Stats, review average response time (the legacy crawl rate setting has been retired). PASS: Average server response to Googlebot is under 200ms. FAIL: Response time consistently exceeds 500ms, which can lead Google to reduce its crawl rate automatically.
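The cross-reference in item 5 can be roughed out with Python's standard-library `urllib.robotparser`. This is a first-pass sketch under stated assumptions: the robots.txt content, the `shop.example` URLs, and the `blocked_urls` helper are hypothetical. The standard-library parser also applies rules in file order and does not implement Google's `*` and `$` wildcards or its longest-match precedence, so confirm any borderline result in Search Console.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt content and revenue URLs, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart
Disallow: /collections/
"""

REVENUE_URLS = [
    "https://shop.example/collections/shoes",
    "https://shop.example/products/blue-shirt",
]

if __name__ == "__main__":
    blocked = blocked_urls(ROBOTS_TXT, REVENUE_URLS)
    print("FAIL" if blocked else "PASS", blocked)
```

With this sample data the `/collections/` rule blocks the first URL, which is the FAIL condition described in item 5.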
Checklist Items 9-12: Ecommerce-Specific Crawl Traps
9. FACETED NAVIGATION GENERATING DUPLICATE URLs: Crawl your site with JavaScript rendering enabled and count unique parameter combinations on category pages. PASS: Faceted URLs are either canonicalized to the base category URL or blocked via robots.txt (the Search Console URL Parameters tool was retired in 2022 and is no longer an option). FAIL: Hundreds of unique parameter URLs are crawlable and lack canonical tags, fragmenting crawl budget.
10. PAGINATED SERIES WITHOUT SELF-REFERENCING CANONICALS: Check pages 2+ of category and blog pagination. PASS: Each paginated page has a self-referencing canonical (page 2 canonicals to page 2). FAIL: All paginated pages point their canonical to page 1, which signals that the deeper pages are duplicates and can cause Googlebot to skip them along with the products they link to.
11. INTERNAL LINKS POINTING TO REDIRECTED URLs: Export your internal link report and flag any anchor href that returns a 3xx status. PASS: All internal links resolve directly to their final 200-status destination. FAIL: Internal links pass through one or more redirects, wasting link equity and adding latency to the crawl path.
12. ORPHANED PAGES WITH NO INTERNAL LINKS: Compare your XML sitemap URL list against your crawler's discovered-via-internal-links list. PASS: Every URL in the sitemap is reachable via at least one internal link from a crawled page. FAIL: Any sitemap URL has zero internal links pointing to it, so bots rely solely on the sitemap to find it and its discovery is fragile.
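The comparison in item 12 comes down to a set difference between sitemap URLs and internally linked URLs. The sketch below assumes a standard sitemaps.org `<urlset>` file and a list of URLs exported from your crawler; the sample XML, the `shop.example` URLs, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Extract every <loc> value from a sitemap document as a set."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    }


def find_orphans(xml_text, internally_linked):
    """Return sitemap URLs that no crawled page links to, sorted."""
    return sorted(sitemap_urls(xml_text) - set(internally_linked))


# Hypothetical sitemap and crawler export, for illustration only.
SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://shop.example/products/blue-shirt</loc></url>
  <url><loc>https://shop.example/products/red-hat</loc></url>
</urlset>
"""

LINKED = ["https://shop.example/products/blue-shirt"]

if __name__ == "__main__":
    orphans = find_orphans(SITEMAP_XML, LINKED)
    print("FAIL" if orphans else "PASS", orphans)
```

This exact-match comparison assumes both lists use the same URL form (protocol, trailing slash, letter case); normalize them first if your exports differ.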
Prioritizing Fixes After the Audit
Not all FAILs carry equal weight. Fix items in this order: server errors (5xx) first because they suppress the entire site's crawl rate; redirect loops and chains second because they create dead ends; soft 404s and sitemap pollution third because they consume crawl budget on worthless URLs; faceted navigation and orphaned pages last because their impact scales with catalog size.
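If you record FAILs by item number, the ordering above can be applied mechanically. The following is a minimal sketch: the tier numbers and the `fix_order` helper are hypothetical, and items the ordering does not mention (2, 5, 8, 10, 11) are placed in a final unranked tier purely as an assumption.

```python
# Tier per checklist item, following the stated fix order.
PRIORITY_TIERS = {
    7: 1,          # server errors (5xx)
    3: 2, 4: 2,    # redirect chains and loops
    1: 3, 6: 3,    # soft 404s and sitemap pollution
    9: 4, 12: 4,   # faceted navigation and orphaned pages
}

# Assumption: items not named in the fix order go last.
UNRANKED_TIER = 5


def fix_order(failed_items):
    """Sort failed item numbers by tier, then by item number."""
    return sorted(
        set(failed_items),
        key=lambda item: (PRIORITY_TIERS.get(item, UNRANKED_TIER), item),
    )


if __name__ == "__main__":
    print(fix_order([12, 1, 7, 4, 10]))
```

Adjust the unranked tier to suit your store; a robots.txt rule blocking revenue pages (item 5), for example, may deserve a much earlier slot.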
Run this checklist on a quarterly cadence for stores under 10,000 SKUs and monthly for stores above that threshold. Catalog changes (product launches, category restructures, seasonal redirects) introduce new crawl errors faster than most teams catch them manually. A scheduled audit prevents error accumulation from reaching the point where Googlebot deprioritizes the domain.