Crawl Error Checklist: 12 Items Every Ecommerce Store Should Audit

How to Use This Crawl Error Checklist

Crawl errors occur when a search engine bot requests a URL on your store and receives a response that prevents proper indexing: a 404, a redirect loop, a blocked resource, or a server-side failure. For ecommerce stores with thousands of SKUs, category pages, and filtered URLs, these errors accumulate silently and erode crawl budget and rankings.

Work through each item below using Google Search Console, your server logs, and a crawler tool such as Screaming Frog or Sitebulb. Mark each item PASS or FAIL. Any FAIL requires a fix before your next scheduled crawl audit.

Checklist Items 1–4: Foundational URL Health

1. SOFT 404 COUNT: Open Google Search Console > Indexing > Pages and check the 'Soft 404' reason under 'Why pages aren't indexed'. PASS: Zero URLs are flagged as soft 404s. FAIL: Any URL returns a 200 status with 'page not found' body content instead of a true 404 or a 301 redirect.

2. HARD 404 ON DELETED PRODUCTS: Export all 4xx URLs from your crawler. PASS: Every deleted product URL either 301-redirects to the closest matching category or returns a true 404 with no internal links pointing to it. FAIL: Deleted product URLs still appear in your internal link graph or XML sitemap.

3. REDIRECT CHAINS LONGER THAN ONE HOP: Run a crawl and filter for redirect chains. PASS: Every redirect resolves in one hop (A → B). FAIL: Any redirect takes two or more hops (A → B → C), which wastes crawl budget, adds latency, and may dilute link equity.

4. REDIRECT LOOPS: Check for circular redirects in your crawler's redirect report. PASS: No URL redirects back to itself or into a cycle. FAIL: Any URL is part of a loop (A → B → A), which causes bots to abandon crawling that URL entirely.
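
The redirect checks in items 3 and 4 can be sketched in Python as follows. This is a minimal illustration, not a crawler: it assumes you have already exported each redirecting URL and its 3xx target into a dictionary, and the `trace_redirects` helper and `redirect_map` contents are hypothetical. Each start URL is classified as a one-hop pass, a chain, or a loop; the 10-hop cap is an arbitrary safety limit.

```python
from typing import Dict, List, Tuple


def trace_redirects(
    start: str, redirects: Dict[str, str], max_hops: int = 10
) -> Tuple[List[str], str]:
    """Follow `start` through a source -> target redirect map.

    Returns the URLs visited and one of:
    'ok' (0 or 1 hop), 'chain' (2+ hops), 'loop', or 'too_many_hops'.
    """
    path = [start]
    seen = {start}
    current = start
    while current in redirects:
        current = redirects[current]
        path.append(current)
        if current in seen:  # revisiting a URL means a cycle (A -> B -> A)
            return path, "loop"
        seen.add(current)
        if len(path) - 1 > max_hops:  # arbitrary safety stop for very long chains
            return path, "too_many_hops"
    hops = len(path) - 1
    return path, "ok" if hops <= 1 else "chain"


# Hypothetical export from a crawler's redirect report.
redirect_map = {
    "/products/old-tee": "/products/tee-v2",
    "/products/tee-v2": "/products/tee",        # second hop: FAIL for item 3
    "/collections/sale": "/collections/deals",
    "/collections/deals": "/collections/sale",  # cycle: FAIL for item 4
}

for url in ["/products/old-tee", "/collections/sale"]:
    path, status = trace_redirects(url, redirect_map)
    print(status, " -> ".join(path))
```

Running every redirecting URL through a check like this and keeping the `chain` and `loop` results gives the FAIL list for both items. Fixing a chain means pointing the first URL straight at the final 200 destination.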

Checklist Items 5–8: Robots, Sitemaps, and Server Responses

5. ROBOTS.TXT BLOCKING CRITICAL PATHS: Fetch your robots.txt and cross-reference disallowed paths against your top-revenue category and product URLs. PASS: No revenue-generating or indexable URLs are disallowed. FAIL: A disallow rule blocks /collections/, /products/, or /category/ paths that should be indexed.

6. XML SITEMAP CONTAINS ONLY INDEXABLE URLs: Download your XML sitemap and run each URL through a status-code checker. PASS: Every URL in the sitemap returns a 200 status and is not tagged noindex. FAIL: The sitemap includes redirected URLs, 4xx URLs, or noindex pages, which forces bots to waste crawl budget on dead ends.

7. SERVER ERRORS (5XX) IN SEARCH CONSOLE: Open Google Search Console > Settings > Crawl stats and review the 'Server error (5xx)' row in the 'By response' breakdown. PASS: Zero 5xx errors recorded in the past 30 days. FAIL: Any 500, 502, 503, or 504 responses appear, indicating server instability that causes Googlebot to reduce its crawl rate.

8. CRAWL RATE THROTTLING SIGNAL: In Google Search Console > Settings > Crawl stats, review the average response time chart (the legacy Crawl Rate setting has been retired). PASS: Average server response time to Googlebot is under 200ms. FAIL: Response time consistently exceeds 500ms, which can prompt Google to reduce its crawl rate automatically.
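
For item 5, Python's standard-library `urllib.robotparser` offers a quick first pass. The sketch below parses a robots.txt body directly (no network fetch) and lists which revenue URLs a given user agent may not fetch; the `blocked_urls` helper, the sample rules, and the URLs are hypothetical. One caveat: this parser applies the first matching rule and does not understand `*` or `$` wildcards inside paths, whereas Google uses the most specific matching rule and supports both wildcards, so confirm any borderline result in Search Console's robots.txt report.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser


def blocked_urls(
    robots_txt: str, urls: Iterable[str], user_agent: str = "Googlebot"
) -> List[str]:
    """Return the URLs that `user_agent` is disallowed from fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text we already have
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt with an over-broad rule.
robots = """
User-agent: *
Disallow: /cart
Disallow: /collections/
"""

# Hypothetical top-revenue URLs that should stay crawlable.
revenue_urls = [
    "https://example.com/collections/summer",
    "https://example.com/products/linen-shirt",
]

for url in blocked_urls(robots, revenue_urls):
    print("FAIL (blocked):", url)
```

Any URL this prints is a FAIL for item 5; here the `/collections/` rule blocks a category page that should be indexed.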

Checklist Items 9–12: Ecommerce-Specific Crawl Traps

9. FACETED NAVIGATION GENERATING DUPLICATE URLs: Crawl your site with JavaScript rendering enabled and count unique parameter combinations on category pages. PASS: Faceted URLs are either canonicalized to the base category URL or blocked via robots.txt. Search Console's former URL Parameters tool is no longer available, so parameter handling has to happen on the site itself. FAIL: Hundreds of unique parameter URLs are crawlable and lack canonical tags, fragmenting crawl budget.

10. PAGINATED SERIES WITHOUT SELF-REFERENCING CANONICALS: Check pages 2+ of category and blog pagination. PASS: Each paginated page has a self-referencing canonical (page 2 canonicals to page 2). FAIL: All paginated pages point their canonical to page 1, telling Google that pages 2+ duplicate page 1, so Googlebot may skip deeper pagination and the products linked only from those pages.

11. INTERNAL LINKS POINTING TO REDIRECTED URLs: Export your internal link report and flag any anchor href that returns a 3xx status. PASS: All internal links resolve directly to their final 200-status destination. FAIL: Internal links pass through one or more redirects, which routes link equity through an unnecessary hop and adds latency to the crawl path.

12. ORPHANED PAGES WITH NO INTERNAL LINKS: Compare your XML sitemap URL list against your crawler's discovered-via-internal-links list. PASS: Every URL in the sitemap is reachable via at least one internal link from a crawled page. FAIL: Any sitemap URL has zero internal links pointing to it, so bots rely solely on the sitemap to find it and its crawling becomes fragile.
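
Item 12 reduces to a set difference once you have both lists. The sketch below assumes a single standard `<urlset>` sitemap (not a sitemap index file) and a set of URLs your crawler discovered via internal links; the helper names and sample data are hypothetical stand-ins for your own exports. Both sides must be normalized the same way (protocol, host, trailing slashes) for the comparison to mean anything.

```python
import xml.etree.ElementTree as ET
from typing import List, Set

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> Set[str]:
    """Extract every <loc> value from a <urlset> sitemap."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def find_orphans(sitemap_xml: str, internally_linked: Set[str]) -> List[str]:
    """Sitemap URLs that no crawled page links to (FAILs for item 12)."""
    return sorted(sitemap_urls(sitemap_xml) - internally_linked)


sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/products/tee</loc></url>
  <url><loc>https://example.com/products/mug</loc></url>
</urlset>"""

# Hypothetical "discovered via internal links" export from a crawler.
linked = {"https://example.com/products/tee", "https://example.com/collections/all"}

print(find_orphans(sitemap, linked))
```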

Prioritizing Fixes After the Audit

Not all FAILs carry equal weight. Fix items in this order: server errors (5xx) first because they suppress the entire site's crawl rate; redirect loops and chains second because they create dead ends; soft 404s and sitemap pollution third because they consume crawl budget on worthless URLs; faceted navigation and orphaned pages last because their impact scales with catalog size.
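
If you record FAILs by checklist item number, the fix order above can be applied mechanically. In this hypothetical sketch, the tiers come straight from the paragraph above: 5xx errors (item 7), then redirect chains and loops (items 3 and 4), then soft 404s and sitemap pollution (items 1 and 6), then faceted navigation and orphaned pages (items 9 and 12). Items the guide does not rank explicitly fall into a final tier for case-by-case review; that placement is an assumption of the sketch, not part of the guide.

```python
from typing import Iterable, List

# Tier per checklist item number, following the fix order above (1 = fix first).
FIX_TIER = {
    7: 1,         # server errors (5xx)
    3: 2, 4: 2,   # redirect chains and loops
    1: 3, 6: 3,   # soft 404s and sitemap pollution
    9: 4, 12: 4,  # faceted navigation and orphaned pages
}
UNRANKED_TIER = 5  # assumption: items 2, 5, 8, 10, 11 are reviewed case by case


def fix_order(failed_items: Iterable[int]) -> List[int]:
    """Sort failed checklist item numbers into fix order (ties by item number)."""
    return sorted(failed_items, key=lambda item: (FIX_TIER.get(item, UNRANKED_TIER), item))


print(fix_order([12, 3, 7, 1, 5]))
```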

Run this checklist on a quarterly cadence for stores under 10,000 SKUs and monthly for stores above that threshold. Catalog changes (product launches, category restructures, seasonal redirects) introduce new crawl errors faster than most teams catch them manually. A scheduled audit prevents error accumulation from reaching the point where Googlebot deprioritizes the domain.

Frequently asked questions

How do I find crawl errors on my ecommerce store without a paid tool?

Google Search Console is free and covers the most critical crawl error categories: soft 404s, server errors, submitted URLs with issues, and crawl stats including response times. It does not show every internal link or redirect chain, but it identifies the errors that directly affect Google's ability to index your store. For deeper analysis, the free tier of Screaming Frog crawls up to 500 URLs.

What is the difference between a soft 404 and a hard 404 in ecommerce?

A hard 404 is a server response where the HTTP status code is 404, correctly telling bots the page does not exist. A soft 404 is when the server returns a 200 status (page found) but the page content says 'product not found' or is nearly empty. Soft 404s are worse for crawl budget because Google must render and evaluate the page before determining it has no value, rather than discarding it immediately.
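
The distinction can be turned into a rough triage rule for crawler exports. This is a heuristic sketch, not Google's classifier (which is not public): it treats 404 and 410 as true error statuses and flags a 200 response as a possible soft 404 when its visible text matches an illustrative 'not found' phrase list or falls below a hypothetical 200-character minimum. The `classify_response` helper, the phrase list, and the threshold are assumptions to tune against your own templates, and `visible_text` is assumed to be page text already extracted by your crawler.

```python
import re

# Illustrative phrases that suggest an error page served with HTTP 200.
NOT_FOUND_PHRASES = re.compile(
    r"\b(?:page|product) not found\b|\bno longer available\b", re.IGNORECASE
)
MIN_TEXT_CHARS = 200  # hypothetical "nearly empty" threshold


def classify_response(status: int, visible_text: str) -> str:
    """Triage a crawled URL as 'hard_404', 'possible_soft_404', or 'not_404'."""
    if status in (404, 410):  # true error statuses that bots can discard immediately
        return "hard_404"
    if status == 200:
        text = visible_text.strip()
        if NOT_FOUND_PHRASES.search(text) or len(text) < MIN_TEXT_CHARS:
            return "possible_soft_404"
    return "not_404"


print(classify_response(404, ""))
print(classify_response(200, "Sorry, product not found. Browse our bestsellers."))
print(classify_response(200, "Linen shirt in three colors. " * 10))
```

Treat `possible_soft_404` results as a review queue rather than a verdict; the Search Console report in item 1 remains the authoritative list of what Google itself flagged.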

Do crawl errors directly cause ranking drops?

Crawl errors do not automatically drop rankings, but they create conditions that lead to ranking losses. Server errors reduce Googlebot's crawl rate for the entire domain. Soft 404s waste crawl budget on valueless URLs. Redirect chains dilute link equity. Together, these issues cause important category and product pages to be crawled less frequently and indexed less reliably, which degrades ranking stability over time.

How many redirect hops are acceptable before it becomes a crawl error?

Google documents that Googlebot follows up to 10 redirect hops, but treating anything beyond one hop as a failure is a sensible operational standard. Each additional hop adds latency, adds another point where the chain can break, and may reduce the signals passed through the redirect. For ecommerce stores with large redirect files, redirect chains accumulate gradually and need regular audits to catch.

Should deleted product pages return a 404 or redirect to a category page?

Return a 301 redirect to the closest matching category page if the product was permanently discontinued and the category page has comparable content. Return a hard 404 only if the product was a one-time item with no logical substitute. A 404 on a URL that has external backlinks wastes that link equity; a redirect preserves it. If the product will return seasonally, use a 302 temporary redirect so the original URL can be restored when the product comes back.
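
The decision rules above can be written as a small function for bulk-processing a list of removed SKUs. This sketch assumes you record, per deleted product, whether it will return seasonally and which category page (if any) is a comparable substitute; the `DeletedProduct` fields and the helper name are hypothetical. The guide does not name a 302 target, so the sketch assumes the closest category, and it falls back to a 404 whenever no suitable category exists.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DeletedProduct:
    url: str
    returns_seasonally: bool
    # Closest category with comparable content, or None if there is no logical substitute.
    closest_category: Optional[str]


def deleted_product_response(product: DeletedProduct) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect target or None) for a removed product URL."""
    if product.closest_category is None:
        return 404, None  # one-time item with no substitute
    if product.returns_seasonally:
        return 302, product.closest_category  # temporary: the URL comes back later
    return 301, product.closest_category  # permanently discontinued


examples = [
    DeletedProduct("/products/holiday-mug", True, "/collections/mugs"),
    DeletedProduct("/products/old-tee", False, "/collections/tees"),
    DeletedProduct("/products/one-off-print", False, None),
]
for p in examples:
    print(p.url, deleted_product_response(p))
```

The resulting (status, target) pairs can then be translated into whatever redirect format your platform or server uses.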

Written by

Matt is the founder of RunOctopus. He built All Angles Creatures from zero to page-1 rankings in reptile feeder insects in under 60 days using exactly this method, turning a hard, entrenched niche into RunOctopus's proof store for programmatic SEO and AI search citation.
