Does a robots.txt block cause a crawl error in Google Search Console?

No. When Googlebot respects a robots.txt Disallow rule, it skips the URL entirely and logs it as 'excluded by robots.txt', not as a crawl error. Crawl errors only appear when the crawler actually attempted the HTTP request and received a failed response. The two signals appear in separate sections of Google Search Console's Coverage report.

Can a page be blocked by robots.txt and still generate a crawl error?

Not under normal crawling. Because Googlebot skips disallowed URLs without making an HTTP request, no error is recorded for the page itself. However, if robots.txt itself fails to load due to a server fault, that failure is reported as a crawl error for the robots.txt file, not for individual blocked URLs.

Which is worse for SEO: a crawl error or a robots.txt block on an important page?

A robots.txt block on an important page is typically worse because it stops Google from reading the page at all and can go unnoticed if the operator isn't checking the 'excluded' report. A crawl error on the same page is at least logged with a status code and referring URL, giving a clear signal to fix. Both can keep the page out of search results, but the robots.txt block is harder to detect.

How do I tell if a missing page is blocked by robots.txt or returning a 404?

In Google Search Console, check the URL Inspection tool for the specific page. It reports both the robots.txt crawl allowed status and the last HTTP response code. Alternatively, crawl the site with a tool like Screaming Frog with robots.txt respect disabled. This fetches disallowed URLs and exposes their real HTTP status codes regardless of what robots.txt instructs.

Should checkout and account pages use robots.txt blocks or noindex tags to stay out of Google?

Robots.txt Disallow is the correct tool for pages that should never be fetched, and it conserves crawl budget. However, if a page is linked from an external source, a robots.txt block alone won't prevent Google from knowing the URL exists, and the bare URL can still be indexed without its content. For complete control, combine robots.txt Disallow with server-side authentication rather than relying on a noindex tag, which requires the page to be fetched before it is read.

Crawl Error vs robots.txt: What's the Difference?

Crawl Error vs robots.txt: The Core Distinction

A crawl error is an unintended failure. The crawler tried to fetch a URL and received a broken response: a 4xx status, a 5xx status, a DNS timeout, or a connection refused. The page was meant to be accessible, but something went wrong during the request. Crawl errors are symptoms of infrastructure or configuration problems that need to be fixed.

A robots.txt directive is an intentional instruction. The store operator places a robots.txt file at the domain root and uses Disallow rules to tell crawlers which paths they should not fetch. When a crawler respects that directive, no error occurs. The crawler simply skips the URL by design. The outcome looks similar in a coverage report, but the causes and remedies are completely different.

The sharpest way to draw the line: crawl errors are unplanned failures you fix, while robots.txt blocks are planned exclusions you configure. Confusing the two leads to either chasing phantom errors or accidentally exposing pages you meant to hide.

How Each Mechanism Works Under the Hood

When a crawler encounters a URL, it first consults the robots.txt file at the root of that domain, either fetching it or using a recently cached copy. If a Disallow rule matches the target URL, the crawler records the URL as 'blocked by robots.txt' and moves on without issuing an HTTP request to the page itself. No network connection to the page is made, no status code is returned, and no error is logged.

A crawl error, by contrast, happens after the crawler has cleared the robots.txt check and actually attempted the HTTP request. The server responds with a 404 (page not found), 500 (server error), or the connection drops entirely. The crawler records that failure along with the HTTP status code, the time of the attempt, and the referring URL that contained the broken link.

This sequence matters for diagnosis. A URL blocked by robots.txt will never appear in a crawl error report because the crawler never tried to fetch it. If a URL shows up as a crawl error, robots.txt was either permissive or irrelevant to that path.
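The ordering described above can be sketched in a few lines of Python. This is a minimal illustration, not how Googlebot is implemented: it uses the standard library's urllib.robotparser, and the shop.example URLs, the ROBOTS_TXT content, and the FAKE_SERVER status table are all made-up assumptions standing in for a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content and server responses, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
"""

FAKE_SERVER = {
    "https://shop.example/products/red-shoe": 200,
    "https://shop.example/products/deleted-item": 404,
    "https://shop.example/checkout/step-1": 404,  # broken, but never fetched
}


def classify(url, parser, fetch_status, user_agent="Googlebot"):
    """Return how a robots.txt-respecting crawler would report the URL."""
    # Step 1: the robots.txt check happens before any request to the page.
    if not parser.can_fetch(user_agent, url):
        return "blocked by robots.txt"
    # Step 2: only allowed URLs are requested, so only they can produce errors.
    status = fetch_status(url)
    if status >= 400:
        return f"crawl error ({status})"
    return "fetched"


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

report = {url: classify(url, parser, FAKE_SERVER.__getitem__) for url in FAKE_SERVER}
for url, outcome in report.items():
    print(outcome, url)
```

Note that the checkout URL would return a 404 if requested, yet it is reported only as blocked, because the status lookup is never reached for disallowed paths.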

Where They Overlap and Create Confusion

The overlap zone is exclusion intent. Both mechanisms can prevent a URL from being indexed: one by blocking access, the other by creating a failed access. An ecommerce operator who wants staging pages, duplicate filtered URLs, or internal search results out of Google's index sometimes uses robots.txt Disallow as a quick fix. But if those same URLs are also linked from sitemaps or crawled pages, Google Search Console still reports them, as 'excluded by robots.txt' rather than as errors.

The dangerous case is when a robots.txt Disallow accidentally covers a page the store needs indexed. The operator sees the page absent from search results, checks crawl error reports, finds nothing, and concludes the page is fine. The real culprit, the robots.txt block, lives on a separate diagnostic screen. This mismatch is common on platforms like Shopify where theme updates or app installs can append lines to robots.txt without explicit operator action.

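Another overlap: a server misconfiguration can prevent Googlebot from fetching robots.txt itself, which triggers a crawl error for robots.txt specifically. What happens next depends on the failure. According to Google's documentation, most 4xx responses (such as a 404) are treated as if no robots.txt exists, so the entire domain is unrestricted, while a 5xx response or an unreachable file initially causes Google to treat the site as fully disallowed and then fall back on its last cached copy. Either outcome can be the opposite of what the operator expects.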

When Each Applies in an Ecommerce Context

Use robots.txt Disallow deliberately for paths that should never be indexed: checkout flows (/checkout/), account pages (/account/), internal search results (?q=), and admin panels. These pages are functional but provide no SEO value, and blocking them conserves crawl budget for product and category pages.

Treat crawl errors as the first signal that something structural broke: a product was deleted but internal links still point to it (404), a server-side rendering timeout is returning 5xx on high-traffic SKU pages, or a CDN misconfiguration drops connections before serving the response. Each of these requires a fix: restoring the page, redirecting the URL, or resolving the infrastructure fault.

The decision rule is simple: if exclusion is intentional, robots.txt is the right tool. If exclusion is accidental, crawl errors are the symptom and the underlying infrastructure issue is the fix target.

How They Interact: Blocked URLs That Also Have Errors

A URL can be simultaneously blocked by robots.txt and returning a 404 on the server. Because the crawler never fetches a disallowed URL, the 404 is invisible to crawl error reports. If the operator later removes the Disallow rule, perhaps after a site migration, those 404s surface as soon as the URLs are recrawled. What looked like a clean site suddenly shows hundreds of broken links. This is common after platform migrations where old robots.txt rules masked broken redirect work.

The audit sequence to avoid this trap: before removing any robots.txt Disallow rule, verify the URLs it covers either return 200 or have proper 301 redirects in place. Tools like Screaming Frog with 'Respect robots.txt' toggled off will fetch disallowed URLs and expose their real HTTP status codes, giving the operator a complete picture before changing the live configuration.
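As a rough sketch of that pre-removal check, the snippet below takes the results of a crawl that ignored robots.txt and lists every URL under the rule that would surface as an error once unblocked. The function name unsafe_to_unblock, the crawl_results mapping, and the /old-catalog/ rule are hypothetical, and the matching is a plain path-prefix comparison that does not handle wildcard rules.

```python
from urllib.parse import urlsplit


def unsafe_to_unblock(rule_prefix, crawl_results):
    """List URLs under rule_prefix that are neither 200 nor 301.

    crawl_results maps URL -> HTTP status, taken from a crawl that
    ignored robots.txt (for example, an export from a desktop crawler).
    """
    problems = []
    for url, status in sorted(crawl_results.items()):
        path = urlsplit(url).path
        # Simple prefix match: an assumption that the rule has no wildcards.
        if path.startswith(rule_prefix) and status not in (200, 301):
            problems.append((url, status))
    return problems


# Hypothetical crawl export, for illustration only.
crawl_results = {
    "https://shop.example/old-catalog/": 200,
    "https://shop.example/old-catalog/widget": 301,
    "https://shop.example/old-catalog/gadget": 404,
    "https://shop.example/products/widget": 200,
}

problems = unsafe_to_unblock("/old-catalog/", crawl_results)
for url, status in problems:
    print(status, url)
```

An empty result suggests the rule can be removed without exposing new errors; anything listed should be restored or redirected first.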

Actionable Diagnostic Steps for Store Operators

Start every technical SEO audit by separating two reports in Google Search Console: the Coverage report filtered to 'Excluded: Blocked by robots.txt' and the Coverage report filtered to crawl errors (4xx, 5xx). Treat these as entirely separate work queues. Mixing them leads to wasted effort and missed issues.

For robots.txt, validate the file monthly using Google Search Console's robots.txt report (the successor to the retired robots.txt Tester) or a dedicated crawler. Confirm that Disallow rules cover only intended paths and that no product, category, or collection URLs appear in the blocked list accidentally. For crawl errors, prioritize 5xx errors first (server faults that affect all users, not just crawlers), then 4xx errors on pages with inbound links or high historical traffic.
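The robots.txt half of that check can be approximated offline. The sketch below, using Python's standard urllib.robotparser, tests a list of URLs that must stay crawlable against a robots.txt body. The rules, the shop.example URLs, and the accidentally_blocked helper are illustrative assumptions. One caveat: urllib.robotparser uses simple prefix matching and does not implement the * and $ wildcard extensions that Googlebot supports, so results can differ for rules that rely on them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one rule that is broader than intended.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /account/
Disallow: /search
Disallow: /collections/sale
"""

# Hypothetical URLs the store needs crawled.
MUST_BE_CRAWLABLE = [
    "https://shop.example/products/red-shoe",
    "https://shop.example/collections/sale",
    "https://shop.example/collections/new",
]


def accidentally_blocked(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs from `urls` that the given robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


blocked = accidentally_blocked(ROBOTS_TXT, MUST_BE_CRAWLABLE)
for url in blocked:
    print("blocked but should be crawlable:", url)
```

Any URL this prints is a candidate for the 'separate diagnostic screen' problem: absent from crawl error reports, yet never fetched.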

The final check: after any platform update, app install, or theme change, re-fetch robots.txt and compare it to the version from before the change. Automated robots.txt modifications are one of the most common sources of sudden ranking drops in ecommerce, and a line-by-line diff takes under two minutes.
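One way to run that comparison is with Python's standard difflib. This is a minimal sketch that assumes a copy of the earlier robots.txt was saved before the change; the two file bodies and the robots_diff helper are made up for illustration.

```python
import difflib
from itertools import islice


def robots_diff(before, after):
    """Return only the added and removed lines between two robots.txt versions."""
    diff = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="robots.txt (before)",
        tofile="robots.txt (after)",
        lineterm="",
    )
    body = islice(diff, 2, None)  # skip the two file-header lines
    return [line for line in body if line.startswith(("+", "-"))]


# Hypothetical snapshots taken before and after an app install.
before = "User-agent: *\nDisallow: /checkout/\n"
after = "User-agent: *\nDisallow: /checkout/\nDisallow: /collections/\n"

changes = robots_diff(before, after)
for line in changes:
    print(line)
```

A line prefixed with + is a rule that appeared after the change, and a line prefixed with - is one that disappeared. An empty list means the file is unchanged.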


Matt Goren
