
Sitemap.xml Checklist: 12 Items Every Ecommerce Store Should Audit


Why Ecommerce Sitemap Audits Require a Structured Checklist

A sitemap.xml file tells search engine crawlers which URLs the site considers worth crawling and indexing. For ecommerce stores with thousands of product, category, and filter pages, a single misconfiguration can waste crawl budget, delay indexation, or suppress rankings. An ad hoc review misses systematic problems; a structured checklist catches them reliably.

The 12 checks below address the most common failure points: submission errors, URL hygiene, format validity, and strategic exclusions. Each item has a binary pass/fail test so any team member can run the audit without ambiguity.

The 12-Item Sitemap.xml Audit Checklist

1. SITEMAP IS SUBMITTED TO GOOGLE SEARCH CONSOLE AND BING WEBMASTER TOOLS. Pass: The sitemap URL appears under Sitemaps in both tools with a 'Success' status. Fail: The sitemap is absent, shows an error status, or was submitted to only one platform.

2. SITEMAP URL IS DECLARED IN ROBOTS.TXT. Pass: The robots.txt file contains a line reading 'Sitemap: https://yourdomain.com/sitemap.xml' (full absolute URL). Fail: No Sitemap directive exists in robots.txt, forcing crawlers to discover it only through manual submission.

3. ALL SITEMAP URLS RETURN A 200 HTTP STATUS. Pass: Every URL listed in the sitemap returns a 200 status code when crawled. Fail: Any URL returns a 301, 302, 404, 410, or 5xx code. Redirected or broken URLs waste crawl budget and make the sitemap a less reliable signal of which URLs matter.

4. CANONICAL TAGS ON LISTED PAGES POINT TO THEMSELVES. Pass: The canonical tag on each sitemap URL matches that URL exactly (self-referencing). Fail: Any listed URL carries a canonical pointing to a different URL, signaling to Google that the page defers to another version—making its sitemap inclusion counterproductive.

5. NO NOINDEX PAGES ARE INCLUDED IN THE SITEMAP. Pass: Zero pages listed in the sitemap contain a 'noindex' meta robots tag or X-Robots-Tag header. Fail: Any noindex page appears in the sitemap. Including noindex URLs creates a direct contradiction that confuses crawlers and wastes crawl budget.

6. SITEMAP FILE SIZE AND URL COUNT ARE WITHIN LIMITS. Pass: Each sitemap file contains no more than 50,000 URLs and is no larger than 50 MB (52,428,800 bytes) uncompressed. Fail: Either limit is exceeded. Use a sitemap index file to split large catalogs into multiple child sitemaps, each within limits.

7. SITEMAP INDEX FILE IS USED FOR MULTI-SITEMAP ARCHITECTURES. Pass: If the store has more than one sitemap file (products, categories, blog, etc.), a sitemap index file at /sitemap.xml references all child sitemaps. Fail: Child sitemaps exist but no index file consolidates them, requiring manual submission of each file separately.

8. ALL URLS USE HTTPS AND THE CORRECT CANONICAL DOMAIN. Pass: Every URL begins with 'https://' and uses the exact domain variant set as canonical (www vs. non-www). Fail: Any URL uses http://, an IP address, or an alternate domain variant not set as the canonical domain.

9. PAGINATED PAGES AND FACETED NAVIGATION URLS ARE EXCLUDED. Pass: URLs containing pagination parameters (e.g., ?page=2) and faceted filter combinations (e.g., ?color=red&size=M) are absent from the sitemap. Fail: These URLs appear in the sitemap, consuming crawl budget on near-duplicate pages that should not rank independently.

10. LASTMOD DATES REFLECT ACTUAL CONTENT CHANGES. Pass: The <lastmod> value for each URL matches the date the page content was meaningfully last updated, verifiable in the CMS or server logs. Fail: All URLs carry the same lastmod date, lastmod is absent entirely, or the date reflects a server-side timestamp unrelated to content changes.

11. IMAGE AND VIDEO SITEMAPS ARE PRESENT FOR MEDIA-HEAVY PAGES. Pass: Product pages with primary images use Google's image sitemap extension (<image:image> tags), and video content uses the video sitemap extension. Fail: No media sitemaps exist for a catalog where image search drives meaningful traffic, leaving image indexation entirely to crawler discovery.

12. SITEMAP IS SERVED AS VALID XML WITHOUT REQUIRING JAVASCRIPT. Pass: The sitemap.xml file is served as a static XML file or generated server-side, and it returns valid XML without requiring JavaScript execution. Fail: The sitemap is generated dynamically via client-side JavaScript, causing crawlers that do not execute JS to receive an empty or malformed response.
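
Several of the checks above (6, 8, 9, and 10) can be run against the sitemap file alone, with no crawl required. The following is a minimal sketch under stated assumptions, not a definitive implementation: the function name audit_sitemap, the example.com URLs, and the rule that any query string counts as a pagination or facet URL are all illustrative choices, and stores with path-based facets would need a different test for check 9.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_URLS = 50_000              # per-file URL limit (check 6)
MAX_BYTES = 50 * 1024 * 1024   # per-file uncompressed size limit (check 6)


def audit_sitemap(xml_text, canonical_host):
    """Run the file-only checks (6, 8, 9, 10) on one <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        entries.append((loc, lastmod))

    locs = [loc for loc, _ in entries]
    lastmods = [lastmod for _, lastmod in entries]
    return {
        # Check 6: URL count and uncompressed size.
        "within_limits": len(locs) <= MAX_URLS
        and len(xml_text.encode("utf-8")) <= MAX_BYTES,
        # Check 8: https scheme and the exact canonical host.
        "wrong_origin": [
            loc for loc in locs
            if urlsplit(loc).scheme != "https"
            or urlsplit(loc).hostname != canonical_host
        ],
        # Check 9 (assumed heuristic): any query string is treated as
        # a pagination or faceted-navigation URL.
        "with_query": [loc for loc in locs if urlsplit(loc).query],
        # Check 10: lastmod absent, or identical on every URL.
        "missing_lastmod": [loc for loc, lastmod in entries if not lastmod],
        "uniform_lastmod": len(locs) > 1
        and lastmods[0] is not None
        and len(set(lastmods)) == 1,
    }


# Hypothetical three-URL sitemap used only to exercise the function.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/products/blue-widget</loc><lastmod>2024-05-01</lastmod></url>
  <url><loc>http://www.example.com/products/red-widget</loc><lastmod>2024-05-01</lastmod></url>
  <url><loc>https://www.example.com/collections/widgets?page=2</loc></url>
</urlset>"""

report = audit_sitemap(SAMPLE, canonical_host="www.example.com")
for check, result in report.items():
    print(check, result)
```

Each key in the returned dictionary maps to one pass/fail test: an empty list or a False value for uniform_lastmod is a pass. Checks 3, 4, and 5 still need a crawl, because status codes, canonical tags, and noindex directives live on the pages rather than in the sitemap.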

Common Failure Patterns in Ecommerce Sitemaps

The most frequent failures in ecommerce sitemap audits cluster around three problems: including URLs that should be excluded (noindex pages, pagination, faceted filters), stale or fabricated lastmod dates, and HTTPS/domain inconsistencies introduced by platform migrations or CDN configurations.

Stores on platforms like Shopify auto-generate sitemaps that include all active product and collection pages by default. This means deleted-but-redirected pages, draft products accidentally published, and filtered collection URLs can enter the sitemap without manual review. Automated sitemap generation requires periodic human auditing—it does not replace it.

Faceted navigation is the highest-volume source of sitemap pollution on mid-to-large catalogs. A store with 500 products and 10 filter attributes can generate tens of thousands of unique filter-combination URLs. Excluding these URLs from the sitemap and blocking them through robots.txt disallow rules is essential to protecting crawl budget for pages that actually generate revenue.
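
The arithmetic behind that figure is easy to reproduce. The sketch below counts filter combinations for a single collection page, assuming 10 attributes with 4 values each and one value selected per active attribute. Those numbers, and the function name filter_url_count, are illustrative assumptions rather than measurements from any real store.

```python
from math import comb


def filter_url_count(attributes, values_per_attribute, max_active):
    """Count filter combinations with 1..max_active attributes active.

    For k active attributes there are C(attributes, k) ways to pick them
    and values_per_attribute ** k ways to pick one value for each.
    """
    return sum(
        comb(attributes, k) * values_per_attribute ** k
        for k in range(1, max_active + 1)
    )


# Assumed catalog shape: 10 filter attributes, 4 values per attribute.
for max_active in (1, 2, 3, 4, 10):
    print(max_active, filter_url_count(10, 4, max_active))
```

Under these assumptions, allowing up to four simultaneous filters already yields 62,200 combinations per collection page, and allowing all ten yields 9,765,624. The count depends on the number of attributes and values, not on the number of products, which is why even a small catalog can flood a sitemap.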

How to Execute This Audit Efficiently

Start with Google Search Console's Sitemaps report, which shows the last read date, the number of discovered URLs, and any parsing errors, then download the sitemap file itself from its public URL. Cross-reference the file against a crawl of the live site using a tool like Screaming Frog or Sitebulb to identify status codes, canonical discrepancies, and noindex conflicts at scale.
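
Once the crawl export is available, the cross-reference for checks 3, 4, and 5 is a simple join on URL. The following sketch assumes the export has been loaded into records with url, status, canonical, and noindex fields. The CrawlRow type and its field names are hypothetical and will not match any crawler's actual export columns without mapping.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawlRow:
    """One row of a crawl export (hypothetical field names)."""
    url: str
    status: int
    canonical: Optional[str]
    noindex: bool


def find_conflicts(sitemap_urls, crawl_rows):
    """Group sitemap URLs by the check they fail (3, 4, or 5)."""
    by_url = {row.url: row for row in crawl_rows}
    conflicts = {
        "not_crawled": [],
        "non_200": [],
        "canonical_mismatch": [],
        "noindex": [],
    }
    for url in sitemap_urls:
        row = by_url.get(url)
        if row is None:
            conflicts["not_crawled"].append(url)
            continue
        if row.status != 200:
            # Check 3. Redirects and errors are not inspected further.
            conflicts["non_200"].append(url)
            continue
        # Check 4. A missing canonical is not flagged here (assumption);
        # tighten this if your policy requires an explicit tag.
        if row.canonical is not None and row.canonical != url:
            conflicts["canonical_mismatch"].append(url)
        # Check 5.
        if row.noindex:
            conflicts["noindex"].append(url)
    return conflicts


# Hypothetical data to exercise the function.
base = "https://www.example.com/products/"
sitemap_urls = [base + "a", base + "b", base + "c", base + "d"]
crawl_rows = [
    CrawlRow(base + "a", 200, base + "a", False),
    CrawlRow(base + "b", 301, None, False),
    CrawlRow(base + "c", 200, base + "a", True),
]
conflicts = find_conflicts(sitemap_urls, crawl_rows)
for check, urls in conflicts.items():
    print(check, urls)
```

The not_crawled bucket is worth reviewing separately: a sitemap URL the crawler never reached may be an orphan page with no internal links pointing to it.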

For lastmod validation, export a URL list from the CMS with the actual last-modified dates and compare against the sitemap values. Mismatches indicate the sitemap is being generated with server modification times rather than content publication dates—a common misconfiguration in WordPress and Magento environments.
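
The comparison itself can be sketched as follows, assuming both sources have been reduced to dictionaries mapping URL to an ISO 8601 date or datetime string that includes at least a full YYYY-MM-DD date. The function name lastmod_mismatches and the tolerance_days parameter are illustrative.

```python
from datetime import date


def lastmod_mismatches(sitemap_lastmod, cms_modified, tolerance_days=0):
    """Return (url, sitemap_value, cms_value) where the dates disagree."""
    mismatches = []
    for url, sitemap_value in sitemap_lastmod.items():
        cms_value = cms_modified.get(url)
        if cms_value is None:
            continue  # URL not present in the CMS export; nothing to compare
        # Compare calendar dates only: the first 10 characters of an
        # ISO 8601 value are YYYY-MM-DD (assumes a full date is present).
        sitemap_day = date.fromisoformat(sitemap_value[:10])
        cms_day = date.fromisoformat(cms_value[:10])
        if abs((sitemap_day - cms_day).days) > tolerance_days:
            mismatches.append((url, sitemap_value, cms_value))
    return mismatches


# Hypothetical values to exercise the function.
sitemap_lastmod = {
    "https://www.example.com/products/a": "2024-06-01T03:00:00+00:00",
    "https://www.example.com/products/b": "2024-03-10",
}
cms_modified = {
    "https://www.example.com/products/a": "2024-02-14",
    "https://www.example.com/products/b": "2024-03-10",
}
mismatches = lastmod_mismatches(sitemap_lastmod, cms_modified)
print(mismatches)
```

If most mismatched URLs share one sitemap date, that date is likely a deploy or regeneration timestamp rather than a content change.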

Run the audit on a quarterly schedule for stores with catalogs under 10,000 SKUs, and monthly for stores above that threshold. New product launches, seasonal promotions, and platform updates all introduce sitemap regressions that compound over time if left unchecked.

Prioritizing Fixes After the Audit

Not all 12 items carry equal urgency. Prioritize in this order: (1) HTTP status errors and noindex conflicts, which actively harm indexation today; (2) HTTPS and canonical domain inconsistencies, which create duplicate content signals; (3) sitemap submission status, since the fixes above take effect only once Google has processed the sitemap; (4) faceted URL exclusions and lastmod accuracy, which improve crawl efficiency over time.

Document each check result in a spreadsheet with columns for check name, pass/fail status, specific URLs affected, and assigned owner. Attach the sitemap audit to a broader technical SEO ticket workflow so fixes are tracked to completion, not just identified.
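
For teams that script the audit, the same spreadsheet can be produced as a CSV file. This is a minimal sketch using the four columns named above; the input record shape (check, passed, urls, owner) and the sample owners are hypothetical.

```python
import csv
import io

COLUMNS = ["check", "status", "affected_urls", "owner"]


def results_to_csv(results):
    """Serialize audit results to CSV text, one row per check."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for row in results:
        writer.writerow({
            "check": row["check"],
            "status": "pass" if row["passed"] else "fail",
            # URLs are joined with spaces so each check stays on one row.
            "affected_urls": " ".join(row.get("urls", [])),
            "owner": row.get("owner", ""),
        })
    return buffer.getvalue()


# Hypothetical results to exercise the function.
results = [
    {"check": "2. Sitemap declared in robots.txt", "passed": True},
    {
        "check": "3. All URLs return 200",
        "passed": False,
        "urls": ["https://www.example.com/products/b"],
        "owner": "dev-team",
    },
]
csv_text = results_to_csv(results)
print(csv_text)
```

Writing the text to a file and attaching it to the SEO ticket keeps the affected URLs next to the assigned owner.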

Frequently Asked Questions

How often should an ecommerce store audit its sitemap.xml?

Stores with fewer than 10,000 SKUs should audit quarterly. Stores above that threshold should audit monthly. Any major platform update, domain migration, or catalog restructure warrants an immediate audit regardless of schedule, since these events consistently introduce sitemap regressions.

Does including noindex pages in a sitemap hurt SEO?

Yes. Including a noindex page in the sitemap creates a direct contradiction: the sitemap says 'crawl this' while the page tag says 'do not index this.' Google resolves the conflict by respecting the noindex directive, so the entry adds no indexation value and spends crawl requests on a page that cannot appear in search results.

What is the difference between a sitemap index file and a regular sitemap file?

A sitemap index file is an XML file that lists multiple child sitemap files, each covering a segment of the site (products, categories, blog). A regular sitemap file lists individual URLs directly. When a store's total URL count exceeds 50,000 or requires organizational separation by content type, a sitemap index is the correct structure.
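
In the XML itself, the two file types differ by root element: a sitemap index uses sitemapindex with sitemap children, while a regular sitemap uses urlset with url children. A short sketch that tells them apart, with the function name read_sitemap and the example.com URLs as illustrative assumptions:

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def read_sitemap(xml_text):
    """Return ("index", child sitemap URLs) or ("urlset", page URLs)."""
    root = ET.fromstring(xml_text)
    if root.tag == NS + "sitemapindex":
        kind, item = "index", "sitemap"
    elif root.tag == NS + "urlset":
        kind, item = "urlset", "url"
    else:
        raise ValueError("unexpected root element: " + root.tag)
    locs = [
        element.findtext(NS + "loc", default="").strip()
        for element in root.findall(NS + item)
    ]
    return kind, locs


# Hypothetical index file referencing two child sitemaps.
INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.example.com/sitemap-products.xml</loc></sitemap>
  <sitemap><loc>https://www.example.com/sitemap-collections.xml</loc></sitemap>
</sitemapindex>"""

kind, locs = read_sitemap(INDEX)
print(kind, locs)
```

An audit script can use the returned kind to decide whether to fetch each child file and audit it in turn, or to audit the URLs directly.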

Should ecommerce product pages with variants each have their own sitemap entry?

Only if each variant has a distinct canonical URL that should rank independently. If color or size variants share a single canonical URL, list only the canonical URL. Including variant URLs that carry a canonical pointing to the parent page is a sitemap error—those variant URLs add no indexation value and consume crawl budget.

Can a sitemap.xml file fix a crawl budget problem on a large ecommerce site?

A correctly configured sitemap improves crawl efficiency but does not expand the total crawl budget Google allocates. The sitemap signals which URLs are worth crawling, reducing time spent on low-value pages. Removing paginated, filtered, and noindex URLs from the sitemap frees budget for high-priority product and category pages.

Written by

Matt is the founder of RunOctopus. He built All Angles Creatures from zero to page-1 rankings in reptile feeder insects in under 60 days using exactly this method — turning a hard, entrenched niche into RunOctopus's proof store for programmatic SEO and AI search citation.
