Sitemap.xml is an XML file, usually served from the root of a domain, that lists a site's important URLs along with metadata such as the last-modified date, giving search engines a map of the pages the site wants indexed.
Sitemap.xml in plain English
Sitemap.xml is a machine-readable inventory of a website's URLs, served at /sitemap.xml. For an ecommerce store, it lists product pages, category pages, blog posts, and other indexable URLs, each wrapped in a <url> tag with a <loc> for the address and a <lastmod> date showing when the page last changed.
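To make the structure concrete, here is a minimal sketch in Python that writes a sitemap with one <url> entry per page. The function name build_sitemap and the example URL are assumptions for illustration. Only the element names and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML as UTF-8 bytes.

    `pages` is an iterable of (url, lastmod) pairs, where lastmod is a
    W3C date string such as "2024-05-01". (Hypothetical helper.)
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = lastmod
    # ElementTree escapes characters such as "&" in URLs automatically.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    xml_bytes = build_sitemap([
        ("https://shop.example/products/blue-widget", "2024-05-01"),
        ("https://shop.example/collections/widgets", "2024-04-18"),
    ])
    print(xml_bytes.decode("utf-8"))
```

Generating the file from the same data source that renders the pages is one way to keep <lastmod> tied to real content changes rather than to the time the sitemap was built.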
Search engine crawlers fetch the sitemap to discover URLs and prioritize re-crawling. When a bot reads the file, it parses each <url> entry, compares the <lastmod> timestamp against what it has recorded in its index, and queues changed or new URLs for crawling. The sitemap location is declared in robots.txt with a Sitemap: line or submitted directly through Google Search Console and Bing Webmaster Tools. Large sites split URLs across multiple sitemap files referenced by a sitemap index, since a single file is capped at 50,000 URLs or 50 MB uncompressed.
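The parse-and-compare step described above can be sketched as follows. This is a simplified model, not how any particular search engine works: the function urls_to_crawl and the last_crawled dictionary (URL mapped to the date it was last fetched) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_to_crawl(sitemap_xml, last_crawled):
    """Return URLs that are new or whose <lastmod> is after the last crawl.

    `last_crawled` maps URL -> datetime.date of the previous fetch
    (a stand-in for a crawler's index; hypothetical).
    """
    root = ET.fromstring(sitemap_xml)
    queue = []
    for entry in root.findall("sm:url", NS):
        loc = entry.findtext("sm:loc", namespaces=NS)
        if not loc:
            continue  # skip malformed entries with no address
        loc = loc.strip()
        lastmod = entry.findtext("sm:lastmod", namespaces=NS)
        seen = last_crawled.get(loc)
        if seen is None:
            queue.append(loc)  # never crawled before
        elif lastmod:
            # Keep only the YYYY-MM-DD part so full timestamps also parse.
            modified = date.fromisoformat(lastmod.strip()[:10])
            if modified > seen:
                queue.append(loc)  # changed since the last crawl
    return queue
```

Under this model, a page whose <lastmod> never moves forward is never re-queued, and a page whose <lastmod> changes on every build is always re-queued. Both cases illustrate why inaccurate dates waste crawl effort.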
A well-built sitemap contains only canonical, indexable, 200-status URLs with accurate <lastmod> values that update when page content actually changes. A poor sitemap includes redirected URLs, 404s, noindexed pages, parameterized duplicates, or lastmod dates that are stale or that change on every build regardless of content. Crawlers tend to discount sitemaps whose signals prove unreliable, which slows discovery of legitimately updated pages.
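The inclusion rules above can be expressed as a simple filter applied before the sitemap is written. The sketch below assumes a hypothetical Page record produced by a site crawl or a CMS export. The field names are illustrative and do not come from any specific tool.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical record describing one URL on the site.
    url: str
    status: int      # HTTP status returned when the URL is fetched
    canonical: str   # value of the page's rel="canonical" link
    noindex: bool    # True if the page carries a noindex directive


def sitemap_eligible(page):
    """True if the page returns 200, is indexable, and is its own canonical."""
    return (
        page.status == 200               # excludes redirects and 404s
        and not page.noindex             # excludes noindexed pages
        and page.canonical == page.url   # excludes parameterized duplicates
    )


def filter_for_sitemap(pages):
    """Keep only the pages that belong in the sitemap."""
    return [p for p in pages if sitemap_eligible(p)]
```

The canonical check is what removes parameterized duplicates here: a URL such as /shirts?sort=price that canonicalizes to /shirts fails the comparison and is left out.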
The 50,000-URL ceiling per file matters for ecommerce catalogs at scale. A store with 200,000 SKUs needs at least four child sitemaps for its product URLs alone (200,000 ÷ 50,000 = 4), all referenced from a sitemap index. Child sitemaps are typically segmented by type: products, categories, blog, static pages. This segmentation also makes it easier to diagnose indexation issues in Search Console, where indexing reports can be filtered by submitted sitemap.
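The arithmetic and the index file can be sketched in a few lines of Python. The helper names chunk_urls and build_sitemap_index, and the child file naming scheme, are assumptions for illustration. The <sitemapindex>, <sitemap>, and <loc> elements come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file limit from the sitemap protocol


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(child_sitemap_urls):
    """Return a sitemap index (UTF-8 bytes) pointing at each child file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for child_url in child_sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = child_url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # 200,000 hypothetical product URLs split into files of 50,000.
    products = [f"https://shop.example/products/{n}" for n in range(200_000)]
    chunks = chunk_urls(products)
    children = [
        f"https://shop.example/sitemaps/products-{i + 1}.xml"
        for i in range(len(chunks))
    ]
    print(len(chunks), "product sitemaps")
    print(build_sitemap_index(children).decode("utf-8"))
```

This sketch enforces only the URL count. A production generator would also need to check each file against the 50 MB uncompressed limit.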
Why sitemap.xml matters for ecommerce
Ecommerce catalogs change constantly: products go out of stock, prices update, new SKUs launch, seasonal collections rotate. Without an accurate sitemap.xml, search engines rely on internal links and external signals to find these changes, which can delay indexation by days or weeks. Stores that maintain clean sitemaps with truthful lastmod dates tend to get new products indexed sooner, recover more quickly from site migrations, and surface coverage problems early through Search Console reports. Stores that ignore the sitemap, or auto-generate one filled with redirect chains and noindexed URLs, train Google to trust the file less, which compounds into slower crawling and missed revenue on time-sensitive inventory.