llms.txt Checklist: 12 Items Every Ecommerce Store Should Audit

Why Ecommerce Stores Need an llms.txt Audit

llms.txt is a plain-text, Markdown-formatted file placed at the root of a domain that signals to AI crawlers and large language models which content is authoritative and suitable for AI-generated responses. For ecommerce operators, this file can influence whether product pages, category guides, and brand content get cited in AI search tools like ChatGPT, Perplexity, and Google AI Overviews.

Without an audit, stores risk missing the file entirely, pointing AI crawlers to outdated sitemaps, or burying high-value catalog content under directories that AI agents deprioritize. This 12-item checklist gives operators a structured pass/fail framework to fix gaps before AI-driven traffic becomes a primary acquisition channel.

File Existence and Placement Checks (Items 1–3)

Item 1 – File exists at root: Navigate to yourdomain.com/llms.txt in a browser. PASS: The file loads as plain text with a 200 HTTP status. FAIL: Any redirect, 404, or HTML response. The file must live at the apex domain root, not a subdomain or subdirectory.
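The Item 1 pass/fail rule can be scripted rather than checked by eye. The sketch below is one possible interpretation, not a standard tool: the function names are hypothetical, and accepting text/markdown alongside text/plain is an assumption that goes slightly beyond the rule as written.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# Media types treated as "plain text" here. Accepting text/markdown is an assumption.
ACCEPTED_TYPES = {"text/plain", "text/markdown"}


def check_item_1(status: int, content_type: str, body: str, was_redirected: bool) -> bool:
    """Return True (PASS) if the response looks like a directly served plain-text file."""
    if was_redirected or status != 200:
        return False
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ACCEPTED_TYPES:
        return False
    # Catch HTML pages that are served with a misleading Content-Type header.
    return not body.lstrip().lower().startswith(("<!doctype", "<html"))


def fetch_and_check(url: str) -> bool:
    """Fetch the URL and apply check_item_1. Network errors count as FAIL."""
    try:
        with urlopen(url, timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            content_type = resp.headers.get("Content-Type", "")
            # urlopen follows redirects, so compare the final URL to the requested one.
            return check_item_1(resp.status, content_type, body, resp.geturl() != url)
    except (HTTPError, URLError):
        return False
```

Calling fetch_and_check("https://yourdomain.com/llms.txt") would then give a quick PASS/FAIL signal for this item.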

Item 2 – File is UTF-8 encoded, plain text: Download the raw file and verify encoding. PASS: UTF-8 encoding confirmed, no HTML tags, no BOM characters, line breaks are LF or CRLF. FAIL: File contains HTML markup, uses a non-UTF-8 encoding, or renders as a webpage. Encoding errors cause AI parsers to skip or misread the file.
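A minimal sketch of the Item 2 check, run against the raw downloaded bytes. The function name and the list of HTML tag names it looks for are illustrative assumptions; a stricter audit could use a real HTML parser.

```python
import codecs
import re

# A small, illustrative set of tag names. Not an exhaustive HTML detector.
HTML_TAG = re.compile(r"<\s*(?:!doctype|html|head|body|div|span|script|p|a)\b", re.IGNORECASE)


def check_item_2(raw: bytes) -> list[str]:
    """Return a list of problems found. An empty list means PASS."""
    problems = []
    if raw.startswith(codecs.BOM_UTF8):
        problems.append("UTF-8 BOM present")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return problems + ["not valid UTF-8"]
    if HTML_TAG.search(text):
        problems.append("HTML markup found")
    # A carriage return that is not followed by a line feed is neither LF nor CRLF.
    if re.search(r"\r(?!\n)", text):
        problems.append("bare CR line endings")
    return problems
```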

Item 3 – HTTPS delivery with no mixed-content errors: Load the file over HTTPS and check browser dev tools for warnings. PASS: File loads cleanly over HTTPS with no certificate errors. FAIL: HTTP-only delivery, an expired TLS certificate, or mixed-content warnings. AI crawlers that enforce secure connections will reject insecure llms.txt files.

Content Structure and Formatting Checks (Items 4–6)

Item 4 – Required header block is present and complete: The file should open with a # title line followed by a > description block. The # title is the only element the llms.txt proposal strictly requires, but the description is what gives AI parsers a summary to work with. PASS: Both elements appear within the first five lines, the title describes the store's primary purpose, and the description is one to three sentences. FAIL: Missing title, missing description, or placeholder text like 'My Store' that provides no semantic signal.
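The header rule in Item 4 is mechanical enough to automate. The following is a rough sketch under stated assumptions: the placeholder list is hypothetical and should be tuned per store, and the sentence-count part of the rule is left to human review.

```python
# Hypothetical placeholder titles. Extend this set to match your own templates.
PLACEHOLDER_TITLES = {"my store", "store", "home", "untitled"}


def check_item_4(text: str) -> bool:
    """Return True (PASS) if a # title and a > description appear in the first five lines."""
    head = [line.strip() for line in text.splitlines()[:5]]
    title_idx = next((i for i, line in enumerate(head) if line.startswith("# ")), None)
    desc_idx = next((i for i, line in enumerate(head) if line.startswith(">")), None)
    if title_idx is None or desc_idx is None or desc_idx < title_idx:
        return False
    title = head[title_idx][2:].strip()
    description = head[desc_idx][1:].strip()
    return bool(title) and bool(description) and title.lower() not in PLACEHOLDER_TITLES
```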

Item 5 – Sections use correct H2 Markdown syntax: Each content category must be declared with a ## heading. PASS: Every logical grouping (e.g., ## Products, ## Policies, ## Blog) uses ## syntax, not bold text, asterisks, or arbitrary separators. FAIL: Inconsistent heading levels or non-Markdown formatting. AI parsers rely on standardized Markdown hierarchy to segment content types.
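One way to spot Item 5 failures is to flag lines that look like headings but are not ## headings. This is a heuristic sketch only: the function name is hypothetical, and treating any heading deeper than ## as suspicious is an assumption about what "inconsistent heading levels" means in practice.

```python
import re


def pseudo_headings(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that look like non-## section markers."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if re.fullmatch(r"(\*\*|__).+(\*\*|__)", stripped):
            flagged.append((number, stripped))  # bold-only line used as a heading
        elif re.match(r"#{3,}\s", stripped):
            flagged.append((number, stripped))  # heading deeper than ##
        elif re.fullmatch(r"[-=*_]{3,}", stripped):
            flagged.append((number, stripped))  # separator rule such as --- or ===
    return flagged
```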

Item 6 – All listed URLs are absolute, not relative: Every URL in the file must begin with https://. PASS: Zero relative paths (no /products/shoes, no ../policies). FAIL: Any relative URL. Relative paths are ambiguous to AI agents that may process the file outside the context of a browser session.
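The Item 6 rule can be checked with a short script. This sketch assumes URLs are written as Markdown links of the form [name](url); the function name is hypothetical, and bare URLs outside link syntax are not examined.

```python
import re

# Captures the target of a Markdown link: the text between "](" and the next ")" or space.
LINK_TARGET = re.compile(r"\]\(([^)\s]+)")


def non_absolute_urls(text: str) -> list[str]:
    """Return every link target that does not begin with https://."""
    return [url for url in LINK_TARGET.findall(text) if not url.startswith("https://")]
```

An empty result corresponds to PASS. Note that http:// links are also reported, since the rule requires https:// specifically.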

URL Validity and Coverage Checks (Items 7–9)

Item 7 – All listed URLs return 200 status codes: Crawl every URL in the file using a tool like Screaming Frog or a bulk HTTP status checker. PASS: Every URL returns 200. FAIL: Any URL returns 301, 302, 404, 410, or 5xx. Dead or redirected links signal stale maintenance and reduce AI parser trust in the file's accuracy.
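For stores without a crawler license, a small script can approximate the Item 7 check. The sketch below uses only the Python standard library; the function and class names are hypothetical. It sends HEAD requests, which some servers reject, so switching to GET may be necessary. Redirects are deliberately not followed so that 301 and 302 responses are reported as failures.

```python
from urllib.error import HTTPError, URLError
from urllib.request import HTTPRedirectHandler, Request, build_opener


class NoRedirect(HTTPRedirectHandler):
    """Refuse to follow redirects so 3xx responses surface as HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for url, or 0 if the request could not be completed."""
    opener = build_opener(NoRedirect)
    try:
        with opener.open(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code
    except URLError:
        return 0


def failing_urls(urls, get_status=fetch_status) -> dict:
    """Map each URL that did not return 200 to the status it returned."""
    statuses = {url: get_status(url) for url in urls}
    return {url: status for url, status in statuses.items() if status != 200}
```

The get_status parameter exists so the logic can be exercised with recorded statuses instead of live requests.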

Item 8 – High-priority category and collection pages are included: Cross-reference your top revenue-driving category URLs against the llms.txt listing. PASS: The top ten category or collection pages by revenue appear in the file. FAIL: The file lists only the homepage or blog posts while omitting the catalog structure. AI citation tools need explicit signals to surface product category content.
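The cross-reference in Item 8 reduces to a set comparison. A minimal sketch, assuming the top category URLs come from an analytics export (the function name and inputs are hypothetical), and assuming that a trailing slash should not count as a mismatch:

```python
def missing_priority_pages(llms_urls, top_category_urls) -> list[str]:
    """Return top category URLs that are not listed in llms.txt, in their original order."""
    listed = {url.rstrip("/") for url in llms_urls}
    return [url for url in top_category_urls if url.rstrip("/") not in listed]
```

An empty result corresponds to PASS for this item.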

Item 9 – Product detail pages are represented via a structured pattern or explicit listing: PASS: Either a representative set of canonical product URLs is listed, or a clear URL pattern (e.g., https://yourdomain.com/products/*) is documented in a comment line. FAIL: No product URLs and no pattern documentation. Without this, AI systems default to inferring catalog structure from sitemaps, which may not reflect priority.

Exclusion, Permissions, and Freshness Checks (Items 10–12)

Item 10 – Sensitive or non-public URLs are explicitly excluded: Review the file for any checkout, account, order history, or admin URLs. PASS: No private URLs appear; if an exclusion section exists, it uses clear noindex-equivalent language. FAIL: Checkout flows, /account/, or /cart/ URLs appear in the file. Including transactional pages wastes AI crawler attention on non-citable content and can expose session-sensitive URL patterns.
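A simple path-segment scan can catch most Item 10 failures. The segment list below is an assumption based on common storefront URL layouts and should be adapted to the platform in use; the function name is hypothetical.

```python
from urllib.parse import urlparse

# Assumed private path segments. Adjust for your platform's URL structure.
PRIVATE_SEGMENTS = {"checkout", "cart", "account", "orders", "admin", "login"}


def private_urls(urls) -> list[str]:
    """Return URLs whose path contains a whole segment from PRIVATE_SEGMENTS."""
    flagged = []
    for url in urls:
        segments = {part.lower() for part in urlparse(url).path.split("/") if part}
        if segments & PRIVATE_SEGMENTS:
            flagged.append(url)
    return flagged
```

Matching on whole segments avoids false positives such as a blog post slug that merely contains the word "cart".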

Item 11 – The file is consistent with robots.txt and sitemap.xml: Compare the directories listed in llms.txt against the allow/disallow rules in robots.txt and the URLs in sitemap.xml. PASS: No URL listed in llms.txt is disallowed in robots.txt; at least 80% of llms.txt URLs also appear in sitemap.xml. FAIL: Contradictions exist between files. Conflicting signals cause AI crawlers to apply the most restrictive interpretation.
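Both halves of Item 11 can be sketched with the standard library's robots.txt parser. The function name, the return shape, and the choice of the "*" user agent are assumptions; a real audit might repeat the check for specific AI crawler user agents. The inputs are the URLs from llms.txt, the robots.txt body as a string, and the set of URLs from the sitemap.

```python
from urllib.robotparser import RobotFileParser


def check_item_11(llms_urls, robots_txt, sitemap_urls, user_agent="*", min_overlap=0.8) -> dict:
    """Compare llms.txt URLs against robots.txt rules and sitemap membership."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # URLs listed in llms.txt but disallowed by robots.txt.
    blocked = [url for url in llms_urls if not parser.can_fetch(user_agent, url)]
    # Share of llms.txt URLs that also appear in the sitemap.
    in_sitemap = sum(1 for url in llms_urls if url in sitemap_urls)
    overlap = in_sitemap / len(llms_urls) if llms_urls else 0.0
    return {
        "blocked": blocked,
        "overlap": overlap,
        "passed": not blocked and overlap >= min_overlap,
    }
```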

Item 12 – The file has been updated within the last 90 days: Check the file's Last-Modified HTTP response header or internal changelog comment. PASS: Last-Modified date is within 90 days, or a ## Last Updated line reflects a recent date. FAIL: No date signal, or the file predates significant catalog changes such as seasonal launches or major category restructures. Stale llms.txt files undermine the freshness signals that AI systems use to prioritize content for citation.
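The header half of Item 12 can be evaluated from the Last-Modified value alone. A minimal sketch, assuming the header uses the standard HTTP date format; the function name is hypothetical, and the ## Last Updated fallback is not covered here.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def is_fresh(last_modified, now=None, max_age_days=90) -> bool:
    """Return True if the Last-Modified header value is within max_age_days of now."""
    if not last_modified:
        return False  # no date signal counts as FAIL
    try:
        modified = parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        return False  # unparseable header
    if modified.tzinfo is None:
        # Treat a date without a usable timezone as UTC (an assumption).
        modified = modified.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now - modified <= timedelta(days=max_age_days)
```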

Acting on Your Audit Results

Treat any single FAIL as a blocking issue if it falls in items 1–3 or item 11. A missing file, broken HTTPS delivery, or direct conflict with robots.txt negates everything else. Fix these before addressing structural or coverage gaps.
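If audit results are recorded as a mapping of item number to pass/fail, the blocking rule can be applied mechanically. This is an illustrative sketch; the function name and data shape are assumptions.

```python
# Items whose failure blocks everything else, per the rule above.
BLOCKING_ITEMS = {1, 2, 3, 11}


def triage(results: dict[int, bool]) -> dict[str, list[int]]:
    """Split failed items into blocking failures and everything else."""
    failed = sorted(item for item, passed in results.items() if not passed)
    return {
        "blocking": [item for item in failed if item in BLOCKING_ITEMS],
        "other": [item for item in failed if item not in BLOCKING_ITEMS],
    }
```

Any entry under "blocking" should be fixed before work begins on the "other" list.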

For items 4–9, prioritize coverage and formatting fixes over minor style inconsistencies. A structurally valid file with complete URL coverage outperforms a perfectly formatted file that lists only five pages. Build the file update into your quarterly SEO release cycle so that new category launches and seasonal catalog changes trigger an automatic review against items 8, 9, and 12.

Document the audit results in a shared changelog so that developers, SEO leads, and merchandising teams share visibility into the file's state. llms.txt is not a set-and-forget asset; it requires the same maintenance discipline as a sitemap.

Frequently asked questions

What is llms.txt and why does it matter for ecommerce stores?

llms.txt is a plain-text file at the root of a domain that tells AI crawlers and large language models which pages are authoritative and citable. For ecommerce stores, it directly affects whether product categories, buying guides, and brand content appear in AI-generated search responses from tools like Perplexity, ChatGPT, and Google AI Overviews, channels that are growing as primary discovery surfaces.

How often should an ecommerce store update its llms.txt file?

Update the file whenever significant catalog changes occur: new category launches, major SKU additions, discontinued product lines, or URL restructures. At minimum, review and re-publish the file every 90 days. The Last-Modified HTTP header or an inline changelog comment gives AI parsers a freshness signal; a stale file reduces citation priority for time-sensitive or frequently updated content.

Does llms.txt replace the XML sitemap for AI crawlers?

No. llms.txt and sitemap.xml serve different purposes and should coexist. The sitemap communicates the full URL inventory to traditional search crawlers. llms.txt communicates editorial priority and content type to AI agents. The two files should be consistent: any URL listed in llms.txt should also appear in the sitemap and not be blocked by robots.txt.

What happens if checkout or account pages are accidentally listed in llms.txt?

AI crawlers treat listed URLs as citable content candidates. Including checkout, cart, or account pages wastes crawler attention on non-informational pages, exposes transactional URL structures unnecessarily, and produces no citation benefit. Remove all session-sensitive and transactional URLs from the file. These pages have no value to an AI system generating informational or product-discovery responses.

Is a Markdown-formatted llms.txt actually required, or is plain text with URLs sufficient?

The emerging convention is Markdown structure with a # title, > description block, and ## section headings, as defined by the llms.txt specification proposed by Answer.AI. While some AI parsers are tolerant of unformatted lists, structured Markdown allows parsers to segment content types accurately. A flat URL list without headings or descriptions reduces the semantic value the file provides to AI systems.

Written by

Matt is the founder of RunOctopus. He built All Angles Creatures from zero to page-1 rankings in reptile feeder insects in under 60 days using exactly this method, turning a hard, entrenched niche into RunOctopus's proof store for programmatic SEO and AI search citation.
