Why Ecommerce Stores Need an llms.txt Audit
llms.txt is a proposed convention: a plain-text Markdown file placed at the root of a domain that tells AI crawlers and large language models which content the site considers authoritative and suitable for AI-generated responses. For ecommerce operators, this file can influence whether product pages, category guides, and brand content get cited in AI search tools like ChatGPT, Perplexity, and Google AI Overviews when those tools consult it.
Without an audit, stores risk missing the file entirely, pointing AI crawlers to outdated sitemaps, or burying high-value catalog content under directories that AI agents deprioritize. This 12-item checklist gives operators a structured pass/fail framework to fix gaps before AI-driven traffic becomes a primary acquisition channel.
File Existence and Placement Checks (Items 1–3)
Item 1: File exists at root. Navigate to yourdomain.com/llms.txt in a browser. PASS: The file loads as plain text with a 200 HTTP status. FAIL: Any redirect, 404, or HTML response. The file must live at the apex domain root, not a subdomain or subdirectory.
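The Item 1 rules can be scripted once the response has been fetched. The following is a minimal sketch, assuming the request was made with redirects disabled and reduced to a status code, a Content-Type header, and a body; the function name check_root_response and the HTML-sniffing heuristic are illustrative assumptions, not part of any llms.txt tooling.

```python
def check_root_response(status: int, content_type: str, body: str) -> tuple[bool, str]:
    """Apply the Item 1 pass/fail rules to an already-fetched response.

    Assumes the request was made with redirects disabled, so a 3xx status
    is visible here instead of being silently followed.
    """
    if 300 <= status < 400:
        return False, f"redirect ({status})"
    if status != 200:
        return False, f"unexpected status ({status})"
    # Servers may append parameters such as "; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "text/html":
        return False, "served as HTML"
    # Some servers mislabel files, so also sniff the start of the body.
    head = body.lstrip()[:100].lower()
    if head.startswith("<!doctype html") or head.startswith("<html"):
        return False, "body looks like an HTML page"
    return True, "ok"


# Hypothetical responses for illustration.
print(check_root_response(200, "text/plain; charset=utf-8", "# Example Store\n"))
print(check_root_response(301, "text/html", ""))
```

Returning a reason string alongside the boolean makes the result easy to paste into an audit log.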
Item 2: File is UTF-8 encoded, plain text. Download the raw file and verify encoding. PASS: UTF-8 encoding confirmed, no HTML tags, no byte order mark (BOM), line breaks are LF or CRLF. FAIL: File contains HTML markup, uses a non-UTF-8 encoding, or renders as a webpage. Encoding errors can cause AI parsers to skip or misread the file.
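As a rough sketch of the Item 2 checks, the function below inspects the raw bytes of a downloaded file. The name check_encoding and the regular expression used to spot HTML tags are assumptions made for illustration; the tag pattern is a loose heuristic, not a full HTML detector.

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"
# Loose heuristic for HTML markup: an opening or closing tag such as <div> or </p>.
HTML_TAG = re.compile(r"</?[a-zA-Z][a-zA-Z0-9]*(\s[^<>]*)?/?>")


def check_encoding(raw: bytes) -> list[str]:
    """Return a list of Item 2 problems; an empty list means PASS."""
    problems = []
    if raw.startswith(UTF8_BOM):
        problems.append("file starts with a UTF-8 BOM")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Without valid UTF-8 the remaining checks are not meaningful.
        return problems + [f"not valid UTF-8 (byte offset {exc.start})"]
    if HTML_TAG.search(text):
        problems.append("contains HTML markup")
    # A bare carriage return is neither LF nor CRLF.
    if re.search(r"\r(?!\n)", text):
        problems.append("contains bare CR line breaks")
    return problems
```

To use it, read the file in binary mode (for example, open("llms.txt", "rb").read()) so that the BOM and line endings are not altered before the check runs.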
Item 3: HTTPS delivery with a valid certificate. Load the file over HTTPS and check browser dev tools for warnings. PASS: File loads cleanly over HTTPS with no certificate errors. FAIL: HTTP-only delivery, an expired TLS certificate, or any certificate warning. AI crawlers that enforce secure connections may reject insecure llms.txt files.
Content Structure and Formatting Checks (Items 4–6)
Item 4: Required header block is present and complete. The file must open with a # title line followed by a > description block. PASS: Both elements appear within the first five lines, the title describes the store's primary purpose, and the description is one to three sentences. FAIL: Missing title, missing description, or placeholder text like 'My Store' that provides no semantic signal.
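The header check in Item 4 lends itself to a small script. This is a sketch under stated assumptions: the function name check_header_block and the placeholder list are hypothetical, and the sentence-count rule is left to human review.

```python
def check_header_block(text: str, placeholders=("my store", "untitled", "home")) -> list[str]:
    """Return Item 4 problems found in the first five lines; empty means PASS.

    The placeholder titles are illustrative; adjust them to your own defaults.
    """
    first_five = [line.strip() for line in text.splitlines()[:5]]
    titles = [line[2:].strip() for line in first_five if line.startswith("# ")]
    summaries = [line[1:].strip() for line in first_five if line.startswith(">")]
    problems = []
    if not titles:
        problems.append("missing '# ' title line")
    elif titles[0].lower() in placeholders:
        problems.append(f"placeholder title: {titles[0]!r}")
    # An empty ">" line does not count as a description.
    if not any(summaries):
        problems.append("missing '> ' description block")
    return problems


# Hypothetical file opening for illustration.
sample = "# Acme Outdoor Gear\n\n> Acme sells hiking boots, tents, and trail accessories.\n"
print(check_header_block(sample))
```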
Item 5: Sections use correct H2 Markdown syntax. Each content category must be declared with a ## heading. PASS: Every logical grouping (e.g., ## Products, ## Policies, ## Blog) uses ## syntax, not bold text, asterisks, or arbitrary separators. FAIL: Inconsistent heading levels or non-Markdown formatting. AI parsers rely on standardized Markdown hierarchy to segment content types.
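One way to approximate the Item 5 check is to list the ## sections and flag lines that look like headings written some other way. The sketch below assumes that deeper heading levels and standalone bold lines are the most common mistakes; the function name audit_headings is hypothetical.

```python
import re


def audit_headings(text: str) -> dict[str, list[str]]:
    """Collect ## section names and flag likely non-standard headings."""
    sections, suspicious = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"## +\S.*", stripped):
            sections.append(stripped[2:].strip())
        elif re.fullmatch(r"#{3,6} +\S.*", stripped):
            suspicious.append(stripped)  # deeper heading level than H2
        elif re.fullmatch(r"(\*\*|__).+(\*\*|__):?", stripped):
            suspicious.append(stripped)  # bold text standing in for a heading
    return {"sections": sections, "suspicious": suspicious}
```

An empty "suspicious" list together with a non-empty "sections" list corresponds to a PASS; anything flagged still deserves a manual look, since the patterns are heuristics.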
Item 6: All listed URLs are absolute, not relative. Every URL in the file must begin with https://. PASS: Zero relative paths (no /products/shoes, no ../policies). FAIL: Any relative URL. Relative paths are ambiguous to AI agents that may process the file outside the context of a browser session.
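Item 6 can be checked by extracting every Markdown link target and flagging those that do not start with https://. The following is a minimal sketch, assuming links use the inline [label](url) form; the function name find_relative_links is hypothetical.

```python
import re

# Markdown inline links: [label](target). Targets containing spaces or
# nested parentheses are out of scope for this sketch.
LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")


def find_relative_links(text: str) -> list[str]:
    """Return every link target that does not start with https://."""
    return [url for _label, url in LINK.findall(text) if not url.startswith("https://")]


# Hypothetical file content for illustration.
sample = (
    "- [Shoes](/products/shoes)\n"
    "- [Returns](https://example.com/policies/returns)\n"
    "- [Policies](../policies)\n"
)
print(find_relative_links(sample))
```

Because the rule is "must begin with https://", plain http:// links are flagged as well, which also surfaces the insecure URLs that Item 3 cares about.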
URL Validity and Coverage Checks (Items 7–9)
Item 7: All listed URLs return 200 status codes. Crawl every URL in the file using a tool like Screaming Frog or a bulk HTTP status checker. PASS: Every URL returns 200. FAIL: Any URL returns 301, 302, 404, 410, or 5xx. Dead or redirected links signal stale maintenance and reduce AI parser trust in the file's accuracy.
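For stores without a crawler license, the Item 7 check can be approximated with the Python standard library. This sketch assumes HEAD requests are acceptable to your server (some platforms answer HEAD differently from GET) and disables redirect following so that 301 and 302 responses are reported rather than hidden; the names fetch_status and failing_urls are hypothetical.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so 3xx codes stay visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for a HEAD request without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def failing_urls(urls, fetch=fetch_status) -> dict[str, int]:
    """Map each URL that did not return exactly 200 to the status it returned."""
    failures = {}
    for url in urls:
        status = fetch(url)
        if status != 200:
            failures[url] = status
    return failures
```

The fetch parameter is injectable so the pass/fail logic can be exercised against recorded statuses without touching the network. Connection failures and timeouts raise urllib.error.URLError in this sketch and would need their own handling in a real audit.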
Item 8: High-priority category and collection pages are included. Cross-reference your top revenue-driving category URLs against the llms.txt listing. PASS: The top ten category or collection pages by revenue appear in the file. FAIL: The file lists only the homepage or blog posts while omitting the catalog structure. AI citation tools need explicit signals to surface product category content.
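The cross-reference in Item 8 reduces to a set comparison. In this sketch, top_category_urls is assumed to come from your own analytics or revenue export and llms_urls from the links parsed out of the file; both names, and the choice to ignore trailing slashes, are assumptions.

```python
def missing_categories(top_category_urls: list[str], llms_urls: list[str]) -> list[str]:
    """Return top category URLs that are absent from llms.txt.

    Trailing slashes are ignored so /collections/boots and
    /collections/boots/ count as the same page.
    """
    listed = {url.rstrip("/") for url in llms_urls}
    return [url for url in top_category_urls if url.rstrip("/") not in listed]


# Hypothetical data for illustration.
top_ten = ["https://example.com/collections/boots", "https://example.com/collections/tents/"]
in_file = ["https://example.com/collections/boots/", "https://example.com/blog/trail-guide"]
print(missing_categories(top_ten, in_file))
```

An empty result corresponds to a PASS for Item 8.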
Item 9: Product detail pages are represented via a structured pattern or explicit listing. PASS: Either a representative set of canonical product URLs is listed, or a clear URL pattern (e.g., https://yourdomain.com/products/*) is documented in a comment line. FAIL: No product URLs and no pattern documentation. Without this, AI systems default to inferring catalog structure from sitemaps, which may not reflect priority.
Exclusion, Permissions, and Freshness Checks (Items 10–12)
Item 10: Sensitive or non-public URLs are explicitly excluded. Review the file for any checkout, account, order history, or admin URLs. PASS: No private URLs appear; if an exclusion section exists, it uses clear noindex-equivalent language. FAIL: Checkout flows, /account/, or /cart/ URLs appear in the file. Including transactional pages wastes AI crawler attention on non-citable content and can expose session-sensitive URL patterns.
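A simple path-prefix scan covers most of Item 10. The prefix list below is an assumption based on the examples in the item and on common storefront URL layouts; extend it to match your platform. The function name find_private_urls is hypothetical.

```python
from urllib.parse import urlsplit

# Illustrative prefixes; real stores may use different paths.
PRIVATE_PREFIXES = ("/checkout", "/account", "/cart", "/admin", "/orders")


def find_private_urls(urls: list[str], prefixes=PRIVATE_PREFIXES) -> list[str]:
    """Return URLs whose path is, or sits under, a private prefix."""
    flagged = []
    for url in urls:
        path = urlsplit(url).path.lower()
        # Match "/cart" and "/cart/..." but not "/cartoons".
        if any(path == p or path.startswith(p + "/") for p in prefixes):
            flagged.append(url)
    return flagged
```

Matching on whole path segments rather than raw substrings avoids false positives on legitimate catalog URLs that merely begin with the same letters.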
Item 11: The file is consistent with robots.txt and sitemap.xml. Compare the URLs listed in llms.txt against the allow/disallow rules in robots.txt and the URLs in sitemap.xml. PASS: No URL listed in llms.txt is disallowed in robots.txt; at least 80% of llms.txt URLs also appear in sitemap.xml. FAIL: Contradictions exist between files. Conflicting signals can lead AI crawlers to apply the most restrictive interpretation.
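Both halves of Item 11 can be checked offline once the three files are downloaded. This sketch uses Python's standard urllib.robotparser and assumes the sitemap has already been parsed into a list of URLs. The function name check_consistency and the default user agent "*" are assumptions; in practice you may want to repeat the check for each AI crawler token your robots.txt names.

```python
from urllib.robotparser import RobotFileParser


def check_consistency(llms_urls, robots_txt, sitemap_urls, user_agent="*", min_overlap=0.8):
    """Apply the Item 11 rules and return the evidence behind the verdict."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Any llms.txt URL that robots.txt blocks is a direct contradiction.
    disallowed = [url for url in llms_urls if not parser.can_fetch(user_agent, url)]
    sitemap = set(sitemap_urls)
    in_sitemap = sum(1 for url in llms_urls if url in sitemap)
    overlap = in_sitemap / len(llms_urls) if llms_urls else 0.0
    return {
        "disallowed": disallowed,
        "sitemap_overlap": overlap,
        "passed": not disallowed and overlap >= min_overlap,
    }


# Hypothetical inputs for illustration.
robots = "User-agent: *\nDisallow: /account/\n"
listed = ["https://example.com/collections/boots", "https://example.com/account/orders"]
print(check_consistency(listed, robots, ["https://example.com/collections/boots"]))
```

The 0.8 default mirrors the 80% threshold in the PASS rule and is exposed as a parameter so teams can tighten it.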
Item 12: The file has been updated within the last 90 days. Check the file's Last-Modified HTTP response header or internal changelog comment. PASS: Last-Modified date is within 90 days, or a ## Last Updated line reflects a recent date. FAIL: No date signal, or the file predates significant catalog changes such as seasonal launches or major category restructures. Stale llms.txt files undermine the freshness signals that AI systems use to prioritize content for citation.
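The Last-Modified half of Item 12 can be evaluated with the standard library's HTTP date parser. This is a minimal sketch: the function name is_fresh is hypothetical, a missing or unparseable header is treated as a FAIL per the rule above, and the "## Last Updated" line and catalog-change comparison are left to manual review.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def is_fresh(last_modified: Optional[str], now: Optional[datetime] = None,
             max_age_days: int = 90) -> bool:
    """Return True if the Last-Modified header value is within max_age_days."""
    if not last_modified:
        return False  # no date signal counts as a FAIL
    try:
        modified = parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        return False  # header present but not a valid HTTP date
    if modified.tzinfo is None:
        # Treat dates without an offset as UTC for comparison purposes.
        modified = modified.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now - modified <= timedelta(days=max_age_days)


# Hypothetical header value and audit date for illustration.
audit_date = datetime(2025, 6, 1, tzinfo=timezone.utc)
print(is_fresh("Thu, 01 May 2025 00:00:00 GMT", now=audit_date))
```

Passing the audit date explicitly keeps the result reproducible when the check is rerun later against a saved header.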
Acting on Your Audit Results
Treat any single FAIL as a blocking issue if it falls in items 1–3 or item 11. A missing file, broken HTTPS delivery, or direct conflict with robots.txt negates everything else. Fix these before addressing structural or coverage gaps.
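The blocking rule can be captured in a few lines so that audit output is sorted consistently. This sketch assumes results are recorded as a mapping from item number to pass/fail; the names BLOCKING_ITEMS and triage are hypothetical.

```python
# Items 1-3 and item 11 are the blocking checks named above.
BLOCKING_ITEMS = {1, 2, 3, 11}


def triage(results: dict[int, bool]) -> dict[str, list[int]]:
    """Split failed items into blocking failures and follow-up work."""
    failed = sorted(item for item, passed in results.items() if not passed)
    return {
        "blocking": [item for item in failed if item in BLOCKING_ITEMS],
        "follow_up": [item for item in failed if item not in BLOCKING_ITEMS],
    }


# Hypothetical audit outcome for illustration: items 3, 8, and 11 failed.
outcome = {item: True for item in range(1, 13)}
outcome.update({3: False, 8: False, 11: False})
print(triage(outcome))
```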
For items 4–9, prioritize coverage and formatting fixes over minor style inconsistencies. A structurally valid file with complete URL coverage outperforms a perfectly formatted file that lists only five pages. Build the file update into your quarterly SEO release cycle so that new category launches and seasonal catalog changes trigger an automatic review against items 8, 9, and 12.
Document the audit results in a shared changelog so that developers, SEO leads, and merchandising teams share visibility into the file's state. llms.txt is not a set-and-forget asset; it requires the same maintenance discipline as a sitemap.