What Citation Selection Means Inside Perplexity
Perplexity is an answer engine that returns a synthesized response alongside numbered citations [1] [2] [3] that point to the specific web pages used to construct that response. Unlike a traditional search results page where ten blue links compete for clicks, Perplexity surfaces three to six sources per answer, and those sources are the only pages that receive attribution and referral traffic from the query.
Citation selection is the process by which Perplexity decides which pages enter that pool. It runs on top of the Sonar model family combined with live web retrieval, meaning the system retrieves candidate documents at query time, ranks them, and then grounds the generated answer in the highest-scoring subset. The cited pages are not chosen after the answer is written; they are the substrate the answer is built from.
For ecommerce operators, this changes the visibility math. A product category page that ranks fourth on Google still gets clicks. A category page that ranks fourth inside Perplexity's retrieval set but contributes no text to the answer, and so misses the final three-to-six citation pool, gets nothing. Citation is binary.
How Retrieval and Ranking Actually Work
When a user submits a query, Perplexity rewrites it into one or more search-style sub-queries, fetches candidate URLs through its web index and partner search APIs, and crawls or recalls page content using the PerplexityBot and Perplexity-User user agents. PerplexityBot handles broad indexing crawls. Perplexity-User fetches pages in real time on behalf of a specific user query, which is why some pages appear in answers even when they were not pre-indexed.
Candidate pages are then scored on three primary axes: recency, source authority, and on-topic depth. Recency rewards pages with current publication or update timestamps, especially for queries with a temporal component. Source authority rewards domains with established trust signals, inbound link profiles, and editorial reputation. On-topic depth rewards pages that answer the specific question directly rather than tangentially.
The Sonar model then drafts the answer using the top-ranked passages as grounding context, and the URLs behind those passages become the numbered citations the user sees. Pages that contributed no extracted text to the final answer do not get cited, even if they ranked well in retrieval.
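The flow described above can be pictured with a toy model. This is only a sketch under stated assumptions: Perplexity does not publish its scoring formula, so the weights, the 0-to-1 scores, the `Candidate` fields, and the `pool_size` default are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str
    recency: float         # hypothetical 0..1 score
    authority: float       # hypothetical 0..1 score
    depth: float           # hypothetical 0..1 score
    used_in_answer: bool   # did the drafted answer draw text from this page?


# Hypothetical weights; the real weighting is not public.
WEIGHTS = {"recency": 0.3, "authority": 0.3, "depth": 0.4}


def score(c: Candidate) -> float:
    """Weighted sum over the three axes named in the text."""
    return (WEIGHTS["recency"] * c.recency
            + WEIGHTS["authority"] * c.authority
            + WEIGHTS["depth"] * c.depth)


def select_citations(candidates: list[Candidate], pool_size: int = 5) -> list[str]:
    """Rank candidates, keep the top pool, cite only pages whose text was used."""
    ranked = sorted(candidates, key=score, reverse=True)
    grounding = ranked[:pool_size]
    return [c.url for c in grounding if c.used_in_answer]
```

The point of the last line is the one the section makes: a page can rank inside the grounding pool and still go uncited if nothing was extracted from it.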
The Three Signals Perplexity Weights Most Heavily
Recency is weighted aggressively. Perplexity prefers pages updated in the last twelve months for evergreen queries and pages published within days or weeks for news-adjacent queries. A 2019 buying guide loses to a 2024 buying guide on the same topic even when the 2019 guide has stronger backlinks. Visible last-updated dates, ISO timestamps in structured data, and refreshed content all feed this signal.
Source authority draws from a mix of domain-level trust signals, including link graph reputation, citations in other authoritative content, and presence in curated datasets. Established publishers, manufacturer documentation, government and academic domains, and well-known retailers score higher. A new Shopify store will not match a 20-year-old trade publication on this axis alone, which forces newer sites to compete on the other two axes.
On-topic depth is the most controllable signal. It measures whether the page directly answers the query with specific, extractable claims, not whether it merely mentions the topic. A page titled 'Best Waterproof Hiking Boots Under $200' that lists ten boots with prices, weights, and waterproof ratings beats a generic 'Hiking Gear Guide' page that mentions boots in passing.
What Gets Cited vs What Gets Skipped
Good citation candidates have a clear question-to-answer structure. A product comparison page that opens with a direct summary of the top picks, includes a comparison table with specs, and provides per-product analysis underneath gets cited because every section produces extractable, attributable claims. The same applies to FAQ pages with explicit question headings, glossary entries with one-sentence definitions, and how-to articles with numbered steps.
Poor citation candidates bury the answer. A collection page that opens with brand storytelling, requires JavaScript to render product specs, or relies on infinite scroll to expose key data gives Perplexity nothing clean to quote. Thin product pages with marketing copy but no specifications, dimensions, materials, or use cases score low on on-topic depth even when the URL is technically relevant.
Pages that disallow PerplexityBot in robots.txt are excluded from indexed retrieval entirely. Pages that return real content only after client-side rendering frequently fail extraction. Pages with no visible publication or update date lose ground on recency. Each of these is a self-inflicted exclusion.
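The robots.txt exclusion is the easiest of these to check mechanically. The sketch below uses Python's standard `urllib.robotparser` to test whether a given robots.txt body permits the two user agents named earlier. The function name `perplexity_access`, the sample rules, and the `shop.example` URL are illustrative assumptions, and the standard-library parser only approximates how any particular crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

AGENTS = ("PerplexityBot", "Perplexity-User")


def perplexity_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Report, per user agent, whether robots_txt allows fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AGENTS}


# Hypothetical robots.txt that blocks PerplexityBot but nothing else.
sample = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

print(perplexity_access(sample, "https://shop.example/boots"))
```

With the sample rules, the check should report PerplexityBot as blocked and Perplexity-User as allowed, which is the self-inflicted exclusion described above.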
How Ecommerce Stores Break Into the Citation Pool
Ecommerce sites enter Perplexity citations through three page types: category and comparison pages that answer 'best X for Y' queries, product detail pages that answer 'does X have Y feature' queries, and content hub articles that answer informational queries adjacent to purchase intent. Transactional pages are not excluded; they are favored when the query has commercial intent and the page contains the specific attributes the query asks about.
The mechanical requirements are concrete. Allow PerplexityBot and Perplexity-User in robots.txt. Render critical content server-side so the first HTML response contains product names, prices, specs, and descriptions. Include visible published and last-updated dates on every page. Use Product, FAQPage, and Article schema where applicable. Write H2 and H3 headings as the literal questions buyers ask.
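As one way to cover the schema and date requirements together, the sketch below builds an Article JSON-LD block with ISO 8601 `datePublished` and `dateModified` values. The property names come from schema.org; the function name `article_jsonld`, the sample headline, and the URL are assumptions for illustration, and a real store would usually emit this from its template layer so it lands in the server-rendered HTML.

```python
import json
from datetime import date


def article_jsonld(headline: str, published: date, modified: date, url: str) -> str:
    """Return a <script> tag holding schema.org Article markup with ISO dates."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "datePublished": published.isoformat(),  # e.g. 2024-01-15
        "dateModified": modified.isoformat(),
    }
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(article_jsonld(
    "Best Waterproof Hiking Boots Under $200",
    date(2024, 1, 15),
    date(2024, 6, 3),
    "https://shop.example/guides/waterproof-hiking-boots",
))
```

The dates in the markup should match the published and last-updated dates shown visibly on the page.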
Content depth matters more than content length. A 600-word category page with a comparison table, five specific product recommendations, and per-product reasoning outperforms a 3,000-word page of generic category prose. Specificity is what Sonar extracts and attributes.
A Concrete Action Plan for Citation Capture
Audit the twenty highest-intent queries in the catalog by running them through Perplexity directly and recording which domains get cited. This produces a target list of competing pages and a clear read on whether the current citation pool is dominated by retailers, publishers, manufacturers, or marketplaces. The competitive shape of the citation pool dictates the response strategy.
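Once the cited URLs have been recorded by hand, a few lines of Python can tally which domains dominate the pool. This is a minimal sketch: the `audit` mapping of query to cited URLs is an assumed data shape filled in manually, the function name `cited_domains` is hypothetical, and each domain is counted at most once per query.

```python
from collections import Counter
from urllib.parse import urlparse


def cited_domains(audit: dict[str, list[str]]) -> Counter:
    """Count, per domain, how many audited queries cite it at least once."""
    counts: Counter = Counter()
    for urls in audit.values():
        # A set so a domain cited twice for one query still counts once.
        domains = {urlparse(u).netloc.lower().removeprefix("www.") for u in urls}
        counts.update(domains)
    return counts


# Hypothetical audit notes: query -> URLs Perplexity cited for it.
audit = {
    "best waterproof hiking boots under $200": [
        "https://www.publisher.example/boots",
        "https://retailer.example/c/boots",
        "https://www.publisher.example/boots-2024",
    ],
    "are trail boot x waterproof": [
        "https://retailer.example/p/trail-boot-x",
    ],
}

for domain, n in cited_domains(audit).most_common():
    print(f"{domain}: cited for {n} of {len(audit)} queries")
```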
For each target query, identify the single most relevant page on the store and rewrite it to lead with a direct, extractable answer in the first 100 words. Add a comparison table, a specifications block, and a last-updated date. Confirm server-side rendering by viewing the raw HTML source and searching for the key facts. Link to the URL from already-indexed pages so PerplexityBot can discover it.
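The raw-HTML check can be scripted rather than done by eye. The sketch below fetches a page without executing JavaScript and lists which key facts are absent from the response. The function names, the `ssr-audit-script` User-Agent string, and the sample facts are assumptions for illustration; a plain substring match is a rough proxy and will miss facts that are split across tags.

```python
from html import unescape
from urllib.request import Request, urlopen


def fetch_raw_html(url: str, timeout: float = 10.0) -> str:
    """Fetch the first HTML response as a non-rendering client would see it."""
    req = Request(url, headers={"User-Agent": "ssr-audit-script"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def missing_from_raw_html(raw_html: str, key_facts: list[str]) -> list[str]:
    """Return the key facts that do not appear in the unrendered HTML."""
    haystack = unescape(raw_html).lower()  # decode entities such as &#36;
    return [fact for fact in key_facts if fact.lower() not in haystack]


# A page whose specs are injected client-side: the price never reaches raw HTML.
raw = "<html><body><h1>Trail Boot X</h1><div id='app'></div></body></html>"
print(missing_from_raw_html(raw, ["Trail Boot X", "$189.00"]))
```

Any fact that comes back in the list is one a non-rendering fetch of the page cannot see.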
Track citation appearance weekly by re-running the target queries. Citation share is the metric, not ranking position. A page that gets cited in four out of ten relevant queries is winning even if it sits on page two of Google.
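Citation share as used here is simply the fraction of tracked queries whose citations include the store. A minimal sketch, assuming each weekly run is recorded as a mapping from query to the list of cited URLs (the function name and the data shape are illustrative, not a Perplexity feature):

```python
from urllib.parse import urlparse


def _host(url: str) -> str:
    """Lowercased hostname without a leading 'www.'."""
    return urlparse(url).netloc.lower().removeprefix("www.")


def citation_share(results: dict[str, list[str]], domain: str) -> float:
    """Fraction of tracked queries in which `domain` is cited at least once."""
    if not results:
        return 0.0
    cited = sum(
        1 for urls in results.values()
        if any(_host(u) == domain for u in urls)
    )
    return cited / len(results)
```

For the example in the text, a store cited in four of ten tracked queries has a citation share of 0.4. Comparing that figure week over week shows whether rewrites are moving pages into the pool.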