How AI search actually chooses what to cite — and the five layers that decide
AI search engines rarely cite the page that ranks first. Google's own patent filing describes rewriting a single question into a fan of related queries, gathering the documents that answer the whole fan, ranking them by trust signals such as author and domain, then attaching a citation to each sentence a source can verify. Citation is earned at the sentence, not the keyword.
That distinction is the whole game, and most marketing teams are still playing the old one. They track a blue-link rank for a head term while the engine quietly runs several other searches the team never sees, reads the results, and decides, sentence by sentence, which sources to name. This piece traces how that decision is made, using the primary sources that describe it: Google's patent filing, the peer-reviewed retrieval-attribution literature, and the controlled study that measured which content changes actually move the needle. It closes with a single mental model, the Citation Selection Stack, and an honest account of what the evidence does not yet prove.
How does an AI engine turn one question into a search?
It does not search your exact question. It searches a fan of related questions and pools the results.
Google's “Search with stateful chat” patent (US20240289407A1) describes a “first” large language model generating additional queries — “alternative query suggestions, supplemental queries, rewritten versions of the user’s query, and/or ‘drill down’ queries” — from one user query.[1] The SEO community calls this query fan-out; the term itself does not appear in the filing, and the patent is written in the permissive “may be” language of a disclosed method, not a confirmation of what ships in production. But the mechanism it documents is unambiguous, and it reorders everything downstream.
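A minimal sketch of the expansion step, assuming fixed string templates as a stand-in for the “first” large language model the patent describes. The four category names mirror the patent's list; the `fan_out` function, the templates, and the topic string are hypothetical illustrations, not Google's method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntheticQuery:
    kind: str  # "original" or one of the patent's four query categories
    text: str


# Hypothetical templates standing in for the "first" LLM's query generation.
# The four keys mirror the categories the patent lists; the strings are invented.
TEMPLATES: dict[str, list[str]] = {
    "alternative": ["{topic} alternatives"],
    "supplemental": ["{topic} pricing", "{topic} reliability"],
    "rewritten": ["which {topic} should a team choose"],
    "drill-down": ["{topic} comparison for small teams"],
}


def fan_out(user_query: str, topic: str) -> list[SyntheticQuery]:
    """Expand one user query into itself plus templated synthetic queries."""
    queries = [SyntheticQuery("original", user_query)]
    for kind, templates in TEMPLATES.items():
        for template in templates:
            queries.append(SyntheticQuery(kind, template.format(topic=topic)))
    return queries


for q in fan_out("What's the best tool for my team?", topic="team tool"):
    print(f"{q.kind:>12}: {q.text}")
```

In a real engine the templates would be model output; what matters is that retrieval then runs once per synthetic query, not once per user question.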
Figure · “What’s the best tool for my team?” fanned out into illustrative synthetic queries, each of which retrieves its own candidate set (mechanism: Google AI Mode query fan-out).
Why it matters: the patent describes selecting documents that are responsive to both the original query and the generated additional queries for inclusion in the candidate set.[2] A page that answers the literal question but none of the adjacent ones competes for a single slice of the fan. A page that covers the intent — the comparison, the price, the reliability question, the alternative — shows up across the fan and accumulates more chances to be cited. You are no longer optimizing for a keyword. You are optimizing for coverage of an intent.
Illustrated with two pages: each synthetic query builds its own candidate set, and the page that wins is the one present in the most of them, not the one that ranks first for the words the user typed.
Figure · One page is never #1 on the literal query but is eligible across the fan, so it accumulates the most chances to be cited. The other wins the head term but is absent from most of the fan, so it loses the citation it “should” have won. (Mechanism: per-query candidate retrieval before synthesis; candidate sets illustrative.)
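The accumulation effect can be sketched by counting how many per-query candidate sets each page appears in. The candidate sets below are invented, and a raw presence count is a deliberately crude stand-in for whatever scoring an engine actually applies before synthesis.

```python
from collections import Counter

# Hypothetical top-3 candidate sets, one per synthetic query (illustrative data only).
candidate_sets = {
    "best tool for my team": ["page_b", "page_a", "page_c"],
    "tool alternatives": ["page_a", "page_d", "page_c"],
    "tool pricing": ["page_a", "page_e", "page_f"],
    "tool reliability": ["page_c", "page_a", "page_g"],
    "tool comparison for small teams": ["page_a", "page_c", "page_h"],
}


def fan_presence(sets: dict[str, list[str]]) -> Counter:
    """Count how many synthetic queries each page is a candidate for."""
    counts: Counter = Counter()
    for pages in sets.values():
        counts.update(set(pages))  # a page counts at most once per query
    return counts


presence = fan_presence(candidate_sets)
literal_winner = candidate_sets["best tool for my team"][0]
print("ranks first on the literal query:", literal_winner, presence[literal_winner])
print("most present across the fan:", presence.most_common(1)[0])
```

Here the literal-query winner appears in one set while a page that never ranks first appears in all five, which is the pattern the illustration describes.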
Which sources clear the trust bar?
Retrieval produces too many candidates to cite. Something has to filter them, and the patent is specific about what.
It describes “query-independent measures” that include a trustworthiness measure for a document “generated based on an author thereof, a domain thereof, and/or inbound link(s) thereto.”[3] Read that carefully. Query-independent means these signals attach to the page before anyone asks a question — they are properties of who published it, where, and who vouches for it. That is the patent-level shadow of the same idea Google states plainly in its public guidance: create people-first, helpful content that demonstrates experience, expertise, authoritativeness, and trustworthiness.[13] The patent says “trustworthiness,” not E-E-A-T; the labels are the industry's, the signal is the engine's.
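The patent names the inputs to its trustworthiness measure but not how they combine. A minimal sketch, assuming an equal-weighted average of three normalized signals and a simple linear blend with query relevance; the `Doc` fields, the weights, and the link cap are hypothetical. The point it illustrates is structural: the trust score is computed without looking at the query, so it can be cached per document before anyone asks anything.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    url: str
    author_score: float  # 0..1, hypothetical author-reputation signal
    domain_score: float  # 0..1, hypothetical domain-reputation signal
    inbound_links: int   # raw count of links pointing at the page


def trust(doc: Doc, link_cap: int = 100) -> float:
    """Query-independent trust: an equal-weight average (assumed) of three signals."""
    link_score = min(doc.inbound_links, link_cap) / link_cap
    return (doc.author_score + doc.domain_score + link_score) / 3


def rerank(candidates: list[Doc], relevance: dict[str, float], alpha: float = 0.5) -> list[str]:
    """Order candidates by a blend of per-query relevance and precomputed trust."""
    trust_by_url = {doc.url: trust(doc) for doc in candidates}  # no query involved here
    ordered = sorted(
        candidates,
        key=lambda doc: alpha * relevance[doc.url] + (1 - alpha) * trust_by_url[doc.url],
        reverse=True,
    )
    return [doc.url for doc in ordered]


docs = [
    Doc("https://a.example/guide", author_score=0.9, domain_score=0.8, inbound_links=80),
    Doc("https://b.example/post", author_score=0.2, domain_score=0.3, inbound_links=5),
]
# b.example is slightly more relevant to this query, but far less trusted.
relevance = {"https://a.example/guide": 0.6, "https://b.example/post": 0.7}
print(rerank(docs, relevance))
```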
“A source identifier appearing before and/or after a portion of the NL [natural language] based summary can indicate that the SRD [search result document], corresponding to the source identifier, verifies the portion.”
That line — the patent's description of how citations are attached — is the most important sentence in the whole filing.[4] The engine is not citing a page because the page is “good.” It is attaching a source to a portion of its own answer because that source verifies the portion. Verification is the unit of citation. Authority gets you into the candidate set; verifiability gets your URL printed next to a sentence.
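The attachment step can be sketched as a per-sentence check. This assumes a crude token-overlap test standing in for whatever verifier a production system uses (research systems typically use an entailment model), and it places the identifier after the portion it verifies, one of the two placements the patent text allows; the function names and the 0.8 threshold are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; '%' is kept so "40%" stays one token."""
    return set(re.findall(r"[a-z0-9%]+", text.lower()))


def verifies(source_text: str, sentence: str, threshold: float = 0.8) -> bool:
    """Crude verification stand-in: most of the sentence's tokens occur in the source."""
    sentence_tokens = tokens(sentence)
    if not sentence_tokens:
        return False
    return len(sentence_tokens & tokens(source_text)) / len(sentence_tokens) >= threshold


def attach_citations(sentences: list[str], sources: dict[int, str]) -> str:
    """Place a [n] identifier after each sentence that source n verifies."""
    out = []
    for sentence in sentences:
        markers = "".join(f"[{n}]" for n, text in sources.items() if verifies(text, sentence))
        out.append(f"{sentence} {markers}".rstrip())
    return " ".join(out)


sources = {
    1: "Statistics addition lifted visibility by up to 40% in generative engines.",
    2: "Keyword stuffing did not perform well in generative engines.",
}
answer = [
    "Statistics addition lifted visibility by up to 40%.",
    "Keyword stuffing did not perform well.",
    "Every engine uses the same ranking.",  # no source verifies it, so it stays uncited
]
print(attach_citations(answer, sources))
```

The uncited third sentence is the interesting case: nothing in the candidate set verifies it, so no URL is printed beside it.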
Figure · Query-independent trust measures (author, domain, inbound links), per Google's stateful-chat patent. Clearing the trust bar makes a page eligible; the next layer, verification, decides which eligible page is actually named. (Illustrative.)
How does the model decide what it can attribute?
Here the primary sources stop being patents and start being peer review — and they are humbling.
The ALCE benchmark, the first systematic evaluation of LLM citation quality, found that on long-form questions “even the best models lack complete citation support 50% of the time.”[5] Half. The models that power answer engines routinely write sentences their own cited sources do not fully support. Attribution is probabilistic, not deterministic, which helps explain why the same brand, asked the same question twice, can get two different citation sets.
Figure · Even the best models lack complete citation support 50% of the time on long-form answers. Source: Gao et al., ALCE benchmark (arXiv 2305.14627), ELI5 long-form QA.
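ALCE's headline number rests on a citation-recall style check: a statement counts as supported only if its cited passages, taken together, entail it. A simplified sketch of that idea, assuming a toy word-containment test in place of the entailment (NLI) model the benchmark uses; the function names and data are hypothetical, and the exact definitions, including the separate citation-precision score, should be taken from the paper itself.

```python
from typing import Callable


def citation_recall(
    statements: list[tuple[str, list[int]]],
    passages: dict[int, str],
    entails: Callable[[str, str], bool],
) -> float:
    """Share of statements whose cited passages, concatenated, entail the statement.

    A statement with no citations counts as unsupported."""
    if not statements:
        return 0.0
    supported = 0
    for text, cited in statements:
        premise = " ".join(passages[i] for i in cited)
        if cited and entails(premise, text):
            supported += 1
    return supported / len(statements)


def naive_entails(premise: str, hypothesis: str) -> bool:
    """Toy stand-in for an NLI model: every hypothesis word appears in the premise."""
    return set(hypothesis.lower().split()) <= set(premise.lower().split())


passages = {
    1: "the study used ten thousand queries",
    2: "and visibility rose on perplexity",
}
statements = [
    ("the study used ten thousand queries", [1]),  # supported by [1]
    ("visibility rose on perplexity", [1]),        # cites the wrong passage
    ("the study used ten thousand queries and visibility rose on perplexity", [1, 2]),  # supported jointly
    ("results hold on every engine", []),          # no citation at all
]
print(citation_recall(statements, passages, naive_entails))  # 0.5
```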
The research response has been to bolt verification onto generation. RARR — Retrofit Attribution using Research and Revision — “automatically finds attribution for the output of any text generation model” and then “post-edits the output to fix unsupported content while preserving the original output as much as possible.”[6] And the bottleneck is not the writer; it is the reader. Work on verifiable generation emphasizes “the critical role of retrieval accuracy”: a model can only attribute to what retrieval surfaced.[7]
If the retrieval layer never finds your page, no amount of authority earns the citation. You cannot be quoted from a room the engine never entered.
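RARR's two stages, research then revision, can be sketched in miniature. In the real system a language model writes verification questions, searches, checks agreement, and proposes edits; here, as loudly labeled simplifications, word overlap picks the evidence and the only permitted edit is swapping a number for the one the evidence states. The hypothetical `retrieved` dictionary makes the retrieval bottleneck explicit: a page that is not in it cannot be attributed, however authoritative it is.

```python
import re
from typing import Optional

# Numbers such as 40, 10,000 or 115.1 (a trailing sentence period is not captured).
NUM = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def words(text: str) -> set[str]:
    """Lowercase alphabetic words; numbers are ignored so a wrong figure still matches."""
    return set(re.findall(r"[a-z]+", text.lower()))


def research_and_revise(
    sentence: str, retrieved: dict[str, str], min_overlap: float = 0.6
) -> tuple[str, Optional[str]]:
    """Return the (possibly edited) sentence and the id of the evidence used, if any."""
    sent_words = words(sentence)
    if not sent_words:
        return sentence, None
    # Research: pick the retrieved passage that shares the most words with the sentence.
    best_id, best_score = None, 0.0
    for pid, passage in retrieved.items():
        score = len(sent_words & words(passage)) / len(sent_words)
        if score > best_score:
            best_id, best_score = pid, score
    if best_id is None or best_score < min_overlap:
        return sentence, None  # nothing retrieved supports it: keep the text, leave it unattributed
    # Revision: a minimal edit that only swaps numbers, and only when the counts line up.
    claimed = NUM.findall(sentence)
    evidence = NUM.findall(retrieved[best_id])
    if claimed and len(claimed) == len(evidence):
        replacements = iter(evidence)
        sentence = NUM.sub(lambda _match: next(replacements), sentence)
    return sentence, best_id


# Illustrative evidence only; the product page was never retrieved, so it cannot be cited.
retrieved = {"geo-paper": "GEO-bench contains 10,000 queries across diverse domains."}
print(research_and_revise("GEO-bench contains 5,000 queries.", retrieved))
print(research_and_revise("Our product page explains the pricing tiers.", retrieved))
```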
This is the layer marketers most underestimate. They obsess over being authoritative and ignore being retrievable and verifiable — crawlable, fresh, structured, and written in sentences a machine can match to a claim.
What does the evidence say actually moves visibility?
Mechanism is one thing; measured effect is another. The strongest causal evidence comes from the GEO study by a team at Princeton and IIT Delhi, presented at KDD 2024, which ran controlled content modifications across a 10,000-query benchmark and a live engine.
It reported that Generative Engine Optimization “can boost visibility by up to 40% in generative engine responses.”[8] But which methods produced the lift matters more than the headline. The three top-performing methods, Cite Sources, Quotation Addition, and Statistics Addition, “achieved a relative improvement of 30-40% on the Position-Adjusted Word Count metric,” with the best methods beating baseline by 41%.[9] And the comparison with classic SEO is the punchline for anyone still running a 2015 playbook: keyword stuffing “[did not] perform well,” while statistics and quotations “show strong performance improvements across all metrics.”[10]
Figure · Headline results from the GEO study: up to 40% visibility lift in generative-engine responses; 41% for the best method on Position-Adjusted Word Count; up to 37% validated live on Perplexity.ai; 115.1% for a rank-5 site using Cite Sources.
- Statistics Addition: add relevant quantitative facts. Strongest in law, government, opinion.
- Quotation Addition: add credible, attributed quotes. Strongest in people & society topics.
- Cite Sources: cite external authorities. Strongest in factual queries; lifts low-ranked pages most.
- Keyword Stuffing: the classic SEO move of packing the query's keywords into the copy. Does not transfer.
The top three methods fall in the study’s reported 30–40% Position-Adjusted Word Count band, with keyword stuffing as the comparison; the exact per-domain figures live in the paper. The signal is consistent: be more verifiable, not more keyword-dense.
Source: Aggarwal et al., “GEO: Generative Engine Optimization,” arXiv 2311.09735 (KDD 2024) · GEO-bench, 10,000 queries
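The 30–40% figures are relative changes in the study's Position-Adjusted Word Count. A sketch of that metric as we read the paper's definition: a source's share of the answer's words, with each sentence weighted by exp(-position / number of sentences). One cited source per sentence, whitespace word counts, and 0-based positions are simplifying assumptions, and merely reordering two sentences is a toy; in the study the source content changes and the engine regenerates its answer.

```python
import math


def position_adjusted_word_count(response: list[tuple[str, str]], source: str) -> float:
    """Word share of `source` in a response, with later sentences discounted.

    `response` holds (sentence, cited_source) pairs in display order. Sketch
    assumptions: one cited source per sentence, whitespace word counts, and
    0-based positions in the exp(-pos / n) discount.
    """
    total_words = sum(len(sentence.split()) for sentence, _ in response)
    if total_words == 0:
        return 0.0
    n = len(response)
    weighted = sum(
        len(sentence.split()) * math.exp(-pos / n)
        for pos, (sentence, cited) in enumerate(response)
        if cited == source
    )
    return weighted / total_words


def relative_improvement(before: float, after: float) -> float:
    """Percentage change relative to the unmodified baseline."""
    return (after - before) / before * 100


before = [
    ("Keyword stuffing did not transfer.", "other-site"),
    ("Statistics lifted visibility by 40%.", "my-page"),
]
after = list(reversed(before))  # the same two sentences, with my-page now cited first

lift = relative_improvement(
    position_adjusted_word_count(before, "my-page"),
    position_adjusted_word_count(after, "my-page"),
)
print(f"{lift:.1f}%")  # (e**0.5 - 1) * 100, i.e. about 64.9%
```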
Two further findings reframe the strategy entirely. The Cite Sources method “led to a substantial 115.1% increase in visibility for websites ranked fifth in SERP, while on average, the visibility of the top-ranked website decreased by 30.3%.”[11] Generative-engine visibility does not mirror classic ranking — it can invert it. And the lifts were not only simulated: the study validated its methods on Perplexity.ai, a real-world engine, and “demonstrate[d] visibility improvements up to 37%.”[12]
The Citation Selection Stack
Put the sources together and a single model falls out. Five layers sit between a question and a printed citation, and each has exactly one lever a publisher controls.
Figure · The Citation Selection Stack: 01 Fan-out, 02 Retrieval, 03 Reranking, 04 Grounding, 05 Surfaced citation. (Synthesis: query fan-out plus retrieval-augmented generation attribution research.)
To make that concrete, take a single factual sentence you might publish and run it down the stack — what the engine does at each layer, and the one lever you control to survive it:
“Adding statistics and citations lifted source visibility up to 40% in generative engines.”
- 01 · Fan-out
The question is rewritten into a fan of synthetic queries.
tests · do you cover the intent, or just the keyword
Your lever at this layer: cover the whole intent (comparisons, stats, the adjacent questions), not one phrase.
- 02 · Retrieval
Each query pulls a candidate set from the index and the live web.
tests · can the engine even find your sentence
Your lever at this layer: be crawlable, fresh, indexed, and chunked so a passage matches the query.
- 03 · Reranking
Candidates are reordered by relevance, authority, and entity signals.
tests · do you clear the trust bar
Your lever at this layer: earn query-independent trust (author, domain, inbound corroboration).
- 04 · Grounding
The model attaches a source to each span it can verify.
tests · can your sentence be checked against your page
Your lever at this layer: write a self-contained, quotable line backed by a stat or attributed quote.
- 05 · Surfaced citation
A small set of sources is named to the reader.
tests · are you the cheapest source to verify
Your lever at this layer: expose Claim / FAQ structure so mapping a sentence to your page is trivial.
The sentence is only cited if it survives all five layers, and the last two reward the same thing: verifiability. Authority gets you past reranking; a quotable, checkable line backed by a stat is what gets your URL printed beside the claim.
Stages map to the Citation Selection Stack · illustrative of the documented mechanism
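Because a sentence is cited only if it survives every layer, the stack behaves like a chain of gates. A toy sketch under loud assumptions: the `Page` fields, the one-boolean-per-layer checks, and every threshold are hypothetical simplifications of the levers listed above, not a model of any real engine.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Page:
    covered_intents: set[str]  # which fanned-out questions the page answers
    indexed: bool              # crawlable and in the index
    trust: float               # 0..1, hypothetical query-independent trust score
    quotable_claims: int       # self-contained sentences backed by a stat or attributed quote
    claim_markup: bool         # exposes Claim / FAQ structure


Check = Callable[[Page, set[str]], bool]

# One illustrative gate per layer; every threshold here is an assumption.
STACK: list[tuple[str, Check]] = [
    ("01 Fan-out", lambda page, fan: len(page.covered_intents & fan) >= len(fan) / 2),
    ("02 Retrieval", lambda page, fan: page.indexed),
    ("03 Reranking", lambda page, fan: page.trust >= 0.5),
    ("04 Grounding", lambda page, fan: page.quotable_claims > 0),
    ("05 Surfaced citation", lambda page, fan: page.claim_markup),
]


def first_failing_layer(page: Page, fan: set[str]) -> Optional[str]:
    """Name the first layer the page fails, or None if it survives all five."""
    for name, passes in STACK:
        if not passes(page, fan):
            return name
    return None


fan = {"comparison", "pricing", "reliability", "alternatives"}
head_term_page = Page({"comparison"}, True, 0.9, 3, True)
thin_page = Page(set(fan), True, 0.8, 0, False)
cited_page = Page(set(fan), True, 0.7, 5, True)

for label, page in [("head term", head_term_page), ("thin", thin_page), ("cited", cited_page)]:
    print(label, "->", first_failing_layer(page, fan) or "survives the stack")
```

Reporting the layer where a page drops out is usually more actionable than collapsing everything into one score.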
Most teams pour their effort into layer two — be indexed, be fast — and treat the rest as luck. The pages that get cited are built backwards from layer five: easy to verify, easy to quote, easy to attribute. The practical translation is almost rude in its simplicity. Write the sentence the engine would want to quote, make it true, and make it cheap to check.
What this means for the work
Three shifts follow directly from the evidence, in order of effort-to-impact.
Write to be quoted, not to rank. Every section should open with a self-contained, factual sentence a source could verify — the GEO study's strongest levers were statistics, quotations, and citations, not keywords. If a paragraph cannot be lifted out and still be true, the engine cannot lift it either.
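The lift-out test can be approximated with a rough lint over each section's first sentence. This is a heuristic of our own, not a method from the GEO study: the list of dangling openers is hypothetical, and “contains a digit or a quotation mark” is only a proxy for “carries a statistic, quotation, or citation.”

```python
import re

# Hypothetical list of openers that usually point back at earlier text.
DANGLING_OPENERS = ("this ", "that ", "these ", "those ", "it ", "they ")
# A digit or a quotation mark stands in for "a statistic, quotation, or citation".
EVIDENCE = re.compile(r"\d|[“\"]")


def opener_issues(first_sentence: str) -> list[str]:
    """Flag a section opener that may not stay true when lifted out of context."""
    issues = []
    if first_sentence.strip().lower().startswith(DANGLING_OPENERS):
        issues.append("depends on earlier text")
    if not EVIDENCE.search(first_sentence):
        issues.append("nothing a source could verify")
    return issues


print(opener_issues("This makes it much better."))
print(opener_issues("In the GEO study, Cite Sources raised a rank-5 site's visibility 115.1%.[11]"))
```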
Make verification cheap. Expose the structure that lets an engine map a sentence to its source without guessing. Schema.org Claim and FAQPage markup, a visible last-updated date, and one outbound citation per claim turn an expensive verification into a cheap one. This page does exactly that; every factual sentence here is wrapped as a machine-readable claim, for example:
```json
{
  "@context": "https://schema.org",
  "@type": "Claim",
  "text": "Keyword stuffing did not transfer to generative engines; statistics and quotations did.",
  "appearance": { "@type": "WebPage", "@id": "https://martech.llc/research/how-ai-search-chooses-citations" },
  "firstAppearance": { "@type": "CreativeWork", "url": "https://arxiv.org/abs/2311.09735" }
}
```
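Writing one object per sentence by hand does not scale, so a small generator helps. A minimal sketch using only Python's standard `json` module; the function names and the `claims` list are hypothetical, and none of the sources above establishes that any particular engine reads Claim markup, so treat it as a hedge rather than a proven lever.

```python
import json

PAGE = "https://martech.llc/research/how-ai-search-chooses-citations"


def claim_jsonld(text: str, page_url: str, source_url: str) -> str:
    """Serialize one schema.org Claim in the same shape as the example above."""
    claim = {
        "@context": "https://schema.org",
        "@type": "Claim",
        "text": text,
        "appearance": {"@type": "WebPage", "@id": page_url},
        "firstAppearance": {"@type": "CreativeWork", "url": source_url},
    }
    return json.dumps(claim, indent=2, ensure_ascii=False)


def as_script_tag(jsonld: str) -> str:
    """Wrap JSON-LD in the <script> element used to embed it in an HTML page."""
    # "\/" is a valid JSON escape, so this keeps a "</script>" inside a string from closing the tag.
    safe = jsonld.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{safe}\n</script>'


claims = [
    ("Keyword stuffing did not transfer to generative engines; statistics and quotations did.",
     "https://arxiv.org/abs/2311.09735"),
]
for text, source in claims:
    print(as_script_tag(claim_jsonld(text, PAGE, source)))
```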
Measure the surface you can now see. The engines are starting to report it. In February 2026, Bing introduced AI performance reporting in Webmaster Tools, surfacing how pages appear inside AI answers.[14] Treat citation share like a rank: probe the questions your buyers ask, log which sources get named, and watch the set move.
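A minimal sketch of the logging step, assuming each probe is recorded as a question plus the list of URLs the engine cited. The row format, the function name, and the choice to measure share as the fraction of probes in which a domain is named at least once are all assumptions, not any engine's reporting definition. Because attribution varies run to run, probing each question several times and pooling the rows gives a steadier number.

```python
from collections import Counter
from urllib.parse import urlparse


def citation_share(probe_log: list[dict]) -> dict[str, float]:
    """Fraction of probed questions in which each domain is named at least once."""
    if not probe_log:
        return {}
    named: Counter = Counter()
    for row in probe_log:
        # A domain cited twice in one answer still counts once for that probe.
        domains = {urlparse(url).netloc.removeprefix("www.") for url in row["cited_urls"]}
        named.update(domains)
    return {domain: count / len(probe_log) for domain, count in named.most_common()}


# Hypothetical probe log: one row per question asked of the engine.
probe_log = [
    {"question": "best tool for my team",
     "cited_urls": ["https://www.vendor-a.example/compare", "https://review.example/tools"]},
    {"question": "tool pricing", "cited_urls": ["https://vendor-a.example/pricing"]},
    {"question": "tool alternatives",
     "cited_urls": ["https://review.example/alternatives", "https://review.example/top-10"]},
    {"question": "is the tool reliable", "cited_urls": []},
]
for domain, share in citation_share(probe_log).items():
    print(f"{domain}: {share:.0%}")
```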
Where the evidence runs out
A serious reading has to mark its own edges. The Google patent is a disclosed method in permissive drafting language — it documents what the system can do, not a sworn description of production AI Overviews, and the terms “query fan-out,” “AI Mode,” and “E-E-A-T” are commentary, not patent text. The GEO results ran on 2023–24 engines and a Perplexity snapshot; whether the exact lifts hold on a 2026 frontier model is genuinely open. And no vendor — not Perplexity, not Bing, not Google — has published a precise, current account of how its live system chooses citations, so the engine-specific mechanics here lean on Google's patent and on academic RAG research applied by analogy.
What survives all of that is the shape of the thing. Engines fan out, retrieve, rank by trust, and cite by verification. The signals that win are credibility signals — statistics, quotations, sources — and they reward the verifiable page over the merely high-ranked one. Optimize for the question after the question, write sentences worth quoting, and make them cheap to check. That work compounds no matter which engine reads it next.
— Sundar Ramesh Kumar · martech.llc
Frequently asked questions
- How do AI search engines choose which sources to cite?
- They do not simply cite the top-ranked page. Google's patent filing describes rewriting one question into a fan of related queries, gathering documents that answer the whole fan, ranking them by query-independent trust signals such as author, domain, and inbound links, then attaching a citation to each sentence a source can verify. Citation is decided at the sentence, not the keyword.
- What is query fan-out in AI search?
- Query fan-out is the practice of expanding a single user question into multiple synthetic queries — alternatives, supplements, rewrites, and drill-downs — and retrieving results for all of them. Google's stateful-chat search patent describes a first large language model generating these additional queries, with documents responsive to both the original and the fanned queries selected as candidate sources.
- Does classic SEO keyword optimization work for generative engines?
- Not on its own. The peer-reviewed GEO study found that keyword stuffing — the classic SEO tactic — does not transfer to generative engines, while adding statistics, credible quotations, and citations to authoritative sources measurably increased a source's visibility. The signals that win citations reward verifiability, not keyword density.
- What content changes increase visibility in AI answers?
- Controlled research on generative engines found that three content changes drove the largest gains: adding relevant statistics, adding credible attributed quotations, and citing external sources. In the study these methods lifted visibility by up to 40% on its benchmark and up to 37% when tested live on Perplexity.ai.
- Is ranking first on Google enough to get cited by AI?
- No. The same research showed that generative-engine visibility does not mirror classic ranking: applying the Cite Sources method raised visibility 115.1% for a site ranked fifth, while the top-ranked site's visibility fell 30.3%. A high blue-link rank is an input, not a guarantee of citation.
- Why is AI citation unreliable even for good sources?
- Because attribution is an unsolved research problem. The ALCE benchmark found that even the best models fail to fully support their own citations about half the time on long-form questions, and retrieval accuracy is a critical bottleneck. A page can be authoritative and still be mis-cited or skipped if the retrieval layer never surfaces it.
- How do you make a page easy for an AI engine to cite?
- Write self-contained, quotable sentences that a source can verify; back claims with statistics and attributed quotations; cite external authorities; and expose machine-readable structure such as schema.org Claim and FAQPage markup so the engine can map a sentence to its source cheaply. The goal is to be the cheapest source to verify.
Sources · 14
Every claim, dated and linked
- [1]
Google's 'Search with stateful chat' patent (US20240289407A1) describes a first large language model generating additional, alternative, supplemental, rewritten, and drill-down queries from a single user query.
Google LLC · Search with stateful chat (US20240289407A1) · 2024-08-29
- [2]
The patent describes selecting documents responsive to both the original query and the LLM-generated additional queries as candidate search-result documents.
Google LLC · Search with stateful chat (US20240289407A1) · 2024-08-29
- [3]
The patent describes query-independent trustworthiness measures generated based on a document's author, domain, and inbound links.
Google LLC · Search with stateful chat (US20240289407A1) · 2024-08-29
- [4]
The patent describes attaching a source identifier before or after a portion of the summary to indicate that the source document verifies that portion.
Google LLC · Search with stateful chat (US20240289407A1) · 2024-08-29
- [5]
The ALCE benchmark found that on the ELI5 long-form QA dataset, even the best models lack complete citation support 50% of the time.
Gao et al. · Enabling Large Language Models to Generate Text with Citations (ALCE) · 2023-05-23
- [6]
RARR automatically finds attribution for the output of any text generation model and post-edits the output to fix unsupported content while preserving the original.
Gao et al. · RARR: Researching and Revising What Language Models Say, Using Language Models · 2022-10-17
- [7]
Research on verifiable generation emphasizes the critical role of retrieval accuracy in citation generation, showing substantial room for improvement in current LLMs.
- [8]
The peer-reviewed GEO study demonstrated that Generative Engine Optimization can boost a source's visibility by up to 40% in generative-engine responses.
Aggarwal et al. · GEO: Generative Engine Optimization (KDD 2024) · 2023-11-16
- [9]
In the GEO study the top methods Cite Sources, Quotation Addition, and Statistics Addition achieved a relative improvement of 30-40% on Position-Adjusted Word Count, with the best methods beating baseline by 41%.
Aggarwal et al. · GEO: Generative Engine Optimization (KDD 2024) · 2023-11-16
- [10]
The GEO study found that keyword stuffing, the classic SEO tactic, did not perform well in generative engines while credibility methods did.
Aggarwal et al. · GEO: Generative Engine Optimization (KDD 2024) · 2023-11-16
- [11]
In the GEO study the Cite Sources method led to a 115.1% increase in visibility for websites ranked fifth, while the top-ranked website's visibility decreased by 30.3% on average.
Aggarwal et al. · GEO (arXiv HTML v3), Section 5.2 · 2023-11-16
- [12]
The GEO study validated its methods on Perplexity.ai, a real-world generative engine, and demonstrated visibility improvements up to 37%.
Aggarwal et al. · GEO: Generative Engine Optimization (KDD 2024) · 2023-11-16
- [13]
Google's guidance directs creators to produce people-first, helpful content and to demonstrate experience, expertise, authoritativeness, and trustworthiness.
Google Search Central · Creating helpful, reliable, people-first content
- [14]
In February 2026 Bing introduced AI performance reporting in Bing Webmaster Tools, exposing how pages appear and are cited inside AI answers.
Bing Webmaster Blog · AI Performance in Bing Webmaster Tools