Answer-engine research · 12 min read

The fan-out tree you never see — why AI search ranks coverage, then kills it at the rerank

AI Mode doesn't search your question — it decomposes it into a hidden tree of sub-queries, runs them in parallel, then fuses, reranks, and truncates before citing anyone. Drawing on Google's and Microsoft's disclosures, their patents, and a production RAG study, this maps the Fan-Out Surface Area model.


When you ask an AI search engine a question, it does not search your question. It quietly breaks the question into subtopics, writes a fan of related searches, runs them in parallel across several sources, then merges, reranks, and truncates the results before a single sentence of the answer is written. You are not ranking for a query. You are ranking for a hidden tree of sub-queries you never see — and the tree has a trapdoor.

That trapdoor is the whole point of this piece, and it is where most generative-engine playbooks go wrong. They tell you to cover more topics so you get retrieved more often. That advice is half-right and dangerously incomplete: the engine does fan out and retrieve broadly, but it then collapses that breadth at a rerank-and-truncate step where most retrieved pages are discarded. Coverage gets you into the room; it does not get you quoted. This piece traces the mechanism end to end using the primary sources that describe it — Google's and Microsoft's own disclosures, their patents, and the peer-reviewed retrieval literature — and closes with a single model, the Fan-Out Surface Area, plus an honest account of what the evidence does not prove.

How does one question become a fan of searches?

It does not run your query. It runs a tree of them.

In March 2025, Google described AI Mode as using a “query fan-out” technique, “issuing multiple related searches concurrently across subtopics and multiple data sources and then bring[ing] those results together to provide an easy-to-understand response.”[1] That is the first-party origin of the term — not analyst speculation, but Google naming its own mechanism. And the scale is not subtle.

The AI performs multi-object reasoning to understand what you're looking at. Then it uses a 'fan-out' technique which triggers multiple searches at once, reads through the results and presents a single, cohesive response.

That description — from a Google engineering explainer — puts the fan at “about a dozen searches in the time it takes to do one,” and extends it to images, where every object in a photo is searched at once.[3] For research-grade questions the tree is far larger: Google says Deep Search “uses the same query fan-out technique but taken to the next level… it can issue hundreds of searches… and create an expert-level fully-cited report in just minutes.”[2]

Query fan-out · the tree you never see
One prompt becomes a fan of synthetic sub-queries, and each retrieves its own candidate set

User prompt: “What project management tool should my design team use?”

Decomposed into 6 parallel sub-queries
  1. B1 · reframe · you appear: project management tools for design teams
  2. B2 · compare · you appear: Asana vs Monday vs ClickUp for agencies
  3. B3 · constraint · you appear: PM software pricing for a 30-person team
  4. B4 · sub-intent · you absent: project management with design proofing
  5. B5 · risk · you appear: PM tool reliability and uptime reviews
  6. B6 · integration · you absent: PM software that integrates with Figma + Slack

Synthesized answer: one answer, citing the sources that won across the fan, not the page that ranked first for the literal question.

Your coverage · 4 of 6 branches

You never see the tree, so you optimize the trunk (the head term) and lose the branches. Every branch you fail to cover is a retrieval you were never eligible to win. Ranking is a question about one query; citation is a question about the whole fan.

Mechanism: Google AI Mode query fan-out / stateful-chat decomposition · branches + coverage are illustrative of the documented method

None of this is improvised at runtime alone. The mechanism is documented in Google's patents. The “Search with stateful chat” filing describes a first large language model generating “alternative query suggestions, supplemental queries, rewritten versions of the user’s query, and/or ‘drill down’ queries,” submitting both the original and the generated queries to search, and merging the documents responsive to all of them into a single candidate set.[4] One input, many executions, one pooled result. That is query fan-out in the most literal terms a legal document will allow.
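To make the one-input, many-executions, one-pooled-result pattern concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Google's implementation: `toy_search`, `stub_rewriter`, and the four-document corpus are all hypothetical stand-ins, and the word-overlap scoring is far cruder than any production retriever.

```python
from typing import Callable

# Hypothetical toy corpus: document id -> text.
CORPUS = {
    "d1": "project management tools for design teams compared",
    "d2": "asana vs monday vs clickup pricing for agencies",
    "d3": "pm software that integrates with figma and slack",
    "d4": "uptime and reliability reviews of pm tools",
}


def toy_search(query: str, k: int = 2) -> list:
    """Rank documents by shared words with the query (assumed toy retriever)."""
    words = set(query.lower().split())
    scored = [(len(words & set(text.split())), doc_id) for doc_id, text in CORPUS.items()]
    hits = [(score, doc_id) for score, doc_id in scored if score > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in hits[:k]]


def fan_out(original: str, rewrite: Callable[[str], list]) -> list:
    """Run the original query plus the generated ones; pool unique documents."""
    pooled = []
    for query in [original, *rewrite(original)]:
        for doc_id in toy_search(query):
            if doc_id not in pooled:
                pooled.append(doc_id)
    return pooled


def stub_rewriter(query: str) -> list:
    # Stand-in for the LLM step; a real system would generate these strings.
    return ["asana vs monday pricing", "pm software figma slack", "pm tools uptime reviews"]


head_query = "project management tools for design teams"
head_only = toy_search(head_query)
pooled = fan_out(head_query, stub_rewriter)
```

With this toy data the head query alone reaches two documents, while the pooled fan reaches all four. The point is only the shape: documents that match a generated branch enter the candidate set even when they would never surface for the literal question.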

The strategic consequence is immediate. A page that answers the literal question but none of the adjacent ones competes for a single branch of the fan. A page that covers the comparison, the price, the reliability question, the integration, and the next-level refinement shows up across many branches and accumulates chances to be retrieved. You are no longer optimizing for a keyword. You are optimizing for coverage of an intent.

Where do the synthetic queries actually come from?

If the engine writes the searches, the question becomes: what does it write, and can you anticipate it?

The research lineage is older than AI Mode. Google Research's least-to-most prompting established the core move, “break down a complex problem into a series of simpler subproblems and then solve them in sequence,” reaching at least 99% accuracy on the SCAN compositional-generalization benchmark, where chain-of-thought managed 16%.[8] The Self-Ask method made it retrieval-shaped: the model explicitly generates and answers decomposed follow-up sub-questions, and, critically, the paper showed that the “compositionality gap” does not shrink as models grow.[7] The comfortable assumption that a smarter model will simply understand your one page is empirically false; decomposition is durable.

On the generation side, the engines manufacture queries from material you can influence. One Google patent describes few-shot prompting an LLM to generate diverse synthetic queries per document (“8 questions per document… with a temperature parameter at 0.7”) and then applying round-trip filtering, which discards any generated query that fails to retrieve its own source document.[5] Read that filter again: what it discards is the query, not the page, but the effect favors content that unambiguously answers the query it would itself generate. A vague, hedged passage tends to lose its synthetic queries at this step, leaving it fewer strings to be matched on before it ever competes.
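The round-trip filter is easy to picture in code. The sketch below is a toy under stated assumptions: `retrieve_top1` is a hypothetical word-overlap retriever, and the two documents and their generated queries are invented for illustration. The patent's actual retriever and prompts are not reproduced here.

```python
from typing import Optional


def retrieve_top1(query: str, corpus: dict) -> Optional[str]:
    """Return the id of the document sharing the most words with the query."""
    words = set(query.lower().split())
    best_id, best_score = None, 0
    for doc_id in sorted(corpus):
        score = len(words & set(corpus[doc_id].lower().split()))
        if score > best_score:
            best_id, best_score = doc_id, score
    return best_id


def round_trip_filter(generated: dict, corpus: dict) -> dict:
    """Keep a generated query only if it retrieves the document it came from."""
    return {
        doc_id: [q for q in queries if retrieve_top1(q, corpus) == doc_id]
        for doc_id, queries in generated.items()
    }


# Hypothetical documents: one specific, one vague.
corpus = {
    "pricing": "crm pricing per seat for small b2b teams starts at 15 dollars",
    "overview": "our crm may help some teams in various ways depending on needs",
}
# Hypothetical synthetic queries, keyed by the document they were generated from.
generated = {
    "pricing": ["crm pricing per seat", "crm for small b2b teams"],
    "overview": ["crm for teams", "what does the crm help with"],
}
kept = round_trip_filter(generated, corpus)
```

In this toy run the specific page keeps both of its queries, while the vague page loses “crm for teams” because that query retrieves the pricing page instead.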

And some of the synthetic queries are written by other people. A separate Google patent describes minting synthetic queries from the anchor text used by a threshold of referring pages and pre-associating them with documents, independent of real-time ranking.[6] The language others use to link to you literally becomes queries you can be matched on — anchor text is a retrieval input, not just an authority signal. This is not a 2024 invention either; that filing carries a 2008 priority date. The LLM era industrialized a mechanic that has been quietly running for over a decade.
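A minimal sketch of the anchor-text mechanic, assuming a simple rule: an anchor becomes a synthetic query for a target once at least `min_referring_pages` distinct pages use it. The threshold value, the lowercase normalization, and the example domains are all assumptions for illustration; the patent does not publish them in this form.

```python
from collections import defaultdict


def synthetic_queries_from_anchors(links, min_referring_pages=2):
    """links: iterable of (referring_page, anchor_text, target_url) tuples."""
    pages_by_pair = defaultdict(set)
    for referring_page, anchor, target in links:
        # Count distinct referring pages per (target, normalized anchor).
        pages_by_pair[(target, anchor.strip().lower())].add(referring_page)

    queries = defaultdict(list)
    for target, anchor in sorted(pages_by_pair):
        if len(pages_by_pair[(target, anchor)]) >= min_referring_pages:
            queries[target].append(anchor)
    return dict(queries)


# Hypothetical link graph.
links = [
    ("blog-a.example", "affordable CRM", "you.example/pricing"),
    ("blog-b.example", "Affordable CRM", "you.example/pricing"),
    ("blog-b.example", "affordable CRM", "you.example/pricing"),  # same page, counted once
    ("blog-c.example", "click here", "you.example/pricing"),
    ("blog-a.example", "CRM for agencies", "you.example/agencies"),
    ("blog-d.example", "crm for agencies", "you.example/agencies"),
]
anchor_queries = synthetic_queries_from_anchors(links)
```

Here “click here” never clears the threshold, so it is never pre-associated with the page. The descriptive anchors do clear it, which is why the words others use to link to you matter.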

Microsoft makes the reformulation step unusually explicit. Bing's Deep Search disclosure describes rewriting one query into many — for “how do points systems work in Japan,” it generates variations like “loyalty card programs Japan” and “best loyalty cards for travelers in Japan” — reviewing about ten times the pages of a normal search.[13] If you want to see the fan made visible, that post is the closest any major engine has come to showing its work.

Put the four together and the fan stops being mysterious — it is manufactured by four machines you can anticipate:

Where the synthetic queries come from
One head query, four machines that manufacture the fan

  1. Decomposition: break the intent into subtopics before any search runs.
     Examples: pricing for small teams · native integrations · reliability & uptime
  2. Synthetic prompting: an LLM writes diverse query variations from the topic.
     Examples: best X in 2026 · X for a 20-person team · X honest review
  3. Anchor text: the words other pages use to link to you become queries.
     Examples: affordable X · X for agencies
  4. Reformulation: rewrite the question into named variations and drill-downs.
     Examples: X programs compared · X by category · migrating to X

You cannot see these strings, but you can anticipate them. Cover the vocabulary, the comparison framings, the category and migration variants, and the anchor text others already use for you, so that your page matches the queries the machines write, not just the one the user typed.

Mechanism: query decomposition + synthetic-query patents + Bing Deep Search reformulation · example strings illustrative

Why isn't more coverage automatically more citations?

Here is where the prevailing GEO narrative breaks, and where the evidence gets genuinely interesting.

The retrieval research is clear that fan-out helps recall. Generation-Augmented Retrieval showed that “generating diverse contexts for a query is beneficial as fusing their results consistently yields better retrieval accuracy.”[9] That sits on top of the retrieve-then-generate architecture that founded modern RAG: one query pulls top-K passages from a dense index, and the generator marginalizes over them to write “more specific, diverse and factual language.”[11] RAG-Fusion formalized the multi-query version — “generating multiple queries, reranking them with reciprocal scores and fusing the documents” — but its own authors flagged that the extra queries can make answers off-topic when the generated queries drift from intent.[10]

That caveat turns out to be the headline. A 2026 study of a production deployment found that retrieval fusion “does increase raw recall, but these gains are largely neutralized after re-ranking and truncation,” with Hit@10 falling from 0.51 to 0.48 in several configurations, while query rewriting and larger candidate sets added latency.[12] Pulling more candidates into the pool did not produce more correct answers once realistic rerank budgets and context limits were applied. The funnel collapses at the cut.
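The mechanics behind that finding can be sketched in a few lines. The example below fuses per-query rankings with reciprocal rank fusion, then applies a rerank-and-truncate step. Everything in it is contrived: the document ids, the reranker scores, and the budget of 2 are assumptions chosen to show the effect, and the constant `k=60` is a commonly used default rather than a value taken from the study. It does not reproduce the paper's data.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse rankings: each document scores the sum of 1 / (k + rank) over lists."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


def recall(pool, relevant):
    """Share of relevant documents present anywhere in the pool."""
    return len(set(pool) & relevant) / len(relevant)


def hit_at_k(ranked, relevant, k):
    """1 if any relevant document appears in the top k, else 0."""
    return int(any(doc in relevant for doc in ranked[:k]))


def rerank_and_truncate(pool, reranker_score, budget):
    """Reorder the pool by reranker score, then keep only the top `budget`."""
    return sorted(pool, key=lambda d: (-reranker_score[d], d))[:budget]


RELEVANT = {"r1", "r2"}                              # hypothetical gold documents
single_query = ["x1", "r1", "x2"]                    # baseline retrieval
variations = [["r2", "x3", "x1"], ["x4", "x3", "r1"]]  # generated-query retrievals

baseline_pool = single_query
fused_pool = reciprocal_rank_fusion([single_query, *variations])

# Hypothetical reranker that happens to prefer the off-target extras.
RERANKER = {"x3": 0.90, "x4": 0.85, "x1": 0.80, "r1": 0.70, "r2": 0.60, "x2": 0.50}
BUDGET = 2
baseline_top = rerank_and_truncate(baseline_pool, RERANKER, BUDGET)
fused_top = rerank_and_truncate(fused_pool, RERANKER, BUDGET)
```

In this contrived run, pool recall doubles after fusion (0.5 to 1.0), yet the truncated top 2 no longer contains a relevant document. A wider pool only helps if the relevant candidates also win the rerank.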

The binding constraint · measured
Fan-out widens the candidate pool, then rerank and truncation collapse the gain

~a dozen · searches AI Mode runs in the time of one
hundreds · searches per query in Deep Search mode
~10× · more pages reviewed by Bing Deep Search
0.51 → 0.48 · production Hit@10 after fusion + rerank

Raw recall after multi-query fusion: rises
End-to-end Hit@10 after rerank + truncation: 0.51 → 0.48

In several production configurations the fusion gain was “largely neutralized after re-ranking and truncation”: recall went up, but the cited set did not.

What survives the cut · Bing’s named rerank signals
Topic match · Level of detail · Source credibility · Freshness · Popularity

This is why the framework name is a warning, not a goal. Surface area gets you into the candidate pool; only rerank survival on the branches you already win gets your URL printed in the answer. Breadth opens the door; credibility, depth, and freshness walk through it.

Sources: production RAG-Fusion deployment (Medrano et al., arXiv 2603.02153) · Bing Deep Search rerank criteria (Microsoft) · figures cited inline

So what survives the cut? Bing, uniquely, names the signals. Deep Search reranks by “how well the topic matches, whether it’s at the appropriate level of detail, how credible and trustworthy the source is, how fresh and popular it is, and so on.”[13] That sentence is the most actionable line in the entire literature: it is a public rerank scorecard. Google has not published its own weighting, so treat these five — topic match, depth, credibility, freshness, popularity — as the best available proxy for what walks through the trapdoor.
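One way to use that scorecard is as a self-audit rubric. The sketch below scores candidates on the five named signals and keeps the top few. It is an assumption-heavy illustration: Bing publishes the signal names but not a formula, so the equal weights, the 0 to 1 scales, the averaging, and the example URLs are all hypothetical.

```python
from dataclasses import dataclass

# The five signals Bing names; how they are combined is NOT public.
SIGNALS = ("topic_match", "detail", "credibility", "freshness", "popularity")


@dataclass
class Candidate:
    url: str
    topic_match: float
    detail: float
    credibility: float
    freshness: float
    popularity: float


def scorecard(candidate, weights=None):
    """Weighted average of the five signals (equal weights are an assumption)."""
    weights = weights or {signal: 1.0 for signal in SIGNALS}
    total = sum(weights.values())
    return sum(weights[s] * getattr(candidate, s) for s in SIGNALS) / total


def survivors(candidates, budget):
    """Return the URLs that remain after sorting by score and truncating."""
    ranked = sorted(candidates, key=lambda c: (-scorecard(c), c.url))
    return [c.url for c in ranked[:budget]]


# Hypothetical self-assessed scores for three pages in one candidate pool.
pool = [
    Candidate("broad.example", 0.9, 0.3, 0.4, 0.5, 0.4),
    Candidate("deep.example", 0.8, 0.9, 0.8, 0.7, 0.3),
    Candidate("stale.example", 0.9, 0.8, 0.7, 0.1, 0.5),
]
kept_urls = survivors(pool, budget=2)
```

In this toy pool the page with the best topic match but thin detail and weak credibility is the one that gets cut. Treat the output as a prompt for where to improve a page, not as a prediction of any engine's ranking.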

The Fan-Out Surface Area model

Put the sources together and one model falls out. Your real ranking surface is not a keyword position; it is the share of the hidden sub-query tree your content can satisfy — and the rerank gate that decides which of those branches actually cite you.

The Fan-Out Surface Area model
Coverage of the fan, not rank on the head term, is the unit of competition

surface area = Σ (branches you can satisfy) ÷ (branches in the fan)

Keyword-optimized page · 1/6 branches · 17% surface
Ranks #1 for the head term · answers the literal question only

Intent-covering page · 5/6 branches · 83% surface
Reframe · compare · constraint · risk · integration: five of the six branches

Both pages can rank #1 for the head term. Only one of them is eligible to be retrieved across the fan, and eligibility, not rank, is what compounds into citations. The lever is not a better headline; it is wider coverage of the questions behind the question.

Framework: martech.llc · grounded in query fan-out decomposition + multi-query retrieval research · counts illustrative
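The ratio is simple enough to compute directly. This short sketch reproduces the two illustrative figures above; the branch labels are taken from the earlier fan diagram and, like the counts, are illustrative rather than measured.

```python
def surface_area(fan, covered):
    """Share of the fan's branches that the content can satisfy."""
    fan = set(fan)
    if not fan:
        raise ValueError("the fan must contain at least one branch")
    return len(fan & set(covered)) / len(fan)


# Illustrative six-branch fan.
FAN = ["reframe", "compare", "constraint", "sub-intent", "risk", "integration"]

keyword_page = surface_area(FAN, ["reframe"])
intent_page = surface_area(FAN, ["reframe", "compare", "constraint", "risk", "integration"])

keyword_pct = round(keyword_page * 100)  # 1 of 6 branches
intent_pct = round(intent_page * 100)    # 5 of 6 branches
```

The number says nothing about whether a covered branch survives the rerank; it measures eligibility only.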

The name is deliberately a warning. Maximizing surface area feels like progress — more topics, more pages, more branches — but the production evidence says raw breadth is neutralized at truncation. Two pages can both rank first for the head term; only the one that also survives the rerank on its branches gets cited. The mechanics resolve into five stages, each with exactly one lever an operator controls.

From prompt to citation · five mechanics
Five stages convert one question into a fan of citations, and each has one operator lever

  1. F1 · Decomposition · Lever: map the fan before you write
     The model rewrites one prompt into a fan of synthetic sub-queries: reframes, comparisons, constraints, drill-downs.
  2. F2 · Coverage · Lever: cover every sub-intent on one entity
     Each sub-query needs a page eligible to answer it. Branches you don't address, you can't win.
  3. F3 · Per-branch retrieval · Lever: be crawlable, chunkable, and fresh
     Every sub-query runs its own retrieval and builds its own candidate set from index + live web.
  4. F4 · Reranking · Lever: earn entity + authority signals
     Candidates are reordered by relevance, authority, and entity corroboration before synthesis.
  5. F5 · Synthesis + citation · Lever: write the quotable, verifiable line
     The answer is assembled and a source is attached to each span it can verify across the fan.

Classic SEO lives entirely at F3: be indexed, be fast. The fan-out engine decides citations at F1, F2, and F5, where almost no one is working. Build backwards from the citation: cover the fan, then make each branch cheap to verify.

Synthesis: query fan-out decomposition + multi-query retrieval + RAG attribution research

Most teams work the third stage — be indexed, be fast — and treat the rest as weather. The fan-out engine decides citations at the first, second, and final stages, where almost no one is working. Build backwards from the citation: cover the fan so you are eligible, then make each branch you win survive the rerank.

A worked example: mapping one query's fan

Make the hidden tree concrete. Take a real buying question — “best CRM for a 20-person B2B sales team” — and watch what the engine actually does with it. It does not rank you on that string. It decomposes the intent into subtopics, writes a fan of concrete sub-queries, and retrieves a separate candidate set for each. The walkthrough below runs the whole model on that one query: the six branches it writes, what each tests, and the rerank signal your page has to win to be cited on it.

Worked example · run the model on one query
“Best CRM for a 20-person B2B sales team”: the hidden fan, branch by branch

The query the user types: “What's the best CRM for our 20-person B2B sales team?”

Decomposed into 6 synthetic sub-queries
  1. B1 · constraint · you cover
     Sub-query: “CRM pricing for small B2B teams”
     Tests: the budget question (can a 20-seat team afford it)
     What gets you cited on this branch: a clear per-seat pricing table, which wins level of detail.

  2. B2 · comparison · you cover
     Sub-query: “Salesforce vs HubSpot vs Pipedrive for SMB”
     Tests: the shortlist (who the real alternatives are)
     What gets you cited on this branch: an honest, sourced head-to-head, which wins credibility.

  3. B3 · sub-intent · you cover
     Sub-query: “CRM with native email and LinkedIn integration”
     Tests: workflow fit (does it slot into the stack)
     What gets you cited on this branch: an integrations page that names the tools, which wins topic match.

  4. B4 · risk · you absent
     Sub-query: “is [CRM] reliable, uptime and reviews”
     Tests: the trust question (will it hold up)
     What gets you cited on this branch: a status + third-party reviews page, which wins popularity.

  5. B5 · drill-down · you cover
     Sub-query: “how long does it take to set up a CRM”
     Tests: effort (time-to-value before committing)
     What gets you cited on this branch: a dated setup-time guide, which wins freshness.

  6. B6 · head query · you cover
     Sub-query: “best CRM for a 20-person team 2026”
     Tests: the literal ask the user typed
     What gets you cited on this branch: your main page, which competes on all five rerank signals.

5/6 branches covered

A page that answers only the head query competes for one branch of six. A page, or a tightly interlinked cluster, that covers all six is eligible across the whole fan. But eligibility is not citation: on each branch you still have to beat the field on the rerank signal named for that branch. Cover the fan to get into the pools; win topic match, depth, credibility, freshness, and popularity to get printed.

Branches + coverage are illustrative of the documented fan-out mechanism · rerank signals per Bing Deep Search

That is the entire model, run on one question. Breadth makes you eligible across the fan; rerank survival on each branch is what actually prints your URL.
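The same walkthrough can be kept as a small data structure so the audit is repeatable. This is a sketch, not a tool: the `Branch` type and `audit` helper are hypothetical names, and the branches, signals, and coverage flags simply restate the illustrative example above.

```python
from typing import NamedTuple


class Branch(NamedTuple):
    kind: str
    sub_query: str
    rerank_signal: str
    covered: bool


# The six illustrative branches from the worked example.
CRM_FAN = [
    Branch("constraint", "CRM pricing for small B2B teams", "level of detail", True),
    Branch("comparison", "Salesforce vs HubSpot vs Pipedrive for SMB", "credibility", True),
    Branch("sub-intent", "CRM with native email and LinkedIn integration", "topic match", True),
    Branch("risk", "is [CRM] reliable, uptime and reviews", "popularity", False),
    Branch("drill-down", "how long does it take to set up a CRM", "freshness", True),
    Branch("head query", "best CRM for a 20-person team 2026", "all five signals", True),
]


def audit(fan):
    """Summarize coverage and list the uncovered branches with their signal."""
    gaps = [b for b in fan if not b.covered]
    return {
        "covered": len(fan) - len(gaps),
        "total": len(fan),
        "gaps": [(b.kind, b.rerank_signal) for b in gaps],
    }


report = audit(CRM_FAN)
```

The report for this fan shows five of six branches covered, with the risk branch as the gap and popularity as the signal to win once a page exists for it.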

What does this change about the work?

Three shifts follow directly from the evidence.

Map the fan before you write. For any target topic, enumerate the subtopics, comparisons, constraints, and drill-downs the engine will generate — those are your real ranking surfaces. Cover them on self-contained, retrievable pages, across more than one source, so single-index presence does not cap how many branches can surface you.

Win the rerank, not just the retrieval. Being pulled into the candidate pool is the start line, not the finish. Deepen topical detail, harden credibility with clear authorship and primary citations, keep a visible freshness cadence, and write passages an engine can lift verbatim with attribution. These are precisely the signals Bing names as surviving truncation.

Treat coverage as a recurring audit. The fan is widening. In November 2025 Google said Gemini 3 gives query fan-out “a major upgrade,” performing “even more searches” and surfacing content it may previously have missed.[15] And the tree has depth: Google's “Thematic search” patent describes clustering results into themes and spawning a refined sub-query — “moving to Denver” plus “neighborhoods” — recursively producing sub-themes.[14] Being cited at depth one can spawn the very sub-query that re-retrieves you at depth two. What was below the cut last year can clear it after the next model update, so coverage is never finished.
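The recursion in that last point can be sketched as a tree builder. This is a toy under stated assumptions: `expand` and `STUB_THEMES` are hypothetical, the themes stand in for the clustering step the patent describes, and only the “moving to Denver” plus “neighborhoods” pairing comes from the source. The other themes are invented for illustration.

```python
def expand(query, themes_for, depth):
    """Build a nested dict in which each theme spawns the sub-query 'query theme'."""
    if depth == 0:
        return {}
    tree = {}
    for theme in themes_for(query):
        sub_query = f"{query} {theme}"
        tree[sub_query] = expand(sub_query, themes_for, depth - 1)
    return tree


# Stand-in for theme clustering over a result set (hypothetical values).
STUB_THEMES = {
    "moving to denver": ["neighborhoods", "cost of living"],
    "moving to denver neighborhoods": ["schools"],
}


def stub_themes(query):
    return STUB_THEMES.get(query, [])


tree = expand("moving to denver", stub_themes, depth=2)
```

Each level of the resulting tree is another set of branches to audit, which is why coverage mapped at depth one is only a starting point.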

Where the evidence runs out

A serious reading marks its own edges. Google's and Microsoft's product posts name “query fan-out” and describe its scale, but they do not disclose the decomposition prompts, the real query counts per question, or the rerank weighting — those remain proprietary. The patents describe claimed methods, not confirmed production paths; the stateful-chat filing even predates the public fan-out era, and a filing is not a deployment. The academic decomposition and retrieval papers establish provenance and plausibility, not first-party adoption — they are convergent evidence, not proof of what ships. The Hit@10 figures come from one industry deployment of undisclosed affiliation, not from Google or Bing, and the five rerank signals are explicitly Bing's; Google has published no equivalent. And Google's Deep Search and Bing's Deep Search are different products from different companies that must not be blended.

What survives all of that is the shape of the thing. Engines decompose, generate sub-queries, retrieve in parallel, fuse, rerank, and truncate before they cite. Coverage of the fan determines eligibility; rerank survival determines citation. Map the questions behind the question, cover them across sources, and make each page credible, deep, fresh, and cheap to quote. That work compounds no matter which engine widens its fan next — and it pairs with the downstream half of the problem, how AI search chooses what to cite once your page is in the pool, and how to measure the visibility you earn.

— Sundar Ramesh Kumar · martech.llc


Frequently asked questions

What is query fan-out in AI search?
Query fan-out is the technique an AI search engine uses to turn one question into many. Google describes AI Mode as issuing multiple related searches concurrently across subtopics and multiple data sources, then bringing the results together into one response. Instead of ranking your page against a single query, the engine ranks you across a hidden tree of synthetic sub-queries you never see.
Does ranking number one for a keyword get you cited in AI Mode?
Not on its own. The engine decomposes the question into subtopics and runs many searches in parallel, so a page that answers only the literal head query competes for one branch of the fan while a page that covers the whole intent is eligible across many. A high blue-link rank is one input among many, not a guarantee of citation.
How many searches does AI Mode run for a single question?
Google says AI Mode can perform about a dozen searches in the time it takes to do one, and that its Deep Search mode takes the same query fan-out technique further, issuing hundreds of searches to produce a fully cited report. Microsoft's Bing Deep Search reviews roughly ten times the pages of a normal search and can take up to thirty seconds.
Does covering more sub-queries guarantee more AI citations?
No — and this is the part most playbooks miss. A 2026 production study found that multi-query retrieval with reciprocal rank fusion raised raw recall, but the gains were largely neutralized after re-ranking and truncation, with Hit@10 falling from 0.51 to 0.48 in several configurations. Being retrieved on more branches does not mean being cited; the rerank-and-truncate gate decides survival.
What is the Fan-Out Surface Area model?
It is a way to see AI-search competition as coverage of a hidden tree rather than rank on a keyword. Surface area is the share of the sub-query fan your content can satisfy. More surface area gets you into more candidate pools, but the binding constraint is per-branch rerank survival — so breadth is necessary but not sufficient.
What signals decide which sources survive the rerank?
Microsoft's Bing names them directly for Deep Search: how well the topic matches, whether the content is at the appropriate level of detail, how credible and trustworthy the source is, and how fresh and popular it is. Google has not published its own rerank weighting, so these five Bing signals are the clearest public scorecard for what survives truncation.
Is query fan-out a Google-only thing?
No. Google named the term for AI Mode, but Microsoft's Bing Deep Search independently describes having GPT-4 enumerate query intents, rewrite the query into multiple variations, search all of them, and rerank the union. The underlying pattern — decompose, generate sub-queries, retrieve in parallel, fuse and rerank — also appears across peer-reviewed retrieval research.
How should content strategy change for query fan-out?
Stop optimizing only for the visible head query and start covering the latent sub-questions a topic generates, across more than one source. Then make each page survive the rerank: deepen topical detail, harden credibility with clear authorship and primary citations, keep content fresh, and write self-contained passages an engine can lift with attribution. Re-audit as the fan widens with each model upgrade.

Sources · 15

Every claim, dated and linked
  1. [1] Google describes AI Mode using a 'query fan-out' technique, issuing multiple related searches concurrently across subtopics and multiple data sources, then bringing the results together into one response.
     Google. Expanding AI Overviews and introducing AI Mode (Robby Stein). 2025-03-05.

  2. [2] Google states that Deep Search uses the same query fan-out technique taken to the next level, issuing hundreds of searches and creating an expert-level, fully cited report in minutes.
     Google. AI in Search: going beyond information to intelligence (Elizabeth Reid). 2025-05-20.

  3. [3] Google describes AI Mode performing a fan-out that triggers about a dozen searches in the time it takes to do one, extended to multi-object visual queries.
     Google. Ask a Techspert: how does AI understand my visual searches? 2026-03-05.

  4. [4] Google's 'Search with stateful chat' patent describes a first large language model generating supplemental, rewritten, alternative, and 'drill down' queries that are executed alongside the user's original query, with responsive documents merged into one set.
     Google LLC. Search with stateful chat (US20240289407A1). 2024-08-29.

  5. [5] A Google patent describes few-shot prompting an LLM to generate diverse synthetic queries per document (for example, eight questions per document at sampling temperature 0.7), with round-trip filtering that discards a query unless it retrieves its own source document.
     Google LLC. Prompt-based query generation for diverse retrieval (WO2024064249A1). 2024-03-28.

  6. [6] A Google patent describes generating machine-made synthetic queries from anchor text used by a threshold of referring pages and pre-associating them with documents, independent of real-time relevance ranking.
     Google LLC. Query augmentation (US9916366B1). 2018-03-13.

  7. [7] The Self-Ask method has a model explicitly generate and answer decomposed follow-up sub-questions before the final answer, and the compositionality gap does not shrink as model size grows.
     Press et al. Measuring and Narrowing the Compositionality Gap (Self-Ask). 2022-10-07.

  8. [8] Least-to-most prompting breaks a complex problem into a series of simpler subproblems solved in sequence, each aided by the answers to previous ones.
     Zhou et al. Least-to-Most Prompting (Google Research, ICLR 2023). 2022-05-21.

  9. [9] Generation-Augmented Retrieval shows that generating diverse contexts for a query and fusing their results consistently improves retrieval accuracy.
     Mao et al. Generation-Augmented Retrieval for Open-domain QA (ACL 2021). 2020-09-17.

  10. [10] RAG-Fusion combines retrieval-augmented generation with reciprocal rank fusion by generating multiple query variations, retrieving for each, and fusing the ranked lists.
     Rackauckas. RAG-Fusion: a New Take on Retrieval-Augmented Generation. 2024-01-31.

  11. [11] Retrieval-Augmented Generation introduced the retrieve-then-generate architecture where one query retrieves top-K passages from a dense index and the generator marginalizes over them to produce more specific, factual language.
     Lewis et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. 2020-05-22.

  12. [12] A 2026 production study found that retrieval fusion increases raw recall but the gains are largely neutralized after re-ranking and truncation, with Hit@10 decreasing from 0.51 to 0.48 in several configurations.
     Medrano et al. Scaling RAG with RAG Fusion: Lessons from an Industry Deployment. 2026-03-02.

  13. [13] Microsoft's Bing Deep Search rewrites the user's query into multiple variations, reviews about ten times the pages of a normal search, and reranks by how well the topic matches, the level of detail, source credibility and trustworthiness, freshness, and popularity.
     Microsoft. Introducing Deep Search (Bing Search Quality Insights). 2023-12-05.

  14. [14] Google's 'Thematic search' patent describes clustering the result set into themes and spawning a new refined sub-query (for example, 'moving to Denver' plus 'neighborhoods'), recursively producing sub-themes.
     Google LLC. Thematic search (US12158907B1). 2024-12-03.

  15. [15] Google states that Gemini 3 gives its existing query fan-out technique a major upgrade, performing even more searches and surfacing content it may previously have missed.
     Google. Search with Gemini 3: our most intelligent search yet (Elizabeth Reid). 2025-11-18.
