---
title: "The fan-out tree you never see — why AI search ranks coverage, then kills it at the rerank"
url: https://martech.llc/research/query-fan-out-surface-area
publishedAt: 2026-06-02
updatedAt: 2026-06-02
author: sundar
category: research-note
summary: "AI Mode doesn't search your question — it decomposes it into a hidden tree of sub-queries, runs them in parallel, then fuses, reranks, and truncates before citing anyone. Drawing on Google's and Microsoft's disclosures, their patents, and a production RAG study, this maps the Fan-Out Surface Area model."
soWhat: "Cover the fan to get retrieved; survive the rerank to get cited. Breadth opens the door — credibility, depth, and freshness walk through it."
tags: ["generative-engine-optimization","answer-engine-optimization","query-fan-out","ai-search","retrieval"]
keywords: ["query fan-out","what is query fan-out","ai mode query fan-out","generative engine optimization","answer engine optimization","ai overviews retrieval","multi-query retrieval","rag fusion reranking"]
claims: [{"id":"claim-1","text":"Google describes AI Mode using a 'query fan-out' technique, issuing multiple related searches concurrently across subtopics and multiple data sources, then bringing the results together into one response.","source":"https://blog.google/products/search/ai-mode-search/","sourceTitle":"Google — Expanding AI Overviews and introducing AI Mode (Robby Stein)","sourceDate":"2025-03-05"},{"id":"claim-2","text":"Google states that Deep Search uses the same query fan-out technique taken to the next level, issuing hundreds of searches and creating an expert-level, fully cited report in minutes.","source":"https://blog.google/products-and-platforms/products/search/google-search-ai-mode-update/","sourceTitle":"Google — AI in Search: going beyond information to intelligence (Elizabeth Reid)","sourceDate":"2025-05-20"},{"id":"claim-3","text":"Google describes AI Mode performing a fan-out that triggers about a dozen searches in the time it takes to do one, extended to multi-object visual queries.","source":"https://blog.google/company-news/inside-google/googlers/how-google-ai-visual-search-works/","sourceTitle":"Google — Ask a Techspert: how does AI understand my visual searches?","sourceDate":"2026-03-05"},{"id":"claim-4","text":"Google's 'Search with stateful chat' patent describes a first large language model generating supplemental, rewritten, alternative, and 'drill down' queries that are executed alongside the user's original query, with responsive documents merged into one set.","source":"https://patents.google.com/patent/US20240289407A1/en","sourceTitle":"Google LLC — Search with stateful chat (US20240289407A1)","sourceDate":"2024-08-29"},{"id":"claim-5","text":"A Google patent describes few-shot prompting an LLM to generate diverse synthetic queries per document — for example eight questions per document at sampling temperature 0.7 — with round-trip filtering that discards a query unless it retrieves its own source 
document.","source":"https://patents.google.com/patent/WO2024064249A1/en","sourceTitle":"Google LLC — Prompt-based query generation for diverse retrieval (WO2024064249A1)","sourceDate":"2024-03-28"},{"id":"claim-6","text":"A Google patent describes generating machine-made synthetic queries from anchor text used by a threshold of referring pages and pre-associating them with documents, independent of real-time relevance ranking.","source":"https://patents.google.com/patent/US9916366B1/en","sourceTitle":"Google LLC — Query augmentation (US9916366B1)","sourceDate":"2018-03-13"},{"id":"claim-7","text":"The Self-Ask method has a model explicitly generate and answer decomposed follow-up sub-questions before the final answer, and the compositionality gap does not shrink as model size grows.","source":"https://arxiv.org/abs/2210.03350","sourceTitle":"Press et al. — Measuring and Narrowing the Compositionality Gap (Self-Ask)","sourceDate":"2022-10-07"},{"id":"claim-8","text":"Least-to-most prompting breaks a complex problem into a series of simpler subproblems solved in sequence, each aided by the answers to previous ones.","source":"https://arxiv.org/abs/2205.10625","sourceTitle":"Zhou et al. — Least-to-Most Prompting (Google Research, ICLR 2023)","sourceDate":"2022-05-21"},{"id":"claim-9","text":"Generation-Augmented Retrieval shows that generating diverse contexts for a query and fusing their results consistently improves retrieval accuracy.","source":"https://arxiv.org/abs/2009.08553","sourceTitle":"Mao et al. 
— Generation-Augmented Retrieval for Open-domain QA (ACL 2021)","sourceDate":"2020-09-17"},{"id":"claim-10","text":"RAG-Fusion combines retrieval-augmented generation with reciprocal rank fusion by generating multiple query variations, retrieving for each, and fusing the ranked lists.","source":"https://arxiv.org/abs/2402.03367","sourceTitle":"Rackauckas — RAG-Fusion: a New Take on Retrieval-Augmented Generation","sourceDate":"2024-01-31"},{"id":"claim-11","text":"Retrieval-Augmented Generation introduced the retrieve-then-generate architecture where one query retrieves top-K passages from a dense index and the generator marginalizes over them to produce more specific, factual language.","source":"https://arxiv.org/abs/2005.11401","sourceTitle":"Lewis et al. — Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks","sourceDate":"2020-05-22"},{"id":"claim-12","text":"A 2026 production study found that retrieval fusion increases raw recall but the gains are largely neutralized after re-ranking and truncation, with Hit@10 decreasing from 0.51 to 0.48 in several configurations.","source":"https://arxiv.org/abs/2603.02153","sourceTitle":"Medrano et al. 
— Scaling RAG with RAG Fusion: Lessons from an Industry Deployment","sourceDate":"2026-03-02"},{"id":"claim-13","text":"Microsoft's Bing Deep Search rewrites the user's query into multiple variations, reviews about ten times the pages of a normal search, and reranks by how well the topic matches, the level of detail, source credibility and trustworthiness, freshness, and popularity.","source":"https://blogs.bing.com/search-quality-insights/december-2023/Introducing-Deep-Search","sourceTitle":"Microsoft — Introducing Deep Search (Bing Search Quality Insights)","sourceDate":"2023-12-05"},{"id":"claim-14","text":"Google's 'Thematic search' patent describes clustering the result set into themes and spawning a new refined sub-query — for example 'moving to Denver' plus 'neighborhoods' — recursively producing sub-themes.","source":"https://patents.google.com/patent/US12158907B1/en","sourceTitle":"Google LLC — Thematic search (US12158907B1)","sourceDate":"2024-12-03"},{"id":"claim-15","text":"Google states that Gemini 3 gives its existing query fan-out technique a major upgrade, performing even more searches and surfacing content it may previously have missed.","source":"https://blog.google/products-and-platforms/products/search/gemini-3-search-ai-mode/","sourceTitle":"Google — Search with Gemini 3: our most intelligent search yet (Elizabeth Reid)","sourceDate":"2025-11-18"}]
---

# The fan-out tree you never see — why AI search ranks coverage, then kills it at the rerank

When you ask an AI search engine a question, it does not search your question. It quietly breaks the question into subtopics, writes a fan of related searches, runs them in parallel across several sources, then merges, reranks, and truncates the results before a single sentence of the answer is written. You are not ranking for a query. You are ranking for a hidden tree of sub-queries you never see — and the tree has a trapdoor.

That trapdoor is the whole point of this piece, and it is where most generative-engine playbooks go wrong. They tell you to cover more topics so you get retrieved more often. That advice is half-right and dangerously incomplete: the engine *does* fan out and retrieve broadly, but it then collapses that breadth at a rerank-and-truncate step where most retrieved pages are discarded. Coverage gets you into the room; it does not get you quoted. This piece traces the mechanism end to end using the primary sources that describe it — Google's and Microsoft's own disclosures, their patents, and the peer-reviewed retrieval literature — and closes with a single model, the Fan-Out Surface Area, plus an honest account of what the evidence does *not* prove.

<Aside kind="fact" title="The short version">
One question becomes many. The engine decomposes your query into subtopics, generates synthetic search strings for each, retrieves candidates in parallel, fuses the lists, reranks, and truncates — then cites whatever survived. Surface area (how much of the fan you cover) decides whether you are *eligible*. Rerank survival decides whether you are *cited*. Optimize for both, in that order.
</Aside>

## How does one question become a fan of searches?

It does not run your query. It runs a tree of them.

<Claim id="claim-1">In March 2025, Google described AI Mode as using a [&ldquo;query fan-out&rdquo; technique](https://blog.google/products/search/ai-mode-search/), &ldquo;issuing multiple related searches concurrently across subtopics and multiple data sources and then bring[ing] those results together to provide an easy-to-understand response.&rdquo;</Claim> That is the first-party origin of the term — not analyst speculation, but Google naming its own mechanism. And the scale is not subtle.

> The AI performs multi-object reasoning to understand what you're looking at. Then it uses a 'fan-out' technique which triggers multiple searches at once, reads through the results and presents a single, cohesive response.

<Claim id="claim-3">That description — from a Google engineering explainer — puts the fan at [&ldquo;about a dozen searches in the time it takes to do one,&rdquo;](https://blog.google/company-news/inside-google/googlers/how-google-ai-visual-search-works/) and extends it to images, where every object in a photo is searched at once.</Claim> For research-grade questions the tree is far larger: <Claim id="claim-2">Google says [Deep Search](https://blog.google/products-and-platforms/products/search/google-search-ai-mode-update/) &ldquo;uses the same query fan-out technique but taken to the next level&hellip; it can issue hundreds of searches&hellip; and create an expert-level fully-cited report in just minutes.&rdquo;</Claim>

<FanOutTree />

None of this is purely improvised at runtime. The mechanism is described in Google's patent filings. <Claim id="claim-4">The [&ldquo;Search with stateful chat&rdquo; filing](https://patents.google.com/patent/US20240289407A1/en) describes a first large language model generating &ldquo;alternative query suggestions, supplemental queries, rewritten versions of the user&rsquo;s query, and/or &lsquo;drill down&rsquo; queries,&rdquo; submitting both the original and the generated queries to search, and merging the documents responsive to all of them into a single candidate set.</Claim> One input, many executions, one pooled result. That is query fan-out in the most literal terms a legal document will allow.
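The "one input, many executions, one pooled result" pattern can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Google's implementation: the `TOY_INDEX` dictionary, the `search` stand-in, the example URLs, and the choice to de-duplicate while preserving first-seen order are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy in-memory index: query string -> ranked list of document URLs.
# Every query and URL here is invented for illustration.
TOY_INDEX = {
    "best crm small sales team": ["a.example/crm-guide", "b.example/top-crms"],
    "crm pricing per seat": ["c.example/pricing", "a.example/crm-guide"],
    "crm email integration": ["d.example/integrations"],
}


def search(query: str) -> list[str]:
    """Stand-in for a search backend; returns ranked URLs for one query."""
    return TOY_INDEX.get(query, [])


def fan_out(original: str, generated: list[str]) -> list[str]:
    """Run the original and generated queries in parallel, then pool results."""
    queries = [original, *generated]
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        result_lists = list(pool.map(search, queries))  # keeps query order
    merged: list[str] = []
    seen: set[str] = set()
    for results in result_lists:
        for url in results:
            if url not in seen:  # de-duplicate across branches (assumption)
                seen.add(url)
                merged.append(url)
    return merged


candidates = fan_out(
    "best crm small sales team",
    ["crm pricing per seat", "crm email integration"],
)
print(candidates)
```

Note that `a.example/crm-guide` is returned by two branches but enters the pool once. A real system would likely keep the per-branch ranks for later fusion rather than discard them.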

The strategic consequence is immediate. A page that answers the literal question but none of the adjacent ones competes for a single branch of the fan. A page that covers the comparison, the price, the reliability question, the integration, and the next-level refinement shows up across many branches and accumulates chances to be retrieved. You are no longer optimizing for a keyword. You are optimizing for coverage of an intent.

## Where do the synthetic queries actually come from?

If the engine writes the searches, the question becomes: what does it write, and can you anticipate it?

The research lineage is older than AI Mode. <Claim id="claim-8">Google Research's [least-to-most prompting](https://arxiv.org/abs/2205.10625) established the core move — &ldquo;break down a complex problem into a series of simpler subproblems and then solve them in sequence&rdquo; — reaching at least 99% accuracy on a compositional benchmark where chain-of-thought managed 16%.</Claim> <Claim id="claim-7">The [Self-Ask method](https://arxiv.org/abs/2210.03350) made it retrieval-shaped: the model explicitly generates and answers decomposed follow-up sub-questions, and — critically — the paper showed the &ldquo;compositionality gap&rdquo; does *not* shrink as models grow.</Claim> The comfortable assumption that a bigger model will make decomposition unnecessary finds no support in that result; decomposition is durable.

On the generation side, the engines manufacture queries from material you can influence. <Claim id="claim-5">One Google patent describes few-shot prompting an LLM to generate diverse synthetic queries per document — [&ldquo;8 questions per document&hellip; with a temperature parameter at 0.7&rdquo;](https://patents.google.com/patent/WO2024064249A1/en) — and then applying round-trip filtering, discarding any generated query that fails to retrieve its own source document.</Claim> Read that filter again: a generated query survives only if it retrieves the document it was written from, so the system favors content that *unambiguously* answers the queries it would itself generate. A vague, hedged passage is not deleted by the filter; it simply ends up with fewer surviving queries pointing back to it.
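Round-trip filtering is easier to see in code. The sketch below is a toy under loud assumptions: the patent describes a learned retriever, while this example swaps in plain token overlap, and the documents, queries, and helper names (`DOCS`, `retrieve`, `round_trip_filter`) are all invented for illustration.

```python
def tokens(text: str) -> set[str]:
    """Lowercased whitespace tokens; a crude stand-in for real text analysis."""
    return set(text.lower().split())


# Two hypothetical pages: one specific, one vague.
DOCS = {
    "doc-pricing": "acme crm costs 25 dollars per seat per month billed annually",
    "doc-overview": "acme crm is a flexible platform that may help many teams",
}


def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank documents by token overlap with the query (toy retriever)."""
    q = tokens(query)
    ranked = sorted(DOCS, key=lambda d: len(q & tokens(DOCS[d])), reverse=True)
    return ranked[:k]


def round_trip_filter(generated: dict[str, list[str]], k: int = 1) -> dict[str, list[str]]:
    """Keep a synthetic query only if it retrieves the document it came from."""
    return {
        doc_id: [q for q in queries if doc_id in retrieve(q, k)]
        for doc_id, queries in generated.items()
    }


# Synthetic queries an LLM might have written for each page (invented).
GENERATED = {
    "doc-pricing": ["how much does acme crm cost per seat"],
    "doc-overview": [
        "how much does acme crm cost per seat",
        "is acme crm a flexible platform",
    ],
}

kept = round_trip_filter(GENERATED)
print(kept)
```

The pricing question generated for the overview page is dropped because it retrieves the pricing page instead. The query is what gets discarded, not the page, which is why specific passages end up with more queries attached to them.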

And some of the synthetic queries are written by other people. <Claim id="claim-6">A separate Google patent describes minting synthetic queries from [the anchor text used by a threshold of referring pages](https://patents.google.com/patent/US9916366B1/en) and pre-associating them with documents, independent of real-time ranking.</Claim> The language others use to link to you literally becomes queries you can be matched on — anchor text is a retrieval input, not just an authority signal. This is not a 2024 invention either; that filing carries a 2008 priority date. The LLM era industrialized a mechanic that has been quietly running for over a decade.
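A rough sketch of the anchor-text idea follows, assuming a simple rule: an anchor phrase becomes a synthetic query for a target page once at least `min_referrers` distinct pages use it. The threshold value, the normalization step, and the link data are hypothetical; the patent does not publish these specifics.

```python
from collections import defaultdict

# (referring_page, anchor_text, target_url), all invented for illustration.
LINKS = [
    ("blog-1.example", "crm pricing comparison", "you.example/pricing"),
    ("blog-2.example", "CRM pricing comparison", "you.example/pricing"),
    ("blog-3.example", "crm pricing comparison ", "you.example/pricing"),
    ("blog-4.example", "click here", "you.example/pricing"),
]


def anchor_queries(links, min_referrers: int = 3) -> dict[str, list[str]]:
    """Pre-associate anchor phrases with targets once enough pages use them."""
    referrers: dict[tuple[str, str], set[str]] = defaultdict(set)
    for page, anchor, target in links:
        normalized = anchor.strip().lower()  # assumed normalization
        referrers[(target, normalized)].add(page)

    queries: dict[str, list[str]] = defaultdict(list)
    for (target, anchor), pages in referrers.items():
        if len(pages) >= min_referrers:  # count distinct referring pages
            queries[target].append(anchor)
    return dict(queries)


print(anchor_queries(LINKS))
```

With these inputs, "crm pricing comparison" clears the threshold and "click here" does not, which is one way to read why descriptive anchor text matters more than generic link labels.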

Microsoft makes the reformulation step unusually explicit. <Claim id="claim-13">Bing's [Deep Search disclosure](https://blogs.bing.com/search-quality-insights/december-2023/Introducing-Deep-Search) describes rewriting one query into many — for &ldquo;how do points systems work in Japan,&rdquo; it generates variations like &ldquo;loyalty card programs Japan&rdquo; and &ldquo;best loyalty cards for travelers in Japan&rdquo; — reviewing about ten times the pages of a normal search.</Claim> If you want to see the fan made visible, that post is the closest any major engine has come to showing its work.

Put the four together and the fan stops being mysterious — it is manufactured by four machines you can anticipate:

<FanQueryGeneration />

## Why isn't more coverage automatically more citations?

Here is where the prevailing GEO narrative breaks, and where the evidence gets genuinely interesting.

The retrieval research is clear that fan-out helps recall. <Claim id="claim-9">[Generation-Augmented Retrieval](https://arxiv.org/abs/2009.08553) showed that &ldquo;generating diverse contexts for a query is beneficial as fusing their results consistently yields better retrieval accuracy.&rdquo;</Claim> <Claim id="claim-11">That sits on top of the [retrieve-then-generate architecture](https://arxiv.org/abs/2005.11401) that founded modern RAG: one query pulls top-K passages from a dense index, and the generator marginalizes over them to write &ldquo;more specific, diverse and factual language.&rdquo;</Claim> <Claim id="claim-10">[RAG-Fusion](https://arxiv.org/abs/2402.03367) formalized the multi-query version — &ldquo;generating multiple queries, reranking them with reciprocal scores and fusing the documents&rdquo; — but its own authors flagged that the extra queries can make answers *off-topic* when the generated queries drift from intent.</Claim>
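Reciprocal rank fusion itself is a small formula: each document scores the sum of `1 / (k + rank)` over every list it appears in. The sketch below assumes the conventional constant `k = 60` and breaks ties alphabetically; the document IDs and ranked lists are invented, and nothing here is claimed to match any engine's production fusion step.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists into one by summing 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc] += 1.0 / (k + rank)
    # Highest score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# One ranked list per generated query (toy data).
lists = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "e"],
]
fused = reciprocal_rank_fusion(lists)
print(fused)
```

Document `b` wins because it appears near the top of all three lists, while `c`, `d`, and `e` each appear once. That is the surface-area effect in miniature: showing up on more branches raises the fused score before any rerank happens.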

That caveat turns out to be the headline. <Claim id="claim-12">A 2026 study of a production deployment found that retrieval fusion &ldquo;does increase raw recall, but these gains are largely neutralized after [re-ranking and truncation](https://arxiv.org/abs/2603.02153),&rdquo; with Hit@10 falling from 0.51 to 0.48 in several configurations, while query rewriting and larger candidate sets added latency.</Claim> Pulling more candidates into the pool did not put more relevant results inside the top ten once realistic rerank budgets and context limits were applied. The funnel collapses at the cut.
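The gap between pool recall and Hit@k can be shown with a toy calculation. The rankings below are invented to illustrate the mechanism and are not the study's data: fusion brings the relevant document into the pool for both queries, yet neither relevant document sits inside the top two after the hypothetical rerank.

```python
def hit_at_k(rankings: dict[str, list[str]], relevant: dict[str, str], k: int) -> float:
    """Share of queries whose relevant document appears in the top k."""
    hits = sum(1 for q, ranked in rankings.items() if relevant[q] in ranked[:k])
    return hits / len(rankings)


# One known-relevant document per query (toy ground truth).
RELEVANT = {"q1": "r1", "q2": "r2"}

# Final rankings without fusion: q2 never retrieves its relevant document.
SINGLE_QUERY = {"q1": ["r1", "x1", "x2"], "q2": ["x3", "x4", "x5"]}

# Final rankings with fusion plus rerank: both relevant documents are in the
# pool, but extra candidates pushed them below the cut.
FUSED = {"q1": ["x1", "x6", "r1", "x2"], "q2": ["x3", "x4", "x7", "r2"]}

pool_recall_single = hit_at_k(SINGLE_QUERY, RELEVANT, k=100)
pool_recall_fused = hit_at_k(FUSED, RELEVANT, k=100)
hit2_single = hit_at_k(SINGLE_QUERY, RELEVANT, k=2)
hit2_fused = hit_at_k(FUSED, RELEVANT, k=2)

print(pool_recall_single, pool_recall_fused)  # recall over the whole pool
print(hit2_single, hit2_fused)                # what survives truncation at 2
```

In this contrived case pool recall doubles while Hit@2 drops to zero. Real deployments will not be this extreme, but the direction matches what the study reports: recall gains measured before the cut do not guarantee gains after it.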

<FanOutFunnel />

So what survives the cut? Bing, uniquely, names the signals. <Claim id="claim-13">Deep Search reranks by [&ldquo;how well the topic matches, whether it&rsquo;s at the appropriate level of detail, how credible and trustworthy the source is, how fresh and popular it is, and so on.&rdquo;](https://blogs.bing.com/search-quality-insights/december-2023/Introducing-Deep-Search)</Claim> That sentence is the most actionable line in the entire literature: it is a public rerank scorecard. Google has not published its own weighting, so treat these five — topic match, depth, credibility, freshness, popularity — as the best available proxy for what walks through the trapdoor.
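One way to reason about those five signals is as a weighted score followed by a hard cut. The sketch below is purely illustrative: Bing names the signals but publishes no weights, so every number in `WEIGHTS`, every candidate score, and the linear scoring form itself are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    url: str
    topic_match: float   # all signals scaled 0..1 (assumed scale)
    detail: float
    credibility: float
    freshness: float
    popularity: float


# Hypothetical weights; no engine has disclosed real values.
WEIGHTS = {
    "topic_match": 0.35,
    "detail": 0.20,
    "credibility": 0.20,
    "freshness": 0.15,
    "popularity": 0.10,
}


def score(candidate: Candidate) -> float:
    """Linear combination of the five named signals (assumed form)."""
    return sum(weight * getattr(candidate, name) for name, weight in WEIGHTS.items())


def rerank_and_truncate(candidates: list[Candidate], keep: int) -> list[Candidate]:
    """Sort by score, then drop everything below the cut."""
    return sorted(candidates, key=score, reverse=True)[:keep]


pool = [
    Candidate("broad.example", 0.6, 0.3, 0.4, 0.5, 0.6),
    Candidate("deep.example", 0.9, 0.9, 0.8, 0.7, 0.3),
    Candidate("stale.example", 0.9, 0.6, 0.7, 0.1, 0.5),
]
survivors = rerank_and_truncate(pool, keep=2)
print([c.url for c in survivors])
```

All three pages were retrieved, so all three were eligible. The broad but shallow page is the one that falls through the cut, which is the argument of this section in miniature.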

<Pullquote>Surface area gets you into the candidate pool. Only rerank survival on the branches you already win gets your URL printed in the answer. Breadth is necessary; it is not sufficient.</Pullquote>

## The Fan-Out Surface Area model

Put the sources together and one model falls out. Your real ranking surface is not a keyword position; it is the share of the hidden sub-query tree your content can satisfy — and the rerank gate that decides which of those branches actually cite you.

<FanOutSurfaceArea />

The name is deliberately a warning. Maximizing surface area feels like progress — more topics, more pages, more branches — but the production evidence says raw breadth is neutralized at truncation. Two pages can both be retrieved for the head term; only the one that also survives the rerank on its branches gets cited. The mechanics resolve into five stages, each with exactly one lever an operator controls.

<FanOutLadder />

Most teams work the third stage — be indexed, be fast — and treat the rest as weather. The fan-out engine decides citations at the first, second, and final stages, where almost no one is working. Build backwards from the citation: cover the fan so you are eligible, then make each branch you win survive the rerank.

### A worked example: mapping one query's fan

Make the hidden tree concrete. Take a real buying question — *&ldquo;best CRM for a 20-person B2B sales team&rdquo;* — and watch what the engine actually does with it. It does not rank you on that string. It decomposes the intent into subtopics, writes a fan of concrete sub-queries, and retrieves a separate candidate set for each. The walkthrough below runs the whole model on that one query: the six branches it writes, what each tests, and the rerank signal your page has to win to be cited on it.

<FanOutWorkedExample />

That is the entire model, run on one question. Breadth makes you eligible across the fan; rerank survival on each branch is what actually prints your URL.
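The two halves of the model reduce to two ratios per query: the share of branches where a page is retrieved, and the share where it also survives the rerank. The sketch below assumes you have already audited each branch by hand. The branch names and outcomes are hypothetical placeholders, not the six branches from the walkthrough, and `surface_area` is an illustrative helper rather than an established metric.

```python
# Hypothetical audit of one site against one query's fan.
BRANCHES = {
    "crm pricing per seat": {"retrieved": True, "cited": True},
    "crm vs spreadsheet for small teams": {"retrieved": True, "cited": False},
    "crm email and calendar integration": {"retrieved": True, "cited": True},
    "crm onboarding time": {"retrieved": True, "cited": False},
    "crm data migration": {"retrieved": False, "cited": False},
    "crm reporting for sales managers": {"retrieved": False, "cited": False},
}


def surface_area(branches: dict[str, dict[str, bool]]) -> dict[str, float]:
    """Summarize eligibility (retrieved) and citation (survived rerank)."""
    total = len(branches)
    eligible = sum(1 for b in branches.values() if b["retrieved"])
    cited = sum(1 for b in branches.values() if b["retrieved"] and b["cited"])
    return {
        "eligible_share": eligible / total,
        "cited_share": cited / total,
        # Of the branches that retrieved the page, how many cited it.
        "survival_rate": cited / eligible if eligible else 0.0,
    }


report = surface_area(BRANCHES)
print(report)
```

Reading the three numbers together tells you where to work: a low eligible share is a coverage problem, while a high eligible share with a low survival rate is a rerank problem.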

## What does this change about the work?

Three shifts follow directly from the evidence.

**Map the fan before you write.** For any target topic, enumerate the subtopics, comparisons, constraints, and drill-downs the engine will generate — those are your real ranking surfaces. Cover them on self-contained, retrievable pages, across more than one source, so single-index presence does not cap how many branches can surface you.

**Win the rerank, not just the retrieval.** Being pulled into the candidate pool is the start line, not the finish. Deepen topical detail, harden credibility with clear authorship and primary citations, keep a visible freshness cadence, and write passages an engine can lift verbatim with attribution. These are precisely the signals Bing names as surviving truncation.

**Treat coverage as a recurring audit.** The fan is widening. <Claim id="claim-15">In November 2025 Google said [Gemini 3 gives query fan-out &ldquo;a major upgrade,&rdquo;](https://blog.google/products-and-platforms/products/search/gemini-3-search-ai-mode/) performing &ldquo;even more searches&rdquo; and surfacing content it may previously have missed.</Claim> And the tree has depth: <Claim id="claim-14">Google's [&ldquo;Thematic search&rdquo; patent](https://patents.google.com/patent/US12158907B1/en) describes clustering results into themes and spawning a refined sub-query — &ldquo;moving to Denver&rdquo; plus &ldquo;neighborhoods&rdquo; — recursively producing sub-themes.</Claim> Being cited at depth one can spawn the very sub-query that re-retrieves you at depth two. What was below the cut last year can clear it after the next model update, so coverage is never finished.
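The recursive shape of thematic refinement can be sketched as a small tree expansion. This is a simplification under stated assumptions: the patent describes clustering the result set to discover themes, while the example below swaps that for a hard-coded `THEMES` lookup. Only the "moving to Denver" plus "neighborhoods" pairing comes from the patent; the other themes are invented.

```python
# Query -> themes found in its results. A real system would derive these by
# clustering retrieved documents; this lookup table is a stand-in.
THEMES = {
    "moving to denver": ["neighborhoods", "cost of living"],
    "moving to denver neighborhoods": ["schools"],
}


def expand(query: str, max_depth: int) -> dict[str, dict]:
    """Build a nested tree of refined sub-queries down to max_depth levels."""
    if max_depth == 0:
        return {}
    tree: dict[str, dict] = {}
    for theme in THEMES.get(query, []):
        sub_query = f"{query} {theme}"  # original query plus theme
        tree[sub_query] = expand(sub_query, max_depth - 1)
    return tree


tree = expand("moving to denver", max_depth=2)
print(tree)
```

Each level of the tree is another set of queries a page can be retrieved on, which is why an audit that only covers depth one can miss branches that matter.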

## Where the evidence runs out

A serious reading marks its own edges. Google's and Microsoft's product posts name &ldquo;query fan-out&rdquo; and describe its scale, but they do not disclose the decomposition prompts, the real query counts per question, or the rerank weighting — those remain proprietary. The patents describe claimed methods, not confirmed production paths; the stateful-chat filing even predates the public fan-out era, and a filing is not a deployment. The academic decomposition and retrieval papers establish provenance and plausibility, not first-party adoption — they are convergent evidence, not proof of what ships. The Hit@10 figures come from one industry deployment of undisclosed affiliation, not from Google or Bing, and the five rerank signals are explicitly *Bing's*; Google has published no equivalent. And Google's Deep Search and Bing's Deep Search are different products from different companies that must not be blended.

What survives all of that is the shape of the thing. Engines decompose, generate sub-queries, retrieve in parallel, fuse, rerank, and truncate before they cite. Coverage of the fan determines eligibility; rerank survival determines citation. Map the questions behind the question, cover them across sources, and make each page credible, deep, fresh, and cheap to quote. That work compounds no matter which engine widens its fan next — and it pairs with the downstream half of the problem, [how AI search chooses what to cite](/research/how-ai-search-chooses-citations) once your page is in the pool, and [how to measure the visibility you earn](/research/how-to-measure-ai-search-visibility).

— Sundar Ramesh Kumar · martech.llc

<div id="run-free" className="scroll-mt-24">
  <InlineToolRunner defaultTab="citerra" />
</div>