---
title: "How AI search resolves your brand to an entity — and the five layers that decide"
url: https://martech.llc/research/entity-authority-stack
publishedAt: 2026-06-06
updatedAt: 2026-06-06
author: sundar
category: research-note
summary: "Before an AI engine cites your brand, it resolves it to an entity — a 'thing' in a knowledge graph, not your keywords. Drawing on entity-resolution patents, entity-linking research, and Google's Knowledge Graph, this maps the five layers of entity authority."
soWhat: "AI engines cite entities, not pages — so the work is to become a resolvable, well-attributed, salient thing in the graph, not just a high-ranking string."
tags: ["entity-seo","knowledge-graph","generative-engine-optimization","ai-search","brand-entity"]
keywords: ["entity authority","entity seo","knowledge graph optimization","how ai search understands brands","brand entity ai search","entity resolution seo","how to be cited by ai","ai brand visibility"]
claims: [{"id":"claim-1","text":"Google's Knowledge Graph launched in May 2012 as an intelligent model that understands real-world entities and their relationships — 'things, not strings' — and was stated to contain more than 500 million objects and more than 3.5 billion facts.","source":"https://blog.google/products-and-platforms/products/search/introducing-knowledge-graph-things-not/","sourceTitle":"Google — Introducing the Knowledge Graph: things, not strings","sourceDate":"2012-05-16"},{"id":"claim-2","text":"Microsoft's 'Knowledge-based entity detection and disambiguation' patent (US9665643B2) describes associating entity identifiers with a web page and storing them as metadata in a search engine index, enabling entity-based queries and re-ranking of results by entity rather than keyword.","source":"https://patents.google.com/patent/US9665643B2/en","sourceTitle":"Microsoft — Knowledge-based entity detection and disambiguation (US9665643B2)","sourceDate":"2017-05-30"},{"id":"claim-3","text":"The WildHallucinations benchmark found that about half of 7,919 real-world entities had no associated Wikipedia page, and that LLMs consistently hallucinate more on entities without Wikipedia pages, across 118,785 generations from 15 LLMs.","source":"https://arxiv.org/abs/2407.17468","sourceTitle":"Zhao et al. — WildHallucinations: Evaluating Long-form Factuality with Real-World Entity Queries","sourceDate":"2024-07-24"},{"id":"claim-4","text":"The PopQA study found that language models struggle with less popular factual knowledge and that scaling fails to appreciably improve memorization of factual knowledge in the long tail, while retrieval augmentation helps significantly for low-popularity entities.","source":"https://aclanthology.org/2023.acl-long.546/","sourceTitle":"Mallen et al. 
— When Not to Trust Language Models (PopQA), ACL 2023","sourceDate":"2023-07-01"},{"id":"claim-5","text":"The Head-to-Tail benchmark of 18,000 QA pairs evaluated across 16 LLMs found that existing LLMs remain far from perfect on factual knowledge, especially for facts about torso-to-tail entities.","source":"https://arxiv.org/abs/2308.10168","sourceTitle":"Sun et al. — Head-to-Tail: How Knowledgeable are LLMs?, NAACL 2024","sourceDate":"2023-08-20"},{"id":"claim-6","text":"The LAMA probe demonstrated that pretrained language models store relational knowledge that can be recovered with fill-in-the-blank cloze statements without fine-tuning.","source":"https://aclanthology.org/D19-1250/","sourceTitle":"Petroni et al. — Language Models as Knowledge Bases?, EMNLP-IJCNLP 2019","sourceDate":"2019-11-03"},{"id":"claim-7","text":"Google's 'Additive context model for entity resolution' patent (US9697475B1) describes resolving an ambiguous mention to a knowledge-base entity by building a vector of context features from the surrounding document and selecting the highest-scoring candidate entity.","source":"https://patents.google.com/patent/US9697475B1/en","sourceTitle":"Google — Additive context model for entity resolution (US9697475B1)","sourceDate":"2017-07-04"},{"id":"claim-8","text":"The BLINK system performs zero-shot entity linking by defining each entity only through a short textual description, using a bi-encoder to retrieve candidates in a dense space and a cross-encoder to re-rank them.","source":"https://arxiv.org/abs/1911.03814","sourceTitle":"Wu et al. 
— Scalable Zero-shot Entity Linking with Dense Entity Retrieval (BLINK)","sourceDate":"2019-11-10"},{"id":"claim-9","text":"Microsoft's knowledge-graph query-expansion patent application (US20150095319A1) describes expansion segments including alias, disambiguation, filter, and ranking, and states that a disambiguation segment may be unnecessary when an entity is famous or unique.","source":"https://patents.google.com/patent/US20150095319A1/en","sourceTitle":"Microsoft — Query Expansion, Filtering and Ranking Utilizing Knowledge Graphs (US20150095319A1)","sourceDate":"2015-04-02"},{"id":"claim-10","text":"Google's Organization structured-data documentation states that some properties are used behind the scenes to disambiguate your organization from others, while others can influence visual elements such as the logo shown in Search results and the knowledge panel.","source":"https://developers.google.com/search/docs/appearance/structured-data/organization","sourceTitle":"Google Search Central — Organization structured data","sourceDate":"2025-01-01"},{"id":"claim-11","text":"Google states the Knowledge Graph is its database of billions of facts about people, places, and things, and that content owners can suggest changes to knowledge panels they have claimed.","source":"https://support.google.com/knowledgepanel/answer/9787176?hl=en","sourceTitle":"Google — How Google's Knowledge Graph works (Knowledge Panel Help)","sourceDate":"2024-01-01"},{"id":"claim-12","text":"The ReFinED model performs mention detection, fine-grained entity typing, and entity disambiguation for all mentions in a document in a single forward pass and generalises to Wikidata, which has roughly 15 times more entities than Wikipedia.","source":"https://arxiv.org/abs/2207.04108","sourceTitle":"Ayoola et al. 
— ReFinED: Efficient Zero-shot-capable End-to-End Entity Linking, NAACL 2022","sourceDate":"2022-07-08"},{"id":"claim-13","text":"The TransE method models relationships in a knowledge base as translations operating on low-dimensional entity embeddings and significantly outperformed prior state-of-the-art on link prediction across two knowledge bases.","source":"https://proceedings.neurips.cc/paper/2013/hash/1cecc7a77928ca8133fa24680a88d2f9-Abstract.html","sourceTitle":"Bordes et al. — Translating Embeddings for Modeling Multi-relational Data (TransE), NeurIPS 2013","sourceDate":"2013-12-05"},{"id":"claim-14","text":"Microsoft Research's GraphRAG method uses an LLM to derive an entity knowledge graph from source documents and then pre-generates community summaries for all groups of closely related entities.","source":"https://arxiv.org/abs/2404.16130","sourceTitle":"Edge et al. — From Local to Global: A Graph RAG Approach to Query-Focused Summarization","sourceDate":"2024-04-24"},{"id":"claim-15","text":"An Ahrefs study of 75,000 brands using Spearman correlation reported that YouTube mentions showed the strongest correlation with AI visibility, around 0.737, outperforming every other factor across ChatGPT, AI Mode, and AI Overviews.","source":"https://ahrefs.com/blog/ai-brand-visibility-correlations/","sourceTitle":"Ahrefs — Top Brand Visibility Factors in ChatGPT, AI Mode, and AI Overviews (75k Brands)","sourceDate":"2025-12-12"},{"id":"claim-16","text":"A peer-reviewed generative-engine study found that content-level changes such as adding quotations, statistics, and source citations can boost a source's visibility in generative-engine responses by up to roughly 40%.","source":"https://arxiv.org/abs/2311.09735","sourceTitle":"Aggarwal et al. 
— GEO: Generative Engine Optimization, KDD 2024","sourceDate":"2023-11-16"},{"id":"claim-17","text":"Microsoft's 'Identifying salient items in documents' patent (US9251473B2) describes scoring an item's salience to a web page using a soft function derived from the ratio of page visits whose queries include the item to the total visits to that page.","source":"https://patents.google.com/patent/US9251473B2/en","sourceTitle":"Microsoft — Identifying salient items in documents (US9251473B2)","sourceDate":"2016-02-02"},{"id":"claim-18","text":"Google's 'Scoring local search results based on location prominence' patent (US8046371B2) describes a prominence score computed from factors including the number of documents referring to a business and the number of information documents that mention the business.","source":"https://patents.google.com/patent/US8046371B2/en","sourceTitle":"Google — Scoring local search results based on location prominence (US8046371B2)","sourceDate":"2011-10-25"},{"id":"claim-19","text":"A Yext study of 6.8 million AI citations across ChatGPT, Gemini, and Perplexity reported that 86% of citations came from sources brands already control, with first-party websites generating 2.9 million citations (44%) and listings 2.9 million (42%).","source":"https://investors.yext.com/news-events/press-releases/detail/376/yext-research-86-of-ai-citations-come-from-brand-managed","sourceTitle":"Yext — 86% of AI Citations Come from Brand-Managed Sources","sourceDate":"2025-10-09"},{"id":"claim-20","text":"Google's 'Ranking search results based on entity metrics' patent (US10235423B2) describes ranking results by combining knowledge-graph-derived entity metrics — relatedness, notable entity type, contribution, and prize — with weights that depend on the type of entity.","source":"https://patents.google.com/patent/US10235423B2/en","sourceTitle":"Google — Ranking search results based on entity metrics (US10235423B2)","sourceDate":"2019-03-19"}]
---

# How AI search resolves your brand to an entity — and the five layers that decide

AI search rarely reads your page as a bag of keywords. Before it cites anything, it tries to resolve the words into entities — real-world "things" in a knowledge graph, each with attributes and relationships — and decide which known entity your brand actually is. If it cannot resolve you to a thing, your sentences are tokens it can match but never truly know.

That gate sits underneath every citation decision, and most marketing teams never see it. They optimize copy for a query while the engine quietly asks a more basic question: *is this brand a thing I recognize, and if so, which one?* This piece traces how that resolution works, using the primary sources that describe it — entity-resolution patents from a spread of companies, the peer-reviewed entity-linking literature, and Google's own knowledge-graph disclosures — and closes with a single model, the Entity Authority Stack, plus an honest account of where the evidence stops.

<Aside kind="fact" title="The short version">
An answer engine resolves your name to an entity, disambiguates which entity you are, reads what attributes the graph holds about you, weighs what you're associated with, and judges whether you're the salient source on a topic. Citation is decided across those five layers — and a brand the model can't resolve is mentioned at best, never cited.
</Aside>

## Does AI search even know your brand is a thing?

The foundational move happened more than a decade before ChatGPT, and generative engines inherited its wiring.

<Claim id="claim-1">In May 2012, Google introduced the [Knowledge Graph](https://blog.google/products-and-platforms/products/search/introducing-knowledge-graph-things-not/) as an intelligent model that "understands real-world entities and their relationships to one another: things, not strings," stating it then contained "more than 500 million objects, as well as more than 3.5 billion facts."</Claim> That sentence is the hinge. Search stopped being only about matching the characters in a query to the characters on a page and started maintaining a model of the *things* those characters refer to.

<EntityStringToThing />

The plumbing that makes a brand a "thing" in an index is described, in different words, across a spread of patents — and the spread itself matters. <Claim id="claim-2">Microsoft's ["Knowledge-based entity detection and disambiguation" patent (US9665643B2)](https://patents.google.com/patent/US9665643B2/en) describes associating entity identifiers with a web page and storing them "as metadata of the page in a search engine index," which "will enable entity-based queries" and the ability to re-rank results by entity rather than keyword.</Claim> Read these patents as the *vocabulary and mechanics* of entity resolution as a field — Microsoft, Google, Amazon, and independents all describe variants of the same machinery. None is a sworn confession of what any specific 2026 engine ships in production; what they establish is that the levers are general and well-understood.
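
To make "entity identifiers stored as page metadata" concrete, here is a minimal sketch of the difference between string lookup and entity lookup. Everything in it is an assumption for illustration: the URLs, the `ent:` identifiers, the `PipelineIQ` alias, and the helper names are invented, and this is not the patent's implementation.

```python
from collections import defaultdict

# Hypothetical pages. The "entities" sets stand in for identifiers that an
# upstream entity linker has already attached as page metadata.
PAGES = {
    "example.com/crm-review": {
        "text": "Acme adds pipeline forecasting to its CRM",
        "entities": {"ent:acme-crm"},
    },
    "example.com/vinyl-news": {
        "text": "Acme opens a second record pressing plant",
        "entities": {"ent:acme-records"},
    },
    "example.com/sales-tools": {
        # Names the product by a made-up alias, never by the string "Acme".
        "text": "PipelineIQ tops our list of sales tools",
        "entities": {"ent:acme-crm"},
    },
}

def build_entity_index(pages: dict) -> dict:
    """Map each entity identifier to the set of page URLs tagged with it."""
    index = defaultdict(set)
    for url, page in pages.items():
        for entity_id in page["entities"]:
            index[entity_id].add(url)
    return index

def keyword_search(pages: dict, term: str) -> list:
    """String matching: pages whose text contains the term."""
    return sorted(u for u, p in pages.items() if term.lower() in p["text"].lower())

def entity_search(index: dict, entity_id: str) -> list:
    """Entity matching: pages tagged with the identifier, whatever words they use."""
    return sorted(index.get(entity_id, set()))

index = build_entity_index(PAGES)
print(keyword_search(PAGES, "acme"))         # both Acmes, and misses the alias page
print(entity_search(index, "ent:acme-crm"))  # only the CRM company, alias page included
```

The keyword query mixes two different companies and misses the page that uses an alias; the entity query does neither. That is the practical meaning of "things, not strings" at the index level.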

## What happens when the model has never heard of you?

"Existence" has two faces, and conflating them is the most common mistake I see. One is whether you're a node in a graph the engine can look up. The other is whether the language model has *parametrically* learned you — whether your facts are baked into its weights. The peer-reviewed probes on the second face are humbling.

<Claim id="claim-6">The [LAMA probe](https://aclanthology.org/D19-1250/) showed that pretrained models store relational knowledge recoverable through fill-in-the-blank cloze statements with no fine-tuning</Claim> — so models *do* carry entity facts. But the coverage is brutally uneven by popularity. <Claim id="claim-4">The [PopQA study](https://aclanthology.org/2023.acl-long.546/) found that "scaling... fails to appreciably improve memorization of factual knowledge in the long tail," while retrieval augmentation helped significantly for low-popularity entities.</Claim> <Claim id="claim-5">The [Head-to-Tail benchmark](https://arxiv.org/abs/2308.10168), spanning 18,000 QA pairs across 16 models, concluded that "existing LLMs are still far from being perfect... especially for facts of torso-to-tail entities."</Claim>

<EntityTailKnowledge />

The bridge between the two faces is the most actionable finding in this entire literature. <Claim id="claim-3">The [WildHallucinations benchmark](https://arxiv.org/abs/2407.17468), built from real entities mined from chatbot conversations, found that "about half" of its 7,919 entities "do not have associated Wikipedia pages," and that "LLMs consistently hallucinate more on entities without Wikipedia pages," across 118,785 generations from 15 models.</Claim> Lacking a structured presence is not a neutral starting point: in this benchmark, it is the condition under which models fabricate more.

<Pullquote>The model doesn't punish you for being small. It punishes you for being illegible — for having no structured, corroborated record it can resolve instead of guess.</Pullquote>

Most B2B brands live in that torso-to-tail. The lever is not "rank higher"; it is "become retrievable" — a graph node with enough corroboration that the engine looks you up rather than hallucinating you.

## Which "Acme" are you?

Once a name *can* be resolved, the engine faces the next problem: names collide. There is an Acme that sells CRM software, an Acme that presses records, and an Acme that supplies anvils to cartoon coyotes. Disambiguation decides which one a given mention means.

<EntityNamesakeCollision />

The mechanism is context. <Claim id="claim-7">Google's ["Additive context model for entity resolution" patent (US9697475B1)](https://patents.google.com/patent/US9697475B1/en) describes resolving an ambiguous mention by building a vector of context features from the surrounding document and selecting the highest-scoring candidate entity.</Claim> The modern research version is dense and learned rather than hand-built. <Claim id="claim-8">The [BLINK system](https://arxiv.org/abs/1911.03814) performs zero-shot entity linking by defining each entity through a short textual description, retrieving candidates with a bi-encoder and re-ranking them with a cross-encoder.</Claim> <Claim id="claim-12">[ReFinED](https://arxiv.org/abs/2207.04108) folds mention detection, fine-grained typing, and disambiguation into a single pass and generalises to Wikidata, "which has around 15 times more entities than Wikipedia."</Claim>
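
The scoring idea can be sketched in a few lines. This is a toy, assuming plain word overlap as the "context features" and three invented candidate descriptions; the patent's feature set and BLINK's learned encoders are far richer, and none of the names below come from either source.

```python
from collections import Counter

# Hypothetical candidate entities, each defined only by a short description
# (the same framing BLINK uses, scored here with plain word overlap).
CANDIDATES = {
    "acme_crm": "acme crm software company sales pipeline b2b saas",
    "acme_records": "acme records vinyl pressing plant music label",
    "acme_anvils": "acme anvils supplier cartoon coyote gadgets",
}

def context_features(text: str) -> Counter:
    """Bag-of-words feature vector for a piece of text."""
    return Counter(text.lower().split())

def score(mention_context: str, description: str) -> int:
    """Additive score: one contribution per word shared by context and description."""
    ctx = context_features(mention_context)
    desc = context_features(description)
    return sum(min(ctx[w], desc[w]) for w in ctx if w in desc)

def resolve(mention_context: str) -> str:
    """Return the candidate id with the highest additive context score."""
    return max(CANDIDATES, key=lambda cid: score(mention_context, CANDIDATES[cid]))

print(resolve("Acme shipped a new sales pipeline view for its CRM customers"))
# prints "acme_crm": four shared words against one for each namesake
```

Notice what moves the score: the words around the name, not the name itself. Every candidate matches "acme" equally, so the surrounding category language does all the disambiguating work.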

There's a quiet asymmetry here worth naming. <Claim id="claim-9">Microsoft's [knowledge-graph query-expansion application (US20150095319A1)](https://patents.google.com/patent/US20150095319A1/en) describes alias, disambiguation, filter, and ranking expansion segments, and notes that "when an identified entity is famous, renown, a celebrity, or simply unique a disambiguation segment may not be necessary."</Claim> The famous don't need to fight for disambiguation; the obscure do. Your lever is to make the context unmistakable — consistent category language, identifiers, and `sameAs` links — so the right Acme always wins the score.

## What does the graph believe you are?

Resolving you to *an* entity isn't the same as knowing *what* you are. The attribute layer is the record the graph holds — your type, your founding, your category, your relationships — and, unusually, it's a record you can largely write.

<EntityAttributeRecord />

Google is explicit that this is a brand-controlled lever. <Claim id="claim-10">Its [Organization structured-data documentation](https://developers.google.com/search/docs/appearance/structured-data/organization) states that "some properties are used behind the scenes to disambiguate your organization from other organizations (like iso6523 and naics), while others can influence visual elements in Search results (such as which logo is shown in Search results and your knowledge panel)."</Claim>

> Google's search results sometimes show information that comes from our Knowledge Graph, our database of billions of facts about people, places, and things.

<Claim id="claim-11">Google also documents that content owners can [suggest changes to knowledge panels they have claimed](https://support.google.com/knowledgepanel/answer/9787176?hl=en).</Claim> Taken together: the attributes you publish in structured form are the draft the graph edits. Leave them blank and the machine fills the record from whatever it can scrape — which is exactly how a wrong founding date or a stale description ends up in an AI answer with your name on it.

## Who are you connected to?

Entities don't sit alone in the graph; they sit in a neighbourhood. What a model believes your brand *relates to* is learned from how often you co-occur with your category, its authorities, and its other entities.

<EntityAssociationConstellation />

The mechanism is geometric. <Claim id="claim-13">The [TransE method](https://proceedings.neurips.cc/paper/2013/hash/1cecc7a77928ca8133fa24680a88d2f9-Abstract.html) models relationships as translations over low-dimensional entity embeddings and "significantly outperforms state-of-the-art methods in link prediction on two knowledge bases."</Claim> The retrieval-era version assembles the neighbourhood on the fly. <Claim id="claim-14">Microsoft Research's [GraphRAG](https://arxiv.org/abs/2404.16130) uses an LLM "to derive an entity knowledge graph from the source documents, then to pregenerate community summaries for all groups of closely related entities" — so closely-related entities become the unit of retrieval.</Claim>
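
For intuition, TransE treats a relation as a vector offset: a true triple (head, relation, tail) should satisfy head + relation ≈ tail. The sketch below assumes hand-picked two-dimensional vectors and a made-up `operates_in` relation; real embeddings are learned from the graph and have many more dimensions.

```python
import math

# Hypothetical 2-dimensional embeddings, hand-picked for illustration only.
ENTITY = {
    "tula_analytics": (1.0, 0.0),
    "product_analytics": (1.0, 1.0),
    "vinyl_records": (-2.0, 3.0),
}
RELATION = {"operates_in": (0.0, 1.0)}

def transe_distance(head: str, relation: str, tail: str) -> float:
    """TransE plausibility: ||h + r - t||_2, where smaller means more plausible."""
    h, r, t = ENTITY[head], RELATION[relation], ENTITY[tail]
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def predict_tail(head: str, relation: str) -> str:
    """Link prediction: choose the tail entity that minimises the distance."""
    tails = [e for e in ENTITY if e != head]
    return min(tails, key=lambda t: transe_distance(head, relation, t))

print(predict_tail("tula_analytics", "operates_in"))  # prints "product_analytics"
```

The brand lands next to its category only because the vectors were placed that way. In a trained model, that placement is learned from the relationships recorded about the entity, which is why the neighbourhood you are recorded in matters.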

What about evidence from the generative era itself? Here the strongest numbers come from vendor studies, and they should be read as direction, not proof. <Claim id="claim-15">Ahrefs' [study of 75,000 brands](https://ahrefs.com/blog/ai-brand-visibility-correlations/), using Spearman correlation, reported that "YouTube mentions show the strongest correlation with AI visibility (~0.737), outperforming every other factor across ChatGPT, AI Mode, and AI Overviews," ahead of branded web mentions and anchors.</Claim> Correlation is not causation, and a tool vendor has incentives — but the *shape* (off-site association beats on-page tuning) is consistent with how the graph is built. It also rhymes with the content-side finding that <Claim id="claim-16">[generative-engine optimization](https://arxiv.org/abs/2311.09735) can lift a source's visibility "by up to 40%" through quotations, statistics, and citations rather than keywords</Claim> — a result we treat in depth in a companion piece, since it governs which retrieved candidate gets cited rather than how you become an entity.
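
Since the headline number is a Spearman coefficient, it helps to see what that statistic measures. The sketch below computes it from scratch as a Pearson correlation over ranks. The data points are invented for illustration and are not Ahrefs data.

```python
def ranks(values: list) -> list:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def spearman(x: list, y: list) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical brands: YouTube mention counts vs. an AI-visibility score.
youtube_mentions = [3, 10, 45, 120, 800]
ai_visibility = [1, 4, 2, 9, 30]
print(round(spearman(youtube_mentions, ai_visibility), 3))  # prints 0.9
```

Because only rank order enters the calculation, a high coefficient says that brands with more mentions tend to be more visible. It says nothing about how much visibility one extra mention buys, or which way the causation runs.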

## Are you the entity it cites — or one it merely mentions?

The final layer is the one most brands never think about: among the entities the engine could name, which is *central* enough to actually cite? That's salience, and search systems have modeled it for years.

<EntitySalienceMeter />

<Claim id="claim-17">Microsoft's ["Identifying salient items in documents" patent (US9251473B2)](https://patents.google.com/patent/US9251473B2/en) describes scoring an item's salience to a page using a soft function — roughly, the ratio of visits whose queries include the item to the total visits to that page.</Claim> Salience is the machine's formal answer to "is this page *about* the entity, or does it just mention it?" And prominence — being the entity worth surfacing — is scored across sources, not asserted on your own page. <Claim id="claim-18">Google's [location-prominence patent (US8046371B2)](https://patents.google.com/patent/US8046371B2/en) computes a score from factors including "the total number of documents referring to a business" and "the number of information documents that mention the business."</Claim> <Claim id="claim-20">And its [entity-metrics ranking patent (US10235423B2)](https://patents.google.com/patent/US10235423B2/en) describes ranking by combining knowledge-graph metrics — relatedness, notable entity type, contribution, and prize — with weights that depend on the type of entity.</Claim>
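
The visit-ratio idea is simple enough to sketch. The ratio itself follows the description above; the "soft function" here is an assumed logistic squash with illustrative parameters, since the exact form is not given in this piece, and the click counts are invented.

```python
import math

def visit_ratio(visits_with_item: int, total_visits: int) -> float:
    """Share of page visits whose originating query mentioned the item."""
    if total_visits <= 0:
        return 0.0
    return visits_with_item / total_visits

def soft_salience(ratio: float, midpoint: float = 0.2, steepness: float = 10.0) -> float:
    """Assumed 'soft function': a logistic squash of the ratio into (0, 1).

    The midpoint and steepness are illustrative, not taken from the patent.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (ratio - midpoint)))

# Hypothetical click-log counts for one page with 1,000 total visits.
page_visits = 1_000
item_visits = {"Tula Analytics": 620, "retention measurement": 240, "Acme": 15}

for item, n in item_visits.items():
    r = visit_ratio(n, page_visits)
    print(f"{item}: ratio={r:.3f} salience={soft_salience(r):.3f}")
```

Under these assumptions, an entity that drives most of a page's query traffic scores near 1, and one that appears only in passing scores near 0. That is the "about versus mentions" distinction expressed as a number.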

Where does the engine go to satisfy that salience? Increasingly, to you. <Claim id="claim-19">Yext's study of [6.8 million AI citations](https://investors.yext.com/news-events/press-releases/detail/376/yext-research-86-of-ai-citations-come-from-brand-managed) across ChatGPT, Gemini, and Perplexity reported that 86% of citations came from sources brands already control, with first-party websites generating 44% of citations and listings 42%.</Claim> Own the definitive, structured page on your topic and the math compounds: highest salience, on a surface the engine already trusts to cite.

## The Entity Authority Stack

Put the sources together and one model falls out. Five layers sit between a brand and a cited entity, and each has exactly one lever an operator controls.

<EntityAuthorityStack />

To make it concrete, take a single, clearly-fictional brand and run it down the stack — what the engine asks at each layer, and the one move that wins it:

<EntityWorkedExample />

The pattern in that example is the pattern almost everywhere. Brands invest in the visible middle — attributes, copy, keywords — and skip the floor and the ceiling. They're legible but not authoritative: the model can resolve them, but never finds a reason to make them the cited entity. The advantage is bottom-heavy and top-heavy, exactly where no one is working.

## What this means for the work

Three shifts follow directly from the evidence, in order of effort-to-impact.

**Become resolvable before you become persuasive.** A Wikidata item, consistent identifiers, and `sameAs` links across your real profiles turn an ambiguous string into a node the engine can look up. The probes point the same way: parametric recall is weakest in the long tail, and PopQA found that retrieval augmentation helps significantly for low-popularity entities. Most brands are low-popularity entities.

**Write the record, don't leave it blank.** Organization and Product structured data is the draft the graph edits. Publish your type, founding, category, and relationships in machine-readable form, or accept whatever the machine infers. This is the cheapest high-return work in the stack:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Tula Analytics",
  "sameAs": ["https://www.wikidata.org/wiki/Q000", "https://www.linkedin.com/company/…"],
  "foundingDate": "2019",
  "knowsAbout": ["product analytics", "retention measurement"]
}
```
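
A small pre-publish check can catch a blank record before it ships. The sketch below is one way to do it, assuming a checklist of fields drawn from the example above; the `RECOMMENDED` list and both helper names are hypothetical, and the list is not Google's set of required properties.

```python
import json

# Fields the example above publishes. This checklist is an assumption for
# illustration, not Google's list of required or recommended properties.
RECOMMENDED = ("name", "sameAs", "foundingDate", "knowsAbout")

def missing_fields(jsonld: dict) -> list[str]:
    """Return checklist fields that are absent or empty in the record."""
    return [key for key in RECOMMENDED if not jsonld.get(key)]

def to_script_tag(jsonld: dict) -> str:
    """Serialise the record as a JSON-LD script element for a page's HTML."""
    body = json.dumps(jsonld, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'

# A draft record for the fictional brand, with sameAs left out on purpose.
record = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Tula Analytics",
    "foundingDate": "2019",
    "knowsAbout": ["product analytics", "retention measurement"],
}

print(missing_fields(record))  # prints ['sameAs']
```

Running a check like this in a build step is cheap, and it turns "leave it blank and the machine fills it in" from a silent default into a visible failure.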

**Earn the neighbourhood and own the topic.** Associations and salience are the layers that convert "known" into "cited." Be mentioned where your category is discussed — including video — and own one definitive page that the category treats as canonical. The measured signals are correlational, but they point the same way the mechanism does.

## Where the evidence runs out

A serious reading marks its own edges. The patents cited here are *disclosed methods* in permissive drafting language, assigned to Microsoft and Google, and none is a sworn description of how a named 2026 engine resolves entities; they establish the field's mechanics, not any one vendor's production system. Several of the strongest mechanism sources (the location-prominence and salience patents, the KESM salience model) predate generative engines and describe classic information retrieval, applied here by analogy. And the generative-era evidence for the association and salience layers leans on two vendor studies (Ahrefs and Yext): large and original, but correlational and commercially motivated, not peer-reviewed causal proof.

What survives all of that is the shape. Engines resolve names to entities, disambiguate which entity you are, read the attributes the graph holds, weigh your associations, and judge your salience — and a brand that fails the early layers is never eligible for the late ones. Become a resolvable, well-described, well-connected, salient thing, and you compound across every engine that reads the graph next. Optimize a string, and you're betting the machine guesses right about a brand it was never taught.

<div id="run-free" className="scroll-mt-24">
  <InlineToolRunner defaultTab="citerra" />
</div>

— Sundar Ramesh Kumar · martech.llc
