

Entity research · 11 min read · 2026-06-06

How AI search resolves your brand to an entity — and the five layers that decide

Before an AI engine cites your brand, it resolves it to an entity: a 'thing' in a knowledge graph, not your keywords. Drawing on entity-resolution patents, entity-linking research, and Google's Knowledge Graph, this piece maps the five layers of entity authority.


AI search rarely reads your page as a bag of keywords. Before it cites anything, it tries to resolve the words into entities (real-world "things" in a knowledge graph, each with attributes and relationships) and to decide which known entity your brand actually is. If it cannot resolve you to a thing, your sentences are tokens it can match but never truly know. Entity resolution itself runs only after the engine has decided to retrieve at all, a decision made upstream at the Grounding Gate.

Entity resolution sits underneath every citation decision, and most marketing teams never see it. They optimize copy for a query while the engine quietly asks a more basic question: is this brand a thing I recognize, and if so, which one? This piece traces how that resolution works using the primary sources that describe it (entity-resolution patents from several companies, the peer-reviewed entity-linking literature, and Google's own knowledge-graph disclosures). It closes with a single model, the Entity Authority Stack, plus an honest account of where the evidence stops.

Does AI search even know your brand is a thing?

The foundational move happened more than a decade before ChatGPT, and generative engines inherited its wiring.

In May 2012, Google introduced the Knowledge Graph as an intelligent model that "understands real-world entities and their relationships to one another: things, not strings," stating it then contained "more than 500 million objects, as well as more than 3.5 billion facts."[1] That sentence is the hinge. Search stopped being only about matching the characters in a query to the characters on a page and started maintaining a model of the things those characters refer to.

FIG 01 · Things, not strings: the founding move. AI search doesn't match your words; it resolves you to an entity.

Before (string match): the query “acme crm pricing” shares characters with the page text “…the acme platform’s crm plans and pricing tiers…”, so the page ranks, but the engine has no idea what Acme is.

After (entity node /m/0a1b2c): Acme CRM · type: Organization, SaaS · founded: 2019 · sameAs: wikidata, linkedin, crunchbase · industry: sales software. One of 500M+ objects, joined to 3.5B+ facts: now the engine knows what you are.

Search’s unit of meaning moved from the string to the thing more than a decade ago, and generative engines inherited that wiring. If a model cannot resolve your brand to a stable entity, your sentences are just tokens: eligible to be matched, never to be known.

Mechanism: Google Knowledge Graph, “things, not strings” (May 16, 2012) · 500M+ objects, 3.5B+ facts · node record illustrative

The plumbing that makes a brand a "thing" in an index is described, in different words, across a spread of patents — and the spread itself matters. Microsoft's "Knowledge-based entity detection and disambiguation" patent (US9665643B2) describes associating entity identifiers with a web page and storing them "as metadata of the page in a search engine index," which "will enable entity-based queries" and the ability to re-rank results by entity rather than keyword.[2] Read these patents as the vocabulary and mechanics of entity resolution as a field — Microsoft, Google, Amazon, and independents all describe variants of the same machinery. None is a sworn confession of what any specific 2026 engine ships in production; what they establish is that the levers are general and well-understood.
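A toy sketch of that difference, assuming a two-page in-memory index: keyword matching returns every page whose text overlaps the query, while an entity-based query returns only pages annotated with the right entity identifier. The page texts, the IDs such as `/m/acme_crm`, and both helper functions are hypothetical illustrations of the idea the patent describes, not its method.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str
    text: str
    # Entity identifiers stored alongside the page in the index (hypothetical IDs).
    entity_ids: set[str] = field(default_factory=set)


def keyword_match(pages: list[Page], query: str) -> list[str]:
    """String matching: URLs whose text contains every query term."""
    terms = query.lower().split()
    return [p.url for p in pages if all(t in p.text.lower() for t in terms)]


def entity_match(pages: list[Page], entity_id: str) -> list[str]:
    """Entity-based query: URLs annotated with the given entity identifier."""
    return [p.url for p in pages if entity_id in p.entity_ids]


pages = [
    Page("/crm", "Acme CRM plans and pricing tiers", {"/m/acme_crm"}),
    Page("/vinyl", "Acme Records vinyl pressing pricing", {"/m/acme_records"}),
]
```

Both pages match the string query "acme pricing", but only one is about the CRM entity; an index that stores entity IDs can tell them apart.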

What happens when the model has never heard of you?

"Existence" has two faces, and conflating them is the most common mistake I see. One is whether you're a node in a graph the engine can look up. The other is whether the language model has parametrically learned you — whether your facts are baked into its weights. The peer-reviewed probes on the second face are humbling.

The LAMA probe showed that pretrained models store relational knowledge recoverable through fill-in-the-blank cloze statements with no fine-tuning[6] — so models do carry entity facts. But the coverage is brutally uneven by popularity. The PopQA study found that "scaling... fails to appreciably improve memorization of factual knowledge in the long tail," while retrieval augmentation helped significantly for low-popularity entities.[4] The Head-to-Tail benchmark, spanning 18,000 QA pairs across 16 models, concluded that "existing LLMs are still far from being perfect... especially for facts of torso-to-tail entities."[5]
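The PopQA authors turn this finding into an adaptive-retrieval idea: consult retrieval when an entity is too unpopular to trust the model's memory. The sketch below is a simplified illustration of that gating, not their code. The popularity counts, the 10,000 threshold, the record store, and the `guess` stand-in for the model are all hypothetical.

```python
from typing import Callable, Optional


def answer(
    entity: str,
    question: str,
    popularity: dict[str, int],
    retrieve: Callable[[str], Optional[str]],
    parametric_answer: Callable[[str], str],
    threshold: int = 10_000,  # hypothetical cut-off, chosen for illustration
) -> tuple[str, str]:
    """Return (answer, route): retrieve for low-popularity entities, else trust memory."""
    if popularity.get(entity, 0) < threshold:
        record = retrieve(entity)
        if record is not None:
            return record, "retrieval"
        # Tail entity with no structured record: the model can only guess.
        return parametric_answer(question), "guess (no record found)"
    return parametric_answer(question), "parametric"


# Hypothetical popularity counts and a one-entry record store.
popularity = {"Head Corp": 2_000_000, "Tula Analytics": 40}
records = {"Tula Analytics": "Product-analytics SaaS, founded 2019"}


def guess(question: str) -> str:
    """Stand-in for an unaided language-model answer."""
    return "model guess (unverified)"
```

The third route is the one the tail-entity research warns about: an obscure entity with no record to retrieve leaves the model guessing.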

FIG 02 · The binding constraint: the tail problem. A model knows head entities cold and hallucinates the tail, where most brands live.

Key figures: ~50% of 7,919 real-world entities have no Wikipedia page · 118,785 generations across 15 LLMs (WildHallucinations) · 14k long-tail questions in the PopQA probe · 18K QA pairs across 16 LLMs (Head-to-Tail).

Bars contrast the model alone vs. the same model with retrieval for three bands (heights illustrative of the documented effect, not measured): head entities (famous, dense Wikipedia/Wikidata presence), torso entities (known, thin structured coverage), and tail entities (where most B2B brands live, with little or no graph node).

Scaling the model barely moves the tail; the peer-reviewed probes are blunt about it. What rescues a little-known brand is being retrievable: a structured, corroborated entity the engine can look up instead of guess. No graph presence is the highest-hallucination case there is.

Sources: WildHallucinations (arXiv 2407.17468) · PopQA (ACL 2023) · Head-to-Tail (NAACL 2024) · figures cited inline · band heights illustrative

The bridge between the two faces is the most actionable finding in this entire literature. The WildHallucinations benchmark — built from real entities mined from chatbot conversations — found that "about half" of its 7,919 entities "do not have associated Wikipedia pages," and that "LLMs consistently hallucinate more on entities without Wikipedia pages," across 118,785 generations from 15 models.[3] No structured presence is not a neutral starting point; it is the single highest-risk condition for being fabricated or ignored.

The model doesn't punish you for being small. It punishes you for being illegible — for having no structured, corroborated record it can resolve instead of guess.

Most B2B brands live in that torso-to-tail range. The lever is not "rank higher"; it is "become retrievable": a graph node with enough corroboration that the engine looks you up rather than hallucinating you.

Which "Acme" are you?

Once a name can be resolved, the engine faces the next problem: names collide. There is an Acme that sells CRM software, an Acme that presses records, and an Acme that supplies anvils to cartoon coyotes. Disambiguation decides which one a given mention means.

FIG 03 · Disambiguation: which one are you? One name, three entities; the surrounding context decides which is you. (Context scores are illustrative of the documented additive-context mechanism, not measured.)

Surface mention: “Acme” · context features: pricing, integrations, seats, B2B, pipeline

Acme CRM (B2B software company): 0.88 · resolved
Acme Records (music label): 0.26
Acme (fictional supplier, cartoon): 0.11

The engine builds a context vector from the words around your name and the other entities in the document, then picks the candidate that context supports best. Your lever is to make the context unmistakable (consistent category language, identifiers, and sameAs links) so the right “Acme” always wins.

Mechanism: additive context model for entity resolution (US9697475B1) + dense entity linking (BLINK) · scores illustrative

The mechanism is context. Google's "Additive context model for entity resolution" patent (US9697475B1) describes resolving an ambiguous mention by building a vector of context features from the surrounding document and selecting the highest-scoring candidate entity.[7] The modern research version is dense and learned rather than hand-built. The BLINK system performs zero-shot entity linking by defining each entity through a short textual description, retrieving candidates with a bi-encoder and re-ranking them with a cross-encoder.[8] ReFinED folds mention detection, fine-grained typing, and disambiguation into a single pass and generalises to Wikidata, "which has around 15 times more entities than Wikipedia."[12]
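A toy version of the additive-context idea: turn the words around a mention into context features, score each candidate entity as a weighted sum over those features, and keep the highest scorer. The weights and candidate names below are invented for illustration; the patent describes the approach in general terms, and real systems learn such weights (or, like BLINK, replace them with dense encoders).

```python
from collections import Counter

# Hypothetical per-candidate feature weights: how strongly each context word
# supports each candidate entity. Real systems learn these from data.
CANDIDATES: dict[str, dict[str, float]] = {
    "Acme CRM": {"pricing": 0.3, "integrations": 0.4, "seats": 0.4, "b2b": 0.5, "pipeline": 0.5},
    "Acme Records": {"pricing": 0.1, "vinyl": 0.6, "album": 0.6, "label": 0.5},
    "Acme (cartoon)": {"anvil": 0.7, "coyote": 0.8, "rocket": 0.4},
}


def context_vector(text: str) -> Counter:
    """Bag-of-words context features from the text surrounding the mention."""
    return Counter(w.strip(".,;:!?").lower() for w in text.split())


def resolve(text: str) -> tuple[str, dict[str, float]]:
    """Score every candidate additively and return the best one plus all scores."""
    ctx = context_vector(text)
    scores = {
        name: sum(weights.get(feat, 0.0) * count for feat, count in ctx.items())
        for name, weights in CANDIDATES.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

For example, `resolve("Acme adds seats and pipeline integrations for B2B teams")` picks "Acme CRM", while a sentence about an anvil and a coyote picks the cartoon supplier.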

There's a quiet asymmetry here worth naming. Microsoft's knowledge-graph query-expansion application (US20150095319A1) describes alias, disambiguation, filter, and ranking expansion segments, and notes that "when an identified entity is famous, renown, a celebrity, or simply unique a disambiguation segment may not be necessary."[9] The famous don't need to fight for disambiguation; the obscure do. Your lever is to make the context unmistakable — consistent category language, identifiers, and sameAs links — so the right Acme always wins the score.
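The asymmetry can be sketched as a query-expansion step that always adds an alias segment but adds a disambiguation segment only when a name maps to more than one known entity. The alias table, the type labels, the fictional unique brand "Qorvanta", and the "more than one candidate" test are all hypothetical simplifications; the patent application's own criterion also covers fame, which this sketch ignores.

```python
# Hypothetical alias table: surface name -> entity IDs it can refer to.
ALIASES: dict[str, list[str]] = {
    "acme": ["/m/acme_crm", "/m/acme_records", "/m/acme_cartoon"],
    "qorvanta": ["/m/qorvanta"],  # a unique, fictional brand name
}
# Hypothetical type labels used to tell candidates apart.
TYPES: dict[str, str] = {
    "/m/acme_crm": "software company",
    "/m/acme_records": "record label",
    "/m/acme_cartoon": "cartoon supplier",
    "/m/qorvanta": "software company",
}


def expand(query: str) -> dict[str, list[str]]:
    """Build expansion segments for a query that names a single entity."""
    name = query.lower().strip()
    ids = ALIASES.get(name, [])
    segments = {"alias": ids}
    if len(ids) > 1:
        # Ambiguous name: add a disambiguation segment, one option per candidate.
        segments["disambiguation"] = [f"{name} ({TYPES[i]})" for i in ids]
    # A unique name (a single candidate) gets no disambiguation segment.
    return segments
```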

What does the graph believe you are?

Resolving you to an entity isn't the same as knowing what you are. The attribute layer is the record the graph holds — your type, your founding, your category, your relationships — and, unusually, it's a record you can largely write.

FIG 04 · Attributes: what the graph believes. What the machine thinks you are is a record you can largely write.

Behind the scenes (disambiguation): sameAs links your profiles into one node · iso6523 (organization identifier) and naics (industry classification) · duns and leiCode are registry IDs · foundingDate fixes you in time.

On the surface (what the answer shows): name and legalName are the label the answer prints · logo is the mark in the knowledge panel · description is the one-liner the engine paraphrases · contactPoint and address are the facts it cites.

Flow: (1) you supply Organization + Product schema on your own pages → (2) the graph stores a canonical node with type, attributes, and relationships → (3) the answer uses the facts it can attribute, paraphrased or cited.

Google states it plainly: some structured-data properties exist purely to disambiguate your organization from others, while others decide which logo and facts show in Search and the knowledge panel. The attributes you publish are the draft the graph edits; leave them blank and the machine fills them from whatever it can find.

Mechanism: Google Organization structured-data docs + Knowledge Panel Help (verified-owner edits) · property set per schema.org

Google is explicit that this is a brand-controlled lever. Its Organization structured-data documentation states that "some properties are used behind the scenes to disambiguate your organization from other organizations (like iso6523 and naics), while others can influence visual elements in Search results (such as which logo is shown in Search results and your knowledge panel)."[10]

Google's Knowledge Panel Help describes the source directly: “Google's search results sometimes show information that comes from our Knowledge Graph, our database of billions of facts about people, places, and things.”[11]

Google also documents that content owners can suggest changes to knowledge panels they have claimed.[11] Taken together: the attributes you publish in structured form are the draft the graph edits. Leave them blank and the machine fills the record from whatever it can scrape — which is exactly how a wrong founding date or a stale description ends up in an AI answer with your name on it.
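The "draft the graph edits" point can be pictured as a merge in which attributes you publish take precedence and anything you leave blank is filled from other sources. That precedence rule is an assumption made for illustration: Google does not document how its Knowledge Graph reconciles sources, and the scraped values below are invented.

```python
def build_record(
    published: dict[str, str], scraped: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Merge attributes, preferring non-empty published values (assumed precedence).

    Each value is stored with its provenance: "published" or "scraped".
    """
    record: dict[str, tuple[str, str]] = {}
    for key in sorted(published.keys() | scraped.keys()):
        if published.get(key):
            record[key] = (published[key], "published")
        elif scraped.get(key):
            record[key] = (scraped[key], "scraped")
    return record


# Hypothetical brand data: the founding date was left blank on the site.
published = {"name": "Tula Analytics", "description": "Product-analytics SaaS", "foundingDate": ""}
scraped = {"foundingDate": "2017", "description": "Analytics consultancy"}
record = build_record(published, scraped)
```

Here the blank `foundingDate` is filled with the scraped "2017", which is how a wrong date can end up attached to your name, while the published description survives.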

Who are you connected to?

Entities don't sit alone in the graph; they sit in a neighbourhood. What a model believes your brand relates to is learned from how often you co-occur with your category, its authorities, and its other entities.
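The simplest way to see "learned from co-occurrence" is to count how often a brand appears in the same document as other entities. The documents and entity names below are hypothetical, and real systems learn much richer signals (such as the embeddings discussed next). Still, document-level co-occurrence counts are the most basic version of an entity's neighbourhood.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs: list[set[str]]) -> Counter:
    """Count, for each unordered entity pair, how many documents mention both."""
    counts: Counter = Counter()
    for entities in docs:
        for a, b in combinations(sorted(entities), 2):
            counts[(a, b)] += 1
    return counts


def neighbours(counts: Counter, entity: str) -> list[tuple[str, int]]:
    """Entities that co-occur with `entity`, most frequent first."""
    result: Counter = Counter()
    for (a, b), n in counts.items():
        if a == entity:
            result[b] += n
        elif b == entity:
            result[a] += n
    return result.most_common()


# Hypothetical documents, each reduced to the set of entities it mentions.
docs = [
    {"Tula Analytics", "product analytics", "Rival Co"},
    {"Tula Analytics", "product analytics"},
    {"Tula Analytics", "Tula (skincare)"},
]
counts = cooccurrence(docs)
```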

FIG 05 · Associations: you are what you're cited next to. The strongest measured AI-visibility signals are off-site mentions, not on-page SEO. (Spearman correlation, not causation · vendor study)

Your entity co-occurs with: your category · YouTube (strongest correlate) · Wikipedia / Wikidata · industry authorities · named competitors · the press.

Spearman correlation with AI visibility across ChatGPT, AI Mode, and AI Overviews:
YouTube mentions: 0.737
Branded web mentions: 0.66–0.71
Branded anchors: 0.51–0.63
Branded search volume: 0.35–0.47

A model’s sense of what your brand relates to is learned from how often you co-occur with your category and its authorities. The largest single correlation a 75,000-brand study found was for being mentioned, most strongly on YouTube, not for anything on your own page. Read it as direction, not proof: correlation is not causation, and the study is vendor-run.

Mechanism: knowledge-graph embeddings (TransE, GraphRAG) · correlations: Ahrefs Brand Radar, 75,000 brands (Dec 12, 2025)

The mechanism is geometric. The TransE method models relationships as translations over low-dimensional entity embeddings and "significantly outperforms state-of-the-art methods in link prediction on two knowledge bases."[13] The retrieval-era version assembles the neighbourhood on the fly. Microsoft Research's GraphRAG uses an LLM "to derive an entity knowledge graph from the source documents, then to pregenerate community summaries for all groups of closely related entities" — so closely-related entities become the unit of retrieval.[14]
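TransE's core idea fits in a few lines: a true triple (head, relation, tail) should satisfy head + relation ≈ tail in embedding space, so a smaller distance means a more plausible link. The two-dimensional vectors below are hand-picked for illustration, not trained (TransE learns them with a margin-based ranking loss), and the entity and relation names are hypothetical.

```python
import math

Vector = tuple[float, ...]


def transe_distance(h: Vector, r: Vector, t: Vector) -> float:
    """L2 distance between h + r and t; lower means a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


# Hand-picked toy embeddings (hypothetical, not trained).
E: dict[str, Vector] = {
    "Tula Analytics": (1.0, 0.0),
    "product analytics": (1.0, 1.0),
    "skincare": (-1.0, 2.0),
}
R: dict[str, Vector] = {"in_category": (0.0, 1.0)}


def rank_tails(head: str, relation: str) -> list[str]:
    """Link prediction: rank candidate tails by distance, most plausible first."""
    h, r = E[head], R[relation]
    candidates = [e for e in E if e != head]
    return sorted(candidates, key=lambda t: transe_distance(h, r, E[t]))
```

`rank_tails("Tula Analytics", "in_category")` puts "product analytics" first, because (1, 0) + (0, 1) lands exactly on its vector.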

What about evidence from the generative era itself? Here the strongest numbers come from vendor studies, and they should be read as direction, not proof. Ahrefs' study of 75,000 brands, using Spearman correlation, reported that "YouTube mentions show the strongest correlation with AI visibility (~0.737), outperforming every other factor across ChatGPT, AI Mode, and AI Overviews," ahead of branded web mentions and anchors.[15] Correlation is not causation, and a tool vendor has incentives — but the shape (off-site association beats on-page tuning) is consistent with how the graph is built. It also rhymes with the content-side finding that generative-engine optimization can lift a source's visibility "by up to 40%" through quotations, statistics, and citations rather than keywords[16] — a result we treat in depth in a companion piece, since it governs which retrieved candidate gets cited rather than how you become an entity.
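For readers unfamiliar with the statistic: Spearman correlation is the Pearson correlation of ranks. It asks whether brands with more of a signal tend to rank higher on visibility, whatever the shape of the relationship, and it says nothing about cause. A small self-contained implementation follows, using average ranks for ties; any sample numbers you feed it are your own, not Ahrefs data.

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("Spearman correlation is undefined for constant input")
    return cov / (sx * sy)
```

Because only ranks matter, a perfectly monotonic but curved relationship such as `y = x**2` still scores 1.0.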

Are you the entity it cites — or one it merely mentions?

The final layer is the one most brands never think about: among the entities the engine could name, which is central enough to actually cite? That's salience, and search systems have modeled it for years.

FIG 06 · Salience: the entity it cites, not the one it mentions. Be the page the answer is about, not the one that name-drops you once. (Salience dials illustrative of the documented entity-salience model · Yext figure measured)

A page ABOUT you: your entity is the subject; every section, fact, and heading is about it · salience 0.79
A page that MENTIONS you: your name appears once in a list; present, but not what the page is about · salience 0.08
Prominence = corroboration: prominence is scored from how many other authoritative sources reference you, counted across the corpus, not asserted on your own page.
86% of AI citations come from brand-managed sources.

Salience is the engine’s answer to “is this page about the entity, or does it just mention it?”, and the most-cited surfaces, a 6.8-million-citation study found, are the ones the brand controls. Own a definitive page on your topic and the math works in your favour twice: highest salience, on a source the engine already trusts to cite.

Mechanism: entity-salience model (US9251473B2) + location-prominence corroboration (US8046371B2) · citation share: Yext (6.8M citations, measured)

Microsoft's "Identifying salient items in documents" patent (US9251473B2) describes scoring an item's salience to a page using a soft function — roughly, the ratio of visits whose queries include the item to the total visits to that page.[17] Salience is the machine's formal answer to "is this page about the entity, or does it just mention it?" And prominence — being the entity worth surfacing — is scored across sources, not asserted on your own page. Google's location-prominence patent (US8046371B2) computes a score from factors including "the total number of documents referring to a business" and "the number of information documents that mention the business."[18] And its entity-metrics ranking patent (US10235423B2) describes ranking by combining knowledge-graph metrics — relatedness, notable entity type, contribution, and prize — with weights that depend on the type of entity.[20]
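In its simplest reading, the salience signal is a ratio: of all visits to a page, what fraction came from queries that include the item? The sketch below computes that plain ratio from a hypothetical visit log. The log format, page paths, and queries are assumptions, and the patent wraps the ratio in a soft function that is not reproduced here.

```python
def salience(visits: list[tuple[str, str]], page: str, item: str) -> float:
    """Fraction of visits to `page` whose query text includes `item` (case-insensitive)."""
    page_queries = [q.lower() for p, q in visits if p == page]
    if not page_queries:
        return 0.0
    hits = sum(1 for q in page_queries if item.lower() in q)
    return hits / len(page_queries)


# Hypothetical visit log of (page, query that led to the visit).
visits = [
    ("/tula-guide", "tula analytics pricing"),
    ("/tula-guide", "what is tula analytics"),
    ("/tula-guide", "tula analytics vs rivals"),
    ("/tula-guide", "product analytics tools"),
    ("/roundup", "best product analytics tools"),
    ("/roundup", "product analytics for startups"),
    ("/roundup", "tula analytics"),
    ("/roundup", "analytics tools list"),
]
```

The dedicated guide scores 0.75 for "tula analytics" and the roundup that merely lists the brand scores 0.25: the "about" versus "mentions" distinction in numeric form.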

Where does the engine go to satisfy that salience? Increasingly, to you. Yext's study of 6.8 million AI citations across ChatGPT, Gemini, and Perplexity reported that 86% of citations came from sources brands already control, with first-party websites generating 44% of citations and listings 42%.[19] Own the definitive, structured page on your topic and the math compounds: highest salience, on a surface the engine already trusts to cite.

The Entity Authority Stack

Put the sources together and one model falls out. Five layers sit between a brand and a cited entity, and each has exactly one lever an operator controls.

FIG 07 · The Entity Authority Stack. Five layers sit between a brand and a cited entity; each has one operator lever.

L1 · Existence. Are you a resolvable entity at all, or just a string the model has never met? Lever: earn a structured, corroborated entity record. Grounded in: Google KG · BLINK/GENRE · WildHallucinations.

L2 · Disambiguation. When your name collides with others, which entity is you? Lever: feed disambiguating context + identifiers. Grounded in: entity-resolution patents · ReFinED.

L3 · Attributes. What does the graph believe you ARE? Lever: publish consistent structured attributes. Grounded in: Knowledge Graph · Organization schema.

L4 · Associations. What, and who, are you connected to? Lever: earn co-occurrence with your category + authorities. Grounded in: TransE · GraphRAG · Ahrefs.

L5 · Salience. Are you central enough to be the entity it cites? Lever: be the primary source, corroborated widely. Grounded in: salience patents · KESM · Yext.

Most brands pour their effort into L3 (keywords and copy) and treat the rest as luck. The entity that gets cited is built bottom-up: it exists, resolves cleanly, carries the right attributes, sits in the right neighbourhood, and is the salient source on its topic. Work the stack from the floor, not the façade.

Framework: martech.llc · synthesis of entity-resolution patents + entity-linking research + first-party knowledge-graph docs

To make it concrete, take a single, clearly fictional brand and run it down the stack: what the engine asks at each layer, and the one move that wins it.

Worked example · run the stack on one brand: “Tula Analytics”, a resolvable brand that still isn't the cited one

The brand: “Tula Analytics”, a real-ish, mid-market product-analytics SaaS, new to most models, walked down the five-layer Entity Authority Stack.

L1 · Existence (covered). Is “Tula Analytics” a thing the model can resolve? Tests whether you are an entity or an unknown string. What earns it: a Wikidata item + consistent name/identifiers across every profile.

L2 · Disambiguation (covered). “Tula” is also a city and a skincare brand; which is you? Tests separation from same-name entities. What earns it: Organization schema with industry + sameAs makes the SaaS unmistakable.

L3 · Attributes (covered). What is Tula: category, founding, product? Tests what the graph believes you are. What earns it: Organization + Product schema writes the record the answer paraphrases.

L4 · Associations (absent). Is Tula tied to “product analytics”, its rivals, its authorities? Tests the neighbourhood you co-occur in. What earns it: mentions in category roundups + a real YouTube presence. This is the weak spot.

L5 · Salience (absent). Is Tula the source an answer cites on its topic? Tests central source vs. passing mention. What earns it: a definitive, widely referenced page the category treats as canonical.

Score: 3/5 layers covered.

Tula does the visible work: it exists, disambiguates, and is well described (L1 to L3), so a model can resolve it. But it is thin on associations and salience (L4 and L5), so it is rarely the entity an answer actually cites. That is the typical B2B profile: legible, not authoritative. The advantage sits at the top of the stack: the layers almost everyone skips are the ones that convert “known” into “cited.”

Tula Analytics is a fictional brand · steps are illustrative of the documented mechanisms, not measured data
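The stack's ordering can be sketched as a simple audit: list the layers in order, mark each covered or absent, and report the score plus the lowest missing layer. The layer names come from the framework above and the coverage flags reproduce the Tula example. Treating the first gap as the next priority is this sketch's assumption, drawn from the idea that early layers gate the late ones.

```python
from typing import Optional

# Layer order from the Entity Authority Stack, floor first.
LAYERS = ["Existence", "Disambiguation", "Attributes", "Associations", "Salience"]


def audit(covered: dict[str, bool]) -> tuple[str, Optional[str]]:
    """Return a 'k/5 layers covered' score and the lowest layer still missing."""
    score = sum(1 for layer in LAYERS if covered.get(layer, False))
    first_gap = next((layer for layer in LAYERS if not covered.get(layer, False)), None)
    return f"{score}/{len(LAYERS)} layers covered", first_gap


# Coverage flags from the (fictional) Tula Analytics example.
tula = {
    "Existence": True,
    "Disambiguation": True,
    "Attributes": True,
    "Associations": False,
    "Salience": False,
}
```

For Tula the audit returns a 3/5 score and points at Associations as the next layer to work on.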

The pattern in that example is common. Brands invest in the visible middle (attributes, copy, keywords) and neglect the layers on either side of it, above all the associations and salience at the top. They end up legible but not authoritative: the model can resolve them but never finds a reason to make them the cited entity. The advantage sits exactly where almost no one is working.

What this means for the work

Three shifts follow directly from the evidence, in order of effort-to-impact.

Become resolvable before you become persuasive. A Wikidata item, consistent identifiers, and sameAs links across your real profiles turn an ambiguous string into a node the engine can look up. The probes are unanimous that retrieval against a structured record beats parametric guessing for everyone outside the head — and most brands are outside the head.
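Consistency is checkable. A minimal sketch follows, assuming you have already collected each profile's name and claimed Wikidata ID into plain dictionaries. The profile keys, field names, and the placeholder Q000 are hypothetical. It flags any field whose normalised value differs across profiles.

```python
def inconsistent_fields(profiles: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Map each field to its distinct values, keeping only fields that disagree."""
    values: dict[str, set[str]] = {}
    for fields in profiles.values():
        for key, value in fields.items():
            # Normalise whitespace and case so trivial differences don't count.
            values.setdefault(key, set()).add(" ".join(value.split()).lower())
    return {key: vals for key, vals in values.items() if len(vals) > 1}


# Hypothetical profile data gathered from three sources.
profiles = {
    "website": {"name": "Tula Analytics", "wikidata": "Q000"},
    "linkedin": {"name": "Tula Analytics Inc.", "wikidata": "Q000"},
    "crunchbase": {"name": "tula  analytics", "wikidata": "Q000"},
}
```

Here the identifiers agree but the names do not: "Tula Analytics Inc." is the kind of drift that makes one brand look like two strings.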

Write the record, don't leave it blank. Organization and Product structured data is the draft the graph edits. Publish your type, founding, category, and relationships in machine-readable form, or accept whatever the machine infers. This is the cheapest high-return work in the stack:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Tula Analytics",
  "sameAs": ["https://www.wikidata.org/wiki/Q000", "https://www.linkedin.com/company/…"],
  "foundingDate": "2019",
  "knowsAbout": ["product analytics", "retention measurement"]
}
```
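JSON-LD like this is typically embedded in the page inside a `<script type="application/ld+json">` element. Below is a small Python sketch that builds the record from a dictionary and emits that element. The values mirror the example above, including its placeholder Q000 and the elided LinkedIn URL, which you would replace with your real identifiers; the helper name `jsonld_script` is hypothetical.

```python
import json


def jsonld_script(record: dict) -> str:
    """Serialise a schema.org record as a JSON-LD <script> element."""
    payload = json.dumps(record, indent=2, ensure_ascii=False)
    # Escape '<' (valid JSON as \u003c) so no value can close the element early.
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Tula Analytics",
    "sameAs": ["https://www.wikidata.org/wiki/Q000", "https://www.linkedin.com/company/…"],
    "foundingDate": "2019",
    "knowsAbout": ["product analytics", "retention measurement"],
}
html = jsonld_script(organization)
```

Generating the markup from one dictionary, rather than hand-editing it per page, is one way to keep the published attributes identical everywhere they appear.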

Earn the neighbourhood and own the topic. Associations and salience are the layers that convert "known" into "cited." Be mentioned where your category is discussed — including video — and own one definitive page that the category treats as canonical. The measured signals are correlational, but they point the same way the mechanism does.

Where the evidence runs out

A serious reading marks its own edges. The patents here are disclosed methods in permissive drafting language, assigned to a mix of companies — Microsoft, Google, Amazon, NEC, and independents — and none is a sworn description of how a named 2026 engine resolves entities; they establish the field's mechanics, not any one vendor's production system. Several of the strongest mechanism sources — the location-prominence and salience patents, the KESM salience model — predate generative engines and describe classic information retrieval, applied here by analogy. And the generative-era evidence for the association and salience layers leans on two vendor studies (Ahrefs and Yext): large and original, but correlational and commercially motivated, not peer-reviewed causal proof.

What survives all of that is the shape. Engines resolve names to entities, disambiguate which entity you are, read the attributes the graph holds, weigh your associations, and judge your salience — and a brand that fails the early layers is never eligible for the late ones. Become a resolvable, well-described, well-connected, salient thing, and you compound across every engine that reads the graph next. Optimize a string, and you're betting the machine guesses right about a brand it was never taught.


— Sundar Ramesh Kumar · martech.llc

Frequently asked questions

How does AI search decide whether to cite a brand?

Before any citation decision, the engine resolves the words on a page to entities: real-world 'things' in a knowledge graph with attributes and relationships. If your brand is not a resolvable entity, the model treats your name as an ambiguous string it can match but cannot reliably know. Citation comes later; entity resolution is the gate before it.

What is the Entity Authority Stack?

It is a five-layer model of how AI search turns a brand into a citable entity: Existence (are you a resolvable entity at all), Disambiguation (which entity you are when names collide), Attributes (what the graph believes you are), Associations (what and who you co-occur with), and Salience (whether you are central enough to be the entity cited). Each layer has one operator lever.

Why do AI models hallucinate about smaller brands?

Peer-reviewed probes find that language models reliably recall facts about popular, well-documented entities and struggle on long-tail ones. In the WildHallucinations benchmark, about half of 7,919 real-world entities had no Wikipedia page, and models hallucinated more on entities without one. A thin, unstructured web presence is the highest-risk case; retrieval against a structured record is the rescue.

Does schema.org structured data help with AI search?

It is the main brand-controlled lever at the Attributes layer. Google's documentation states that some Organization structured-data properties are used to disambiguate your organization from others, while others influence which logo and details appear in Search and the knowledge panel. Structured attributes are the record the graph stores and the answer paraphrases.

What is the strongest signal for brand visibility in AI search?

A 75,000-brand study by Ahrefs reported that off-site brand mentions (most strongly YouTube mentions, at a Spearman correlation around 0.737) correlated more with AI visibility than on-page factors across ChatGPT, AI Mode, and AI Overviews. It is a correlation from a vendor study, not proven causation, so treat it as direction rather than a guarantee.

Do most AI citations come from a brand's own properties?

Largely, yes. A Yext study of 6.8 million AI citations across ChatGPT, Gemini, and Perplexity reported that 86% of citations came from sources brands already control, with first-party websites generating 44% and listings 42%. Owning a definitive, structured page on your topic puts you on the surfaces engines cite most.

Is ranking first on Google enough to be cited by AI?

No. Ranking is about matching a query to a page; entity authority is about whether the machine can resolve your brand to a known thing and judge it the salient source on a topic. A page can rank well and still describe an entity the model cannot resolve, disambiguate, or treat as central; in that case it is mentioned, not cited.

Filed under: research note · #entity-seo #knowledge-graph #generative-engine-optimization #ai-search #brand-entity

Sources · 20

[1] Google's Knowledge Graph launched in May 2012 as an intelligent model that understands real-world entities and their relationships ('things, not strings') and was stated to contain more than 500 million objects and more than 3.5 billion facts.
Google, "Introducing the Knowledge Graph: things, not strings" · 2012-05-16

[2] Microsoft's 'Knowledge-based entity detection and disambiguation' patent (US9665643B2) describes associating entity identifiers with a web page and storing them as metadata in a search engine index, enabling entity-based queries and re-ranking of results by entity rather than keyword.
Microsoft, "Knowledge-based entity detection and disambiguation" (US9665643B2) · 2017-05-30

[3] The WildHallucinations benchmark found that about half of 7,919 real-world entities had no associated Wikipedia page, and that LLMs consistently hallucinate more on entities without Wikipedia pages, across 118,785 generations from 15 LLMs.
Zhao et al., "WildHallucinations: Evaluating Long-form Factuality with Real-World Entity Queries" · 2024-07-24

[4] The PopQA study found that language models struggle with less popular factual knowledge and that scaling fails to appreciably improve memorization of factual knowledge in the long tail, while retrieval augmentation helps significantly for low-popularity entities.
Mallen et al., "When Not to Trust Language Models" (PopQA), ACL 2023 · 2023-07-01

[5] The Head-to-Tail benchmark of 18,000 QA pairs evaluated across 16 LLMs found that existing LLMs remain far from perfect on factual knowledge, especially for facts about torso-to-tail entities.
Sun et al., "Head-to-Tail: How Knowledgeable are LLMs?", NAACL 2024 · 2023-08-20

[6] The LAMA probe demonstrated that pretrained language models store relational knowledge that can be recovered with fill-in-the-blank cloze statements without fine-tuning.
Petroni et al., "Language Models as Knowledge Bases?", EMNLP-IJCNLP 2019 · 2019-11-03

[7] Google's 'Additive context model for entity resolution' patent (US9697475B1) describes resolving an ambiguous mention to a knowledge-base entity by building a vector of context features from the surrounding document and selecting the highest-scoring candidate entity.
Google, "Additive context model for entity resolution" (US9697475B1) · 2017-07-04

[8] The BLINK system performs zero-shot entity linking by defining each entity only through a short textual description, using a bi-encoder to retrieve candidates in a dense space and a cross-encoder to re-rank them.
Wu et al., "Scalable Zero-shot Entity Linking with Dense Entity Retrieval" (BLINK) · 2019-11-10

[9] Microsoft's knowledge-graph query-expansion patent application (US20150095319A1) describes expansion segments including alias, disambiguation, filter, and ranking, and states that a disambiguation segment may be unnecessary when an entity is famous or unique.
Microsoft, "Query Expansion, Filtering and Ranking Utilizing Knowledge Graphs" (US20150095319A1) · 2015-04-02

[10] Google's Organization structured-data documentation states that some properties are used behind the scenes to disambiguate your organization from others, while others can influence visual elements such as the logo shown in Search results and the knowledge panel.
Google Search Central, "Organization structured data" · 2025-01-01

[11] Google states the Knowledge Graph is its database of billions of facts about people, places, and things, and that content owners can suggest changes to knowledge panels they have claimed.
Google, "How Google's Knowledge Graph works" (Knowledge Panel Help) · 2024-01-01

[12] The ReFinED model performs mention detection, fine-grained entity typing, and entity disambiguation for all mentions in a document in a single forward pass and generalises to Wikidata, which has roughly 15 times more entities than Wikipedia.
Ayoola et al., "ReFinED: Efficient Zero-shot-capable End-to-End Entity Linking", NAACL 2022 · 2022-07-08

[13] The TransE method models relationships in a knowledge base as translations operating on low-dimensional entity embeddings and significantly outperformed prior state-of-the-art on link prediction across two knowledge bases.
Bordes et al., "Translating Embeddings for Modeling Multi-relational Data" (TransE), NeurIPS 2013 · 2013-12-05

[14] Microsoft Research's GraphRAG method uses an LLM to derive an entity knowledge graph from source documents and then pre-generates community summaries for all groups of closely related entities.
Edge et al., "From Local to Global: A Graph RAG Approach to Query-Focused Summarization" · 2024-04-24

[15] An Ahrefs study of 75,000 brands using Spearman correlation reported that YouTube mentions showed the strongest correlation with AI visibility, around 0.737, outperforming every other factor across ChatGPT, AI Mode, and AI Overviews.
Ahrefs, "Top Brand Visibility Factors in ChatGPT, AI Mode, and AI Overviews (75k Brands)" · 2025-12-12

[16] A peer-reviewed generative-engine study found that content-level changes such as adding quotations, statistics, and source citations can boost a source's visibility in generative-engine responses by up to roughly 40%.
Aggarwal et al., "GEO: Generative Engine Optimization", KDD 2024 · 2023-11-16

[17] Microsoft's 'Identifying salient items in documents' patent (US9251473B2) describes scoring an item's salience to a web page using a soft function derived from the ratio of page visits whose queries include the item to the total visits to that page.
Microsoft, "Identifying salient items in documents" (US9251473B2) · 2016-02-02

[18] Google's 'Scoring local search results based on location prominence' patent (US8046371B2) describes a prominence score computed from factors including the number of documents referring to a business and the number of information documents that mention the business.
Google, "Scoring local search results based on location prominence" (US8046371B2) · 2011-10-25

[19] A Yext study of 6.8 million AI citations across ChatGPT, Gemini, and Perplexity reported that 86% of citations came from sources brands already control, with first-party websites generating 2.9 million citations (44%) and listings 2.9 million (42%).
Yext, "86% of AI Citations Come from Brand-Managed Sources" · 2025-10-09

[20] Google's 'Ranking search results based on entity metrics' patent (US10235423B2) describes ranking results by combining knowledge-graph-derived entity metrics (relatedness, notable entity type, contribution, and prize) with weights that depend on the type of entity.
Google, "Ranking search results based on entity metrics" (US10235423B2) · 2019-03-19
