Every brand whose marketing team has spent five minutes thinking about AI search has asked the same question: how do the engines actually decide who gets cited? Why is Perplexity naming us on 86% of relevant prompts while AI Overviews doesn't mention us at all? Why does Claude give us credit while ChatGPT paraphrases us into anonymity?
The honest answer is that nobody outside the engine teams knows the full ranking systems. But there is enough public documentation, observable behavior, and pattern recognition from running citation-share monitoring across thousands of prompts to draw a real map. Each of the five major AI engines has a different posture, a different signal set, and a different definition of what counts as a "good source." Here are the mechanics, engine by engine.
This post synthesizes what we've learned instrumenting the major engines for our clients and publishing the State of AI Shopping Citations 2026 report. It is not a leaked algorithm. It is the operational truth from running the same prompts across all five engines long enough to see the patterns hold.
The shared substrate
Before getting engine-specific, here are the signals that move citation rates across all five engines:
- Topical authority. Pages on domains with deep, consistent topical coverage get cited more than pages on broader, shallower domains. The same rule that powered traditional SEO.
- Schema.org structured data. Pages with valid Product, FAQPage, HowTo, and Article schema surface at higher rates than pages without it. Treat schema as eligibility insurance rather than a ranking factor; in practice, though, the eligibility gap is large.
- E-E-A-T signals. Named authors, real bios, About-page depth, primary-source citations. The Helpful Content framework is even more central to AI engines than it was to legacy Google.
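- Crawler access. If the engine's crawler (GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, etc.) can't fetch your page, you are not in the candidate set. Google-Extended is different: it is a robots.txt control token rather than a separate crawler, and it does not govern inclusion in AI Overviews, which draws on the regular Googlebot-crawled Search index.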
- Off-domain corroboration. Brands named in editorial coverage, comparison content, and authentic UGC surface more often than brands without that footprint. This is the layer most brands underinvest in.
If you do these five things well, you are eligible. Eligibility is necessary, not sufficient — the engine-specific layer on top determines who actually gets named.
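The crawler-access check is easy to run yourself with Python's standard-library `urllib.robotparser`. The sketch below tests a robots.txt against the user-agent tokens named above. The robots.txt content and the product URL are hypothetical. Two caveats apply. First, the standard-library parser uses plain prefix matching and ignores path wildcards that some crawlers honor. Second, "allowed" only means the page is fetchable, not that it will be cited.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; substitute your site's real file.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /checkout/

User-agent: *
Allow: /
"""

# User-agent tokens to audit. Google-Extended is a control token, not a crawler.
AI_AGENTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def crawler_access(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, per user-agent token, whether robots.txt permits fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    results = crawler_access(ROBOTS_TXT, "https://example.com/products/widget", AI_AGENTS)
    for agent, allowed in results.items():
        print(f"{agent:16} {'allowed' if allowed else 'BLOCKED'}")
```

In this sample, GPTBot is blocked site-wide and ClaudeBot is blocked only under `/checkout/`. Agents with no group of their own fall back to the `*` group.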
Engine 1: ChatGPT
ChatGPT's citation behavior is dominated by paraphrase recall. The model was trained on a massive corpus containing many brand mentions in context; when asked an attribute-shaped question, it surfaces the brands the corpus most strongly associates with that attribute. Browsing mode, when invoked, adds a layer of real-time retrieval, but on most queries the underlying recall mechanism remains the dominant force.
What this means in practice:
- Brands named in the training corpus alongside the target attribute get surfaced. "Brand X is the brand that does Y" — when that association exists across enough sources in the corpus, ChatGPT recalls it.
- Citation in the strict sense is rare. Across our pilot, ChatGPT returned 0 inline citations on browsing-mode queries even when it mentioned brands by name in narrative form. The engine names brands; it doesn't footnote them.
- The optimization lever is cross-domain mention density: editorial coverage, comparison content, expert roundups, and Reddit and Quora threads, the kinds of sources training corpora are built from. This is a slow, compounding investment.
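"Cross-domain mention density" has no standard formula. One hypothetical way to operationalize it is to count the distinct domains where the brand and the target attribute appear in the same sentence. The function name, the sentence-level co-occurrence rule, and the sample pages are all our assumptions. Treat the result as a rough proxy for what a training corpus might associate, not a view into any model's actual corpus.

```python
import re
from collections.abc import Iterable
from urllib.parse import urlparse


def mention_density(docs: Iterable[tuple[str, str]], brand: str, attribute: str) -> int:
    """Count distinct domains with at least one sentence naming both brand and attribute.

    Hypothetical proxy metric: case-insensitive substring matching within a sentence,
    using a naive split on '.', '!' or '?' followed by whitespace.
    """
    brand_l, attr_l = brand.lower(), attribute.lower()
    domains = set()
    for url, text in docs:
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            lowered = sentence.lower()
            if brand_l in lowered and attr_l in lowered:
                host = urlparse(url).hostname or ""
                domains.add(host.removeprefix("www."))
                break  # one co-occurrence per page is enough
    return len(domains)


# Made-up sample pages on reserved .example domains, not real coverage data.
docs = [
    ("https://www.reviewsite.example/best-trail-shoes", "Acme is the brand for wide feet. Others run narrow."),
    ("https://forum.example/t/123", "Honestly, Acme fits wide feet better than anything I've tried."),
    ("https://blog.example/post", "Acme makes shoes. Wide feet are a common problem."),
    ("https://reviewsite.example/another", "For wide feet, try Acme."),
]
print(mention_density(docs, "Acme", "wide feet"))  # → 2 (reviewsite.example, forum.example)
```

The blog page names both terms but never in the same sentence, so it does not count. The two reviewsite pages collapse into one domain.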
The full ChatGPT optimization playbook lives on our ChatGPT SEO services page. The compressed version: be the brand that is named in the attribute context, in as many credible places as possible, for long enough that the corpus has absorbed it.
Engine 2: Anthropic Claude (Sonnet 4.6)
Claude's posture is fundamentally different from ChatGPT's. The engine is slower, more deliberate, and explicitly designed (per Anthropic's documentation) to favor long-form synthesis from a smaller number of high-quality sources.
The citation mechanic:
- When web search is invoked, Claude pulls 2–4 sources per answer, with a strong preference for primary-source content (manufacturer pages, original specifications, peer-reviewed research, government data).
- The model favors sources with clear authorship and clear publishing-entity identity. Pages with named authors, real bios, and visible publishing-organization signals are cited at meaningfully higher rates than anonymous content-marketing pages.
- Long-form, substantive content beats thin coverage. A 3,000-word methodology piece with a real argument and original data outranks a 600-word affiliate roundup in Claude's source selection, even when both are technically eligible.
One pilot result worth dwelling on: Claude named 1Digital® on 4 of 6 agency-comparison prompts. The average peer agency surfaced on roughly 1 of 6. That is a 4× gap on the same prompt set, in a vertical that is theoretically competitive. The hypothesis that fits the data: Claude rewards depth. Agencies (and brands) with substantive published methodology and real first-party data are surfaced; agencies that compete on volume of thin content are not.
The optimization lever for Claude is first-party depth. White papers, methodology pages, original-research content, named-expert authorship. Slow to build, slow to lose. See Claude AI SEO services for the full playbook.
Engine 3: Perplexity Sonar
Perplexity is the engine whose citation mechanic is closest to a traditional search engine — and that is a feature, not a bug. Sonar is a retrieval-augmented model: it fetches a fresh set of documents at query time, ranks them, and grounds its answer in the top-ranked subset. The transparency is structural.
The citation mechanic:
- Typically 5–8 named citations per answer, surfaced inline as link chips the user can click.
- Sonar dereferences canonicals aggressively. Variant URLs that canonicalize to a master PDP (product detail page) get rolled up into the master.
- The retrieval layer reads on-domain structure cleanly. Pages with structured comparison data, FAQ blocks, and clean Schema.org markup get quoted or closely paraphrased more often than equivalent unstructured pages.
- 86% mention rate in our pilot, at $0.022 per response: the most attributable AI surface in the market.
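To count your own citations the way a canonical-dereferencing retriever would, roll each cited variant URL up to whatever its `rel="canonical"` tag declares. The sketch below does this with the standard-library `html.parser` and assumes you have already fetched each cited page's HTML. The URLs are hypothetical. It mirrors the observed behavior described above; it is not Perplexity's implementation.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical = None  # stays None if the page declares no canonical

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def canonical_of(url: str, html: str) -> str:
    """Return the declared canonical URL (resolved against `url`), else `url` itself."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return urljoin(url, finder.canonical) if finder.canonical else url


def roll_up(cited_urls: list[str], pages: dict[str, str]) -> Counter:
    """Count citations per canonical URL; `pages` maps each cited URL to its HTML."""
    return Counter(canonical_of(url, pages.get(url, "")) for url in cited_urls)


master = "https://shop.example/p/trail-runner"
variant_html = '<html><head><link rel="canonical" href="/p/trail-runner"></head></html>'
pages = {master + "?color=blue": variant_html, master + "?color=red": variant_html, master: variant_html}
cited = [master + "?color=blue", master + "?color=red", master, "https://other.example/review"]
print(roll_up(cited, pages))
```

All three color variants collapse into the master URL. The review page has no fetched HTML, so it is counted under its own URL.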
Perplexity's signal weighting is the most "SEO-like" of the major engines: topical authority, on-domain content depth, schema cleanliness, and link equity all carry weight. The engine-specific addition is sensitivity to passage-level extractability — paragraphs that read like definitions get cited, paragraphs that read like marketing copy do not.
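For the FAQ-block and clean-schema points, the sketch below emits Schema.org FAQPage JSON-LD from question/answer pairs. It uses the FAQPage, Question, and Answer types with the `mainEntity`, `acceptedAnswer`, and `text` properties. The Q&A content is hypothetical. Generating the markup from the same source as the visible FAQ keeps the two in sync. Validate the output with a structured-data testing tool before shipping.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """Build a Schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def as_script_tag(data: dict) -> str:
    """Serialize JSON-LD inside the <script> element that carries it on the page."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so an answer containing "</script>" cannot close the tag early;
    # "\/" is a valid JSON escape for "/", so parsers read the text unchanged.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


pairs = [
    ("What is a trail running shoe?",
     "A running shoe with a lugged outsole and a protective upper, built for unpaved terrain."),
]
print(as_script_tag(faq_jsonld(pairs)))
```

Each answer is written as a self-contained, definition-style sentence, which matches the passage-level extractability point above.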
The full Perplexity playbook is on our Perplexity AI SEO services page, and the buy-flow specifics are on our Perplexity shopping optimization page.
Engine 4: Google Gemini / AI Overviews
A custom version of Gemini 2.5 powers Google AI Overviews. The engine draws on Google's main web index rather than a separate AI-specific crawl, and on shopping-intent queries its job is to render the category-level answer, not to recommend specific brands.
The citation mechanic:
- Passages are extracted from already-ranking organic pages. AI Overviews does not invent new winners; it elevates passages from the top organic results.
- The cite-worthy passage is the unit, not the page. A page ranking #4 with one exceptional paragraph can win the cite while the #1 page does not.
- Schema validity correlates strongly with citation-chip appearance. We don't claim it's a direct ranking factor; we observe that pages with zero schema errors and complete recommended fields are cited at higher rates.
- AI Overviews suppresses brand-recommendation behavior on shopping queries. Our pilot returned a 0% brand mention rate from AI Overviews on the same prompt panel that produced 86% on Perplexity. The engine is rendering categories, not brands.
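The "complete recommended fields" observation is easy to check in bulk. The sketch below assumes you have already extracted each page's JSON-LD as a string. It reports which fields from an illustrative list are missing or empty on the first Product object. That field list is our assumption for demonstration, not Google's official one; consult Google's Product structured-data documentation for the authoritative required and recommended properties. The sketch also ignores `@graph` wrappers and does not validate property values, so pair it with Google's Rich Results Test.

```python
import json

# Illustrative field list (our assumption), not Google's official recommended set.
EXPECTED_PRODUCT_FIELDS = ("name", "image", "description", "brand", "offers", "aggregateRating")


def _is_product(node) -> bool:
    """True if a JSON-LD node declares @type Product (as a string or inside a list)."""
    if not isinstance(node, dict):
        return False
    node_type = node.get("@type")
    return node_type == "Product" or (isinstance(node_type, list) and "Product" in node_type)


def missing_product_fields(jsonld_text: str, expected=EXPECTED_PRODUCT_FIELDS):
    """Return expected fields absent or empty on the first Product; None if no Product.

    json.loads raises json.JSONDecodeError on malformed markup, which is itself a
    schema error worth surfacing.
    """
    data = json.loads(jsonld_text)
    nodes = data if isinstance(data, list) else [data]
    products = [node for node in nodes if _is_product(node)]
    if not products:
        return None
    return [field for field in expected if not products[0].get(field)]


sample = (
    '{"@context": "https://schema.org", "@type": "Product", "name": "Trail Runner", '
    '"offers": {"@type": "Offer", "price": "129.00", "priceCurrency": "USD"}}'
)
print(missing_product_fields(sample))  # → ['image', 'description', 'brand', 'aggregateRating']
```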
The optimization target on Gemini / AI Overviews is owning the category-defining answer block. Be the source whose paragraph the engine paraphrases when explaining the category. The brand-recall play happens on Perplexity, Claude, and ChatGPT — not here. See Gemini AI SEO services and Google AI Overviews optimization for the full playbook.
Engine 5: the long tail (Grok, Copilot, agent surfaces)
The long-tail engines (xAI's Grok, Microsoft Copilot, and the proliferating agentic surfaces that wrap one of the major foundation models) generally inherit the citation posture of whatever model they're built on. Copilot's behavior is closer to ChatGPT's. Grok's behavior is its own thing: more aggressive on social-source recall, less consistent on attribution. The agent surfaces increasingly rely on MCP (the Model Context Protocol) for tool access, which moves the question from "what does the engine cite" to "what does the agent fetch directly."
For most brands, the priority order in 2026 is: Perplexity, ChatGPT, Claude, Gemini/AI Overviews, then the long tail. A program that optimizes for the first three or four is well-positioned for whatever comes next; a program that chases every emerging engine spreads thin.
What this means for your program
The cross-engine reality:
- No single signal wins everywhere. A brand with strong on-domain structure wins Perplexity; with strong primary-source content wins Claude; with strong cross-domain mention density wins ChatGPT; with strong category-passage clarity wins AI Overviews. The fully-optimized program does all four.
- The shared substrate matters more than the engine-specific tactic. Pages with valid schema, clean technical SEO, real authorship, and topical depth are eligible everywhere. Engine-specific optimization is the layer on top, not the foundation.
- Diagnosis is engine-specific. If you're losing Perplexity, look at on-domain structure and editorial corroboration. If you're losing Claude, look at first-party content depth. If you're losing ChatGPT, look at cross-domain mention density. The fixes are not interchangeable.
- Measurement is multi-dimensional. Mention rate, citation rate, citation share, and cross-engine spread are distinct KPIs. A single "AI SEO score" hides the data you actually need to act on.
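The four KPIs are listed here without formulas. The Python sketch below shows one plausible set of definitions:

- Mention rate: the share of responses that name the brand.
- Citation rate: the share of responses whose cited sources include the brand's domain.
- Citation share: the brand's slice of all citations.
- Cross-engine spread: read here as the gap between the best and worst per-engine mention rate.

All four definitions, the `Response` record, and the sample panel are our assumptions, not pilot data. A breadth measure, such as the number of engines that mention you at all, would be an equally defensible reading of "spread."

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Response:
    """One engine answer to one panel prompt (hypothetical record shape)."""
    engine: str
    prompt_id: str
    brand_mentioned: bool  # brand named anywhere in the answer text
    cited_domains: list[str] = field(default_factory=list)  # one entry per citation


def kpis(responses: list[Response], brand_domain: str):
    """Per-engine mention rate, citation rate, citation share, plus cross-engine spread."""
    by_engine = defaultdict(list)
    for response in responses:
        by_engine[response.engine].append(response)

    report = {}
    for engine, rs in by_engine.items():
        total_cites = sum(len(r.cited_domains) for r in rs)
        brand_cites = sum(r.cited_domains.count(brand_domain) for r in rs)
        report[engine] = {
            "mention_rate": sum(r.brand_mentioned for r in rs) / len(rs),
            "citation_rate": sum(brand_domain in r.cited_domains for r in rs) / len(rs),
            "citation_share": brand_cites / total_cites if total_cites else 0.0,
        }

    # Spread: best minus worst per-engine mention rate (one possible definition).
    rates = [metrics["mention_rate"] for metrics in report.values()]
    spread = max(rates) - min(rates) if rates else 0.0
    return report, spread


panel = [
    Response("perplexity", "p1", True, ["acme.example", "rival.example", "acme.example"]),
    Response("perplexity", "p2", True, ["rival.example"]),
    Response("ai_overviews", "p1", False, ["wiki.example"]),
    Response("ai_overviews", "p2", False, []),
]
report, spread = kpis(panel, "acme.example")
print(report, spread)
```

Mention rate and citation rate diverge on purpose in this sample. The brand is named in both Perplexity answers but cited in only one of them.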
Common questions
Can I just optimize for the foundation model and let the engines follow?
Not really. The same underlying Sonnet 4.6 model behaves differently inside Perplexity (with its retrieval layer), Claude.ai (with native web search), and an embedded agent (with MCP tool access). The engine layer matters. Optimizing for the model in isolation doesn't generalize to the surface.
Why is there so little public information about how citations work?
The engine teams treat citation systems as part of their competitive moat. Public documentation is sparse and often outdated within weeks of being written. The honest path forward is instrumentation: run a fixed prompt panel across the engines, measure what changes, and let the data tell you what's working.
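The fixed-prompt-panel approach can be sketched in a few lines. `ask` is a hypothetical stand-in for however you query each engine, such as an API client or a browser automation script. The sketch assumes nothing about any engine's actual API. Substring matching on the brand name is a crude mention detector, and the 0.15 drift threshold is an arbitrary placeholder.

```python
from typing import Callable

# Hypothetical signature: ask(engine, prompt) -> answer text.
AskFn = Callable[[str, str], str]


def run_panel(ask: AskFn, engines: list[str], prompts: list[str], brand: str) -> dict[str, float]:
    """Run every prompt on every engine and return the per-engine brand mention rate."""
    rates = {}
    for engine in engines:
        hits = sum(brand.lower() in ask(engine, prompt).lower() for prompt in prompts)
        rates[engine] = hits / len(prompts)
    return rates


def drift(previous: dict[str, float], current: dict[str, float], threshold: float = 0.15) -> dict[str, float]:
    """Engines whose mention rate moved by at least `threshold` between two runs."""
    return {
        engine: round(current[engine] - previous[engine], 3)
        for engine in current
        if engine in previous and abs(current[engine] - previous[engine]) >= threshold
    }


def fake_ask(engine: str, prompt: str) -> str:
    """Stub engine for the example: only 'perplexity' ever names the brand."""
    return "Try Acme or Zenith." if engine == "perplexity" else "Look for a lugged outsole."


prompts = ["best trail shoe for wide feet", "trail shoe with rock plate"]
this_week = run_panel(fake_ask, ["perplexity", "ai_overviews"], prompts, "Acme")
last_week = {"perplexity": 0.5, "ai_overviews": 0.0}
print(drift(last_week, this_week))  # → {'perplexity': 0.5}
```

On cost, the $0.022 per Perplexity response from our pilot implies that a hypothetical 50-prompt panel costs about $1.10 per weekly run on that engine.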
How long until ranking systems change so much that this is all wrong?
Some of it will be wrong within six months. The shared substrate — schema, authority, content depth, off-domain corroboration — is durable on a 2–3 year horizon because it tracks the underlying truth (the engine cites sources it trusts). The engine-specific tactics shift faster. A working program treats the substrate as the long-cycle investment and the engine-specific layer as the iterative work.
Key takeaways
- Five major AI engines, five different citation mechanics. There is no single "AI ranking algorithm."
- ChatGPT: paraphrase recall, rewards cross-domain mention density.
- Claude: substance over coverage, rewards primary-source depth and named authorship.
- Perplexity: retrieval-grounded, rewards on-domain structure and clean schema. Highest mention rate at 86% in our pilot.
- Gemini / AI Overviews: passage extraction from already-ranking pages, category-answer posture rather than brand-recommendation.
- The substrate is shared: schema, authority, content depth, off-domain corroboration. Engine-specific optimization is the layer on top.
For a citation-share program that instruments all five engines weekly against your priority queries with diagnostics tied to specific signals, see our citation-share monitoring methodology or start a conversation.
