AI Overview citation patterns: what actually gets quoted, and why

A year into Google's AI Overview (formerly SGE) being live on most informational queries, the citation patterns are clear enough to make some concrete recommendations. This post is what I see when I pull AI Overview citation data: which pages get cited disproportionately to their ranking position, what structures correlate with citation, and what to actually do.

This isn't theory. It's compiled from running citation audits across about a dozen client sites over the past year. The patterns are consistent enough that I'd bet on them.

The base rate

First, the brutally honest baseline: AI Overview suppresses organic CTR by roughly 30-40% on queries where it shows. The exact suppression varies by intent: heavier on pure-informational queries, lighter on commercial ones. Use the CTR Predictor to model your specific scenario.

This means the citation slot inside the AI Overview matters more than ranking #2 organic. Being cited puts you above the AI block's fold; ranking organic #2 puts you below it, fighting for the ~5% CTR scraps.
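
To make the arithmetic concrete, here is a minimal Python sketch of what a 30-40% suppression does to clicks. The function name, the 10,000 impressions, and the 10% baseline CTR are illustrative assumptions, not measured values; only the 30-40% range comes from the figures above.

```python
def clicks_with_overview(impressions: int, baseline_ctr: float, suppression: float) -> float:
    """Expected organic clicks when an AI Overview cuts CTR by `suppression` (0.0-1.0)."""
    if not 0.0 <= suppression <= 1.0:
        raise ValueError("suppression must be between 0 and 1")
    return impressions * baseline_ctr * (1.0 - suppression)


# Hypothetical query: 10,000 monthly impressions, 10% CTR before the Overview appeared.
for suppression in (0.30, 0.40):
    clicks = clicks_with_overview(10_000, 0.10, suppression)
    print(f"{suppression:.0%} suppression: about {clicks:.0f} clicks instead of 1000")
```

Swap in your own impressions and pre-Overview CTR from Search Console; the point is only that the loss scales linearly with the suppression rate.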

The citation candidate pool

AI Overview almost never cites pages that aren't already ranking organically. The candidate pool is usually positions 1-10, occasionally 11-20 for thinly-contested queries. Translation:

You can't AI-Overview-optimise your way to citations from a page that isn't ranking. Classic SEO first. AI optimisation is what re-ranks you inside the citation pool.

What gets cited more than its position predicts

Across my citation audits, four page-structure patterns correlate with cite rate above what their organic position alone would predict:

1. Answer-first paragraphs

The cited paragraph almost always answers the question in its first 1-2 sentences, then provides nuance. Example of what gets cited:

A meta description should be between 130 and 160 characters, depending on letter widths. Google measures the cutoff in pixels (~920px on desktop), not characters.

And what doesn't:

Meta descriptions are an important part of on-page SEO. Many factors affect how Google displays them. The general consensus among SEO professionals is that they should be a certain length, though this varies...

Same eventual content. The extractor needs the answer in sentence one. Burying it loses the citation.

2. Concrete, falsifiable numbers

Pages with specific numbers get cited more often than pages with vague claims. The model can verify "130-160 characters" against other sources; it can't verify "the right length."

Patterns that suppress citation:

"many"
"often"
"typically"
"industry experts agree"

Patterns that earn citation:

Specific numbers
Named studies ("Ahrefs 2024 analysis of 4M pages")
Specific dates ("as of Q1 2026")
Comparison tables with named alternatives
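
A rough way to lint a draft for the vague phrases listed above is sketched below. The phrase list is copied from this section; the function names and the "contains a digit" test are my own simplifications, so treat this as a starting heuristic rather than a model of what the extractor does.

```python
import re

# Vague phrases from the list above.
VAGUE_PHRASES = ("many", "often", "typically", "industry experts agree")


def vague_hits(text: str) -> dict:
    """Count whole-word occurrences of each vague phrase, case-insensitively."""
    lowered = text.lower()
    hits = {}
    for phrase in VAGUE_PHRASES:
        count = len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
        if count:
            hits[phrase] = count
    return hits


def has_specific_number(text: str) -> bool:
    """Crude proxy for 'contains a concrete figure': any digit at all."""
    return re.search(r"\d", text) is not None


good = "A meta description should be between 130 and 160 characters."
bad = "Many factors affect how Google displays them."
print(vague_hits(good), has_specific_number(good))
print(vague_hits(bad), has_specific_number(bad))
```

Run it per paragraph rather than per page, since citation happens at the paragraph level.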

3. Contrarian or differentiated takes

When 50 pages say the same thing, AI Overview cites a few — usually the most authoritative. When one page says something different and credible, that page gets disproportionate citation share because it adds information the model can't get elsewhere.

A 1500-word post arguing against a common SEO trope tends to outperform a 5000-word post restating the trope. The model prefers signal over consensus padding.

This is a meaningful structural advantage for small sites with strong opinions over big sites that hedge.

4. Question-shaped headings + answer-first paragraphs

The combination matters more than either alone:

<h2>How long should a meta description be?</h2>
First paragraph after: direct answer in 2 sentences
Subsequent paragraphs: evidence

The H2 lexically matches user queries. The first paragraph extracts cleanly. The structure is unambiguous about which paragraph answers which question.

Essay-style content (no question H2s, answers buried in paragraph 3 of a section) gets cited poorly even when its information quality is identical.
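
The heading-plus-first-paragraph structure can be checked mechanically. The sketch below uses Python's standard-library `html.parser` to pair each `<h2>` with the first `<p>` after it. The class name, the "ends with a question mark" test, and the word count are assumptions chosen for illustration; they do not detect whether the paragraph actually answers the question.

```python
from html.parser import HTMLParser


class HeadingAudit(HTMLParser):
    """Collects each <h2> and the text of the first <p> that follows it."""

    def __init__(self):
        super().__init__()
        self.sections = []  # each entry: [heading_text, first_paragraph_text or None]
        self._tag = None    # tag whose text we are currently capturing

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.sections.append(["", None])
            self._tag = "h2"
        elif tag == "p" and self.sections and self.sections[-1][1] is None:
            self.sections[-1][1] = ""
            self._tag = "p"

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

    def handle_data(self, data):
        if self._tag == "h2":
            self.sections[-1][0] += data
        elif self._tag == "p":
            self.sections[-1][1] += data


def audit(html: str) -> list:
    """Report, per H2, whether it is question-shaped and how long its first paragraph is."""
    parser = HeadingAudit()
    parser.feed(html)
    report = []
    for heading, paragraph in parser.sections:
        heading = heading.strip()
        report.append({
            "heading": heading,
            "question_shaped": heading.endswith("?"),
            "first_paragraph_words": len((paragraph or "").split()),
        })
    return report


sample = (
    "<h2>How long should a meta description be?</h2>"
    "<p>A meta description should be between 130 and 160 characters.</p>"
    "<p>More evidence.</p>"
    "<h2>Background</h2>"
    "<p>Meta descriptions are an important part of on-page SEO.</p>"
)
for row in audit(sample):
    print(row)
```

A section flagged as not question-shaped is not automatically a problem; the checklist further down says to use question H2s only where they read naturally.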

What doesn't move citation rate

Things I see SEO posts claim matter that I haven't seen meaningful evidence for:

Word count. Cited paragraphs are often 50-150 words. The page's overall length matters for ranking, not for citation.
Number of internal links on the page. Useful for ranking; doesn't seem to affect citation.
Bolding key phrases. Plausible-sounding, no observable effect.
Schema markup beyond Article and FAQPage. Helpful at the margin. Not the decisive factor most posts make it out to be.
Mentioning "as of 2026" in every paragraph. The model can read dates from your byline and Article schema.

Don't sacrifice content quality to chase any of these.

The brutally honest gap: authority

The largest factor in citation rate, after position, is something you can't easily fake: domain authority. Known-publisher sites (Wikipedia, NYT, well-established niche sites with strong link profiles) get cited at much higher rates than newer or weaker domains, even when the new-domain page is better written.

The same content on a 6-year-old SEO domain with 5000 referring domains will be cited more than on a 6-month-old domain. There's no AI-optimisation tactic that closes this gap. Build the site's authority the long way: real content, real links earned by being citable, real signals.

The flip side: once you have authority, citation is largely a structural-optimisation game. Most enterprise SEO teams I see have plenty of authority and terrible page structure. They're leaving citations on the table.

A pre-publish checklist for citable content

For each new informational page:

Answer-first paragraph under H1. First 1-2 sentences answer the page's central question explicitly.
Question-shaped H2s where natural. Match phrasings users actually type.
Answer-first paragraphs under each H2. Direct answer, then evidence.
One specific number per section. "$129" beats "around a hundred dollars." "62% in a 2024 study" beats "most users."
At least one table or list. Tables survive extraction with their structure intact; paragraphs lose structure.
One genuinely original or contrarian point. What does your page say that's not on the other 20 pages targeting this query?
Article and FAQPage JSON-LD where applicable.
Author byline with Person schema for E-E-A-T signal.
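
For the last two checklist items, here is a minimal sketch that builds Article and FAQPage JSON-LD with Python's `json` module. The `@type` and property names follow schema.org; the helper function names, the author name, and the date are placeholders I made up for the example, and a production page would usually carry more properties than this.

```python
import json


def article_jsonld(headline: str, author_name: str, date_published: str) -> dict:
    """Article markup with a Person author (the byline signal from the checklist)."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,
    }


def faq_jsonld(pairs) -> dict:
    """FAQPage markup from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# Placeholder values for illustration only.
article = article_jsonld("How long should a meta description be?", "Jane Doe", "2026-01-15")
faq = faq_jsonld([
    ("How long should a meta description be?",
     "Between 130 and 160 characters, depending on letter widths."),
])

# Each dict goes inside its own <script type="application/ld+json"> tag.
print(json.dumps(article, indent=2))
print(json.dumps(faq, indent=2))
```

Keep the FAQ answers identical to the visible on-page text so the markup and the page say the same thing.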

What we don't know

Honesty section: nobody outside Google knows exactly how the AI Overview citation algorithm works. The patterns above are correlations from observation, not confirmed causation. The system is updated frequently, and its behaviour visibly shifts.
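
If you want to test these correlations on your own audit data, one simple approach is to compare observed citations against what organic position alone would predict. The sketch below is an assumption about how such a comparison could be set up, not a description of my exact audit tooling: the row format, the function name, and the toy data are all hypothetical.

```python
from collections import defaultdict


def position_adjusted_lift(rows):
    """Observed / expected citations for pages that have some structural feature.

    rows: iterable of (position, was_cited, has_feature) tuples, one per page per query.
    Expected citations come from the overall cite rate at each organic position.
    Returns None when no citations would be expected (e.g. no feature pages).
    """
    rows = list(rows)
    shown = defaultdict(int)
    cited = defaultdict(int)
    for position, was_cited, _ in rows:
        shown[position] += 1
        cited[position] += int(was_cited)
    baseline = {p: cited[p] / shown[p] for p in shown}

    observed = sum(int(c) for _, c, has_feature in rows if has_feature)
    expected = sum(baseline[p] for p, _, has_feature in rows if has_feature)
    return observed / expected if expected else None


# Toy data, invented for the example: (organic position, cited?, answer-first?)
toy_rows = [
    (1, True, True), (1, True, False), (1, False, False), (1, False, False),
    (3, True, True), (3, False, True), (3, False, False), (3, False, False),
]
print(position_adjusted_lift(toy_rows))
```

A lift above 1 means pages with the feature were cited more than their positions alone would predict. It is still only a correlation, and with a few dozen rows it will be noisy.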

Two safe bets for the longer term:

Specific, original, well-evidenced content has won every algorithm change of the past decade. Likely to keep winning.
Optimisation that essentially means "write more clearly" — answer-first, structured Q&A, sourced numbers, tables where they help — is unlikely to backfire.

The tactics that look like keyword-stuffing for AI ("As an AI language model..." prompts inserted into your content, etc.) will age the same way doorway pages did. Don't.

Tools

AI Snippet Writer — generates titles and meta descriptions tuned for both classic SERP and AI-Overview extraction. Concrete, direct, intent-matched.
CTR Predictor — model the impact of AI Overview presence on your specific position and traffic.
Snippet Optimizer — your snippet is the model's first impression of your page; cleaner snippets correlate with higher citation rates.
AI Overview glossary entry — quick reference.

AI Overview isn't going away. Treat it like a new ranking system overlaid on the existing one. Your goal isn't to game it — it's to write content that's genuinely better at being cited because it's better at being read.