Real strategists. Real AI tools. Real growth. — 1Digital® since 2012
Workspace Methodology · Weekly Prompt Panels · 10+ Engines
How 1Digital®'s Workspace tracks citation share — the AEO equivalent of rank position — across ChatGPT, Perplexity, Gemini, Claude, Copilot, Grok, Google AI Overviews, and the engine variants under monitoring. The measurement loop that closes inside every engagement.
Citation share is to AEO what rank tracking is to classic SEO: the recurring measurement that proves the work moves the surface. The mechanics differ from SERP scraping, and the prompt-set discipline matters more than the tooling. This page is the methodology — what we run, what we measure, how the data closes back into content briefs inside the same retainer.
Rated 4.9/5 across 941+ verified client reviews.
TL;DR: Weekly prompt panels (100–500 category-relevant prompts) executed through 10+ AI engines, with citation occurrence, position, and accuracy logged per prompt × engine × week. The aggregate — citation share — is the lagging KPI; share-of-voice within a defined competitor set is the share movement we manage. Drops generate content briefs inside the same retainer; the loop closes on re-measurement. Standard deliverable in every AI SEO engagement; not sold separately.
Where the prompt panel runs
ChatGPT Search + OAI-SearchBot retrieval. The live-grounded citation surface.
ChatGPT default mode using training-data memory. Measures GPTBot training-window exposure.
Default Perplexity + Pro on shopping prompts. Transparent citation pattern; easiest to log.
Gemini chat (gemini.google.com) plus AI Mode where the prompt class triggers it.
Same prompts run as Google searches; AI Overviews citation logged when the query triggers an Overview.
Claude.ai chat with web search on and off — both modes logged separately.
copilot.microsoft.com plus Copilot in Edge sidebar context where applicable.
grok.x.ai. Smaller distribution but distinct citation pattern (X-platform data bias); worth tracking in scope.
Bing Generative Search. Distinct from Copilot: same Bing index, different UI surface and citation behavior.
Claude Projects ingestion tests, NotebookLM source-grounding tests, M365 Copilot tenant-context tests where client environment access allows.
What gets measured per prompt × engine
Three-state per (prompt, engine): cited, paraphrased without credit, or absent entirely. The base measurement that aggregates to citation share: the percentage of prompts on which an engine cites the brand by name or links to one of its URLs.
Where in the answer the brand appears — first source cited, mid-answer mention, or footnote-style citation at the end. Position correlates with click-through rate and downstream brand recall. The same brand cited first by Perplexity and last by Gemini behaves differently in attribution.
Whether the engine's claim about the brand or product is factually correct — review counts, price ranges, feature descriptions, fit notes. Hallucinated or stale claims are themselves a signal: they tell us the engine's training-data view is outdated and which content updates would fix the inaccuracy.
Which competing brands appear in the citation set on the same prompts. Lets the dashboard show share-of-voice within a defined competitive set, not just absolute citation rate. Often the more actionable view — moving from #4 to #2 in citation share on category-defining prompts is what wins competitive ground.
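The four measurements above reduce to simple arithmetic once each prompt × engine outcome is logged. The sketch below is illustrative only: the `Outcome` enum, the function names, and the share-of-voice definition (each brand's fraction of all brand citations in the set) are assumptions, not the Workspace's actual code.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    """Three-state result for one (prompt, engine) pair."""
    CITED = "cited"
    PARAPHRASED = "paraphrased_without_credit"
    ABSENT = "absent"


def citation_share(outcomes: list[Outcome]) -> float:
    """Fraction of prompts on which the engine cited the brand."""
    if not outcomes:
        return 0.0
    cited = sum(1 for o in outcomes if o is Outcome.CITED)
    return cited / len(outcomes)


def share_of_voice(cited_brands_per_prompt: list[set[str]]) -> dict[str, float]:
    """Each brand's share of all brand citations across the prompt set (assumed definition)."""
    counts = Counter(b for brands in cited_brands_per_prompt for b in brands)
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items()} if total else {}


outcomes = [Outcome.CITED, Outcome.PARAPHRASED, Outcome.ABSENT, Outcome.CITED]
print(citation_share(outcomes))  # 0.5
```

Paraphrased-without-credit outcomes count against citation share here, which matches the definition above: only a named citation or a linked URL counts.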
How the data closes back into the work
100–500 category-relevant prompts execute through every engine in scope, on the same day each week, with results stored against the prompt × engine × week matrix. Same prompts week-over-week; the comparison is the signal.
Each engine's answer is parsed: brand mentions and cited URLs are extracted, and claim accuracy is scored against a maintained reference (the brand's own data). Ambiguous cases are reviewed manually.
Citation share by engine, by prompt category, by week. Movement is flagged when share crosses defined thresholds: a brand dropping below 10% citation share in a category-defining prompt category triggers a content review.
Drops in citation share generate a brief identifying the pages that need rewrites, the schema gaps, or the editorial-citation work to commission. Briefs go to the production team inside the same retainer.
After content ships and crawlers re-pick up the change, the same prompts re-run and the citation share is logged again. The before/after delta is what we report — and what proves the work moved the surface.
The loop closes inside one retainer. The measurement is the engagement's feedback signal, not a separate product — and not a vanity dashboard for the client portal. Drops generate briefs; briefs ship content; content moves citation share; the dashboard proves it.
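Two pieces of the loop are plain arithmetic: the threshold flag and the before/after delta. A minimal sketch, assuming shares are stored as fractions between 0 and 1; the function names are hypothetical and the 10% floor is the example threshold quoted above.

```python
def needs_content_review(citation_share: float, floor: float = 0.10) -> bool:
    """Flag a prompt category whose citation share has dropped below the floor."""
    return citation_share < floor


def delta_points(before: float, after: float) -> float:
    """Before/after change in citation share, in percentage points."""
    return round((after - before) * 100, 1)


print(needs_content_review(0.08))  # True
print(delta_points(0.30, 0.42))    # 12.0
```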
Inside the system
The 1Digital® Workspace AI Visibility Monitor runs five live AI engines through one shared parser and one shared rollup. Engine-specific quirks are abstracted at the runner layer; the parsed-response schema is identical across providers, which is what makes per-engine numbers directly comparable.
Model: gpt-4o
Browsing-enabled chat. The widest distribution surface in the panel; produces narrative-form brand mentions more often than discrete citations.
Model: claude-sonnet-4-6
Claude with web-search enabled. Highest latency in the panel and the most variable reliability — web-search timeouts are a known operational variable and the Workspace logs them separately from completion errors.
Model: sonar-pro
The most transparent citation behavior of any engine — explicit numbered footnotes, named source domains, reproducible citation patterns. The easiest surface in the panel to parse and the highest brand-mention rate in production data.
Model: gemini-2.5-pro
Gemini 2.5 Pro chat. Answers lean toward category-level generalities; the first brand mention lands an average of 4,380 characters into the answer text, so when brands appear, they appear late.
Model: serpapi-google
Google AI Overviews retrieved via SerpAPI. The only non-chat surface in the panel; statuses include no_ai_overview when a prompt didn't trigger an Overview. Distinct optimization problem from chat-based engines.
The five engines above are the active production panel. Engine variants under continuous monitoring (Microsoft Copilot, xAI Grok, Bing Generative Search, Claude Projects, NotebookLM, M365 Copilot tenant-context) are tested into the parser and queued for promotion into the weekly cron as their APIs stabilize.
Prompt taxonomy
Every prompt in the panel is classified by intent and tagged with a source. Intent drives how we read the engine's behavior; source drives how we trust the prompt itself.
informational
“What is X?” “How does Y work?” — category-explainer prompts. The Workspace classifies these as informational because they test the engine's training-data understanding rather than recommendation behavior. Engines lean toward generic explanation here; brands win these prompts by owning the category-defining answer block in their own content.
commercial
“Best X for Y,” “Is X worth it,” “Top X under $N” — buyer-intent recommendation prompts. The Workspace's commercial bucket is where mention rate actually diverges across engines: Perplexity names brands aggressively, AI Overviews stays category-level, OpenAI uses narrative paraphrase.
comparison
“X vs Y,” “X or Y for Z,” “How does X compare to Y” — head-to-head prompts. The structured-data prompt class: engines visibly reach for clean comparison tables and schema-marked feature lists. Brands with structured comparison content punch above their domain-authority weight in this bucket.
manual
Strategist-authored prompts entered through the Workspace's Prompts tab. The fully-curated set; used for category-defining synthesis prompts that no automation would generate.
ai_seeded
AI-generated prompt candidates produced by feeding selected service URLs into Claude, which returns ~80 buyer-intent prompt variations. Strategist curates the list before bulk insert. Every prompt is reviewed; nothing auto-enters the panel.
keyword_derived
Prompts derived from the brand's tracked-keyword set, wrapped in natural-language buyer-intent variations by Claude. The strongest signal-to-noise prompt source on commercial-intent panels; the keyword set is already validated as revenue-relevant before it becomes a prompt.
Run cadence
kind = weekly
A scheduled cron job kicks off the weekly run every Monday at the same UTC time. It enumerates all active prompts, posts them through every enabled provider, parses the responses, writes the parsed payloads to ai_visibility_responses, and then triggers recomputeWeekRollup to materialize the week's aggregates. Same prompts, same engines, same time — the week-over-week diff is the signal.
kind = on_demand
A strategist triggers an on-demand run from the Workspace UI — typically scoped to a subset of prompts after a content change ships, to test whether the new page changed the engine's citation behavior before the next weekly cron. The run is logged identically to the weekly run (same status fields, same rollup invocation) so the data is comparable.
Both kinds write to ai_visibility_runs with a status that progresses pending → running → success / partial / error. The runner is idempotent — re-running a partial run continues from processed_prompt_ids rather than starting over, so transient provider outages don't burn the full prompt panel's inference budget.
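The resume behavior can be sketched as follows. This is an illustration under assumptions, not the production runner: the `Run` dataclass, the `execute` function, and the rule for choosing between partial and error are hypothetical; only the status names and `processed_prompt_ids` come from the description above.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Run:
    prompt_ids: list[str]
    status: str = "pending"
    processed_prompt_ids: set[str] = field(default_factory=set)


def execute(run: Run, call_provider: Callable[[str], None]) -> Run:
    """Process only the prompts not yet recorded in processed_prompt_ids."""
    run.status = "running"
    failed = False
    for prompt_id in run.prompt_ids:
        if prompt_id in run.processed_prompt_ids:
            continue  # finished in an earlier attempt; do not pay for it again
        try:
            call_provider(prompt_id)
        except Exception:
            failed = True  # transient outage: leave the prompt for the next attempt
            continue
        run.processed_prompt_ids.add(prompt_id)
    if not failed:
        run.status = "success"
    elif run.processed_prompt_ids:
        run.status = "partial"
    else:
        run.status = "error"
    return run
```

Calling `execute` again on a partial run skips every prompt already in `processed_prompt_ids`, so a retry costs only the prompts that failed.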
From raw response to parsed payload
The runner posts the prompt to each enabled provider's API (or, in AI Overviews' case, SerpAPI's Google endpoint). Each call captures latency_ms, model name, raw response, and cost_estimate_usd into the ai_visibility_responses row. Status is one of success / error / no_ai_overview.
Each subject and competitor row carries a name plus an array of aliases plus an array of domains. Aliases handle the common-misspelling, abbreviation, and product-line-name cases (one brand often has three legitimate ways its name appears in an answer). The parser scans the answer text for any alias hit and the citation list for any domain hit; each match is logged with the matched_via array so a human reviewer can audit which alias / which domain produced the match.
For each matched entity, the parser records entity_id, entity_kind (subject vs competitor), count, first_position (character offset of first occurrence), and matched_via. The first_position field is the source of the 'average answer-depth' analysis: a brand mentioned at character 200 gets a different share of reader attention than the same brand mentioned at character 4,000.
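A minimal sketch of alias matching over the answer text, assuming case-insensitive whole-word matching. The `Entity` dataclass and `find_mentions` function are hypothetical names; the recorded fields mirror the ones listed above. Overlapping aliases (one alias contained in another) would be double-counted by this simple version.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: str
    entity_kind: str  # "subject" or "competitor"
    name: str
    aliases: list[str] = field(default_factory=list)


def find_mentions(answer: str, entities: list[Entity]) -> list[dict]:
    """Scan the answer for every alias and record count, first_position, matched_via."""
    mentions = []
    for entity in entities:
        positions: list[int] = []
        matched_via: list[str] = []
        for alias in [entity.name, *entity.aliases]:
            # Lookarounds keep "NWT" from matching inside a longer token like "NWTX".
            pattern = re.compile(rf"(?<!\w){re.escape(alias)}(?!\w)", re.IGNORECASE)
            hits = [m.start() for m in pattern.finditer(answer)]
            if hits:
                positions.extend(hits)
                matched_via.append(alias)
        if positions:
            mentions.append({
                "entity_id": entity.entity_id,
                "entity_kind": entity.entity_kind,
                "count": len(positions),
                "first_position": min(positions),
                "matched_via": matched_via,
            })
    return mentions


answer = "For budget gear, NWT is solid; Northwind Traders also ships fast."
brand = Entity("subject-1", "subject", "Northwind Traders", ["NWT"])
print(find_mentions(answer, [brand]))
```

Keeping `matched_via` as a list of the aliases that actually hit is what lets a reviewer audit a surprising match later.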
The engine's citation list (where present — Perplexity always has one; OpenAI's browsing mode sometimes does; Claude's web-search mode sometimes does; Google AI Overviews always does; Gemini's varies) is iterated. For each citation, the parser extracts citation_index, the URL, and the parsed domain. If the domain matches a monitored subject or competitor's domain array, an entry is added to parsed.citations with citation_index preserved so we can compute average citation position.
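The citation pass can be sketched with the standard library's URL parser. Assumptions in this illustration: `citation_index` is 1-based, a leading `www.` is stripped, and subdomains count as a match for a monitored domain. The function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def parse_domain(url: str) -> str:
    """Lowercased hostname with any leading 'www.' removed."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def match_citations(urls: list[str], domains_by_entity: dict[str, list[str]]) -> list[dict]:
    """Keep citations whose domain belongs to a monitored entity, preserving order."""
    citations = []
    for index, url in enumerate(urls, start=1):
        domain = parse_domain(url)
        for entity_id, domains in domains_by_entity.items():
            if any(domain == d or domain.endswith("." + d) for d in domains):
                citations.append({"citation_index": index, "url": url,
                                  "domain": domain, "entity_id": entity_id})
    return citations


def avg_citation_position(citations: list[dict]) -> Optional[float]:
    """Mean citation_index, or None when the entity was never cited."""
    if not citations:
        return None
    return sum(c["citation_index"] for c in citations) / len(citations)


urls = ["https://www.example.com/a", "https://other.org/x", "https://shop.example.com/b"]
matched = match_citations(urls, {"subject-1": ["example.com"]})
print([c["citation_index"] for c in matched], avg_citation_position(matched))
```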
Brand names and domains that appear in the answer or citation list but don't match any monitored entity are logged to parsed.discovered_brands and parsed.discovered_domains with frequency counts. These are the engines telling you who they think your competitive set is — a different, often more honest list than the one a brand declares about itself. Stoplisted terms (system-level: 'SEO,' 'Amazon,' 'reddit.com,' etc.) are filtered before write so the discovered panel surfaces only candidate competitors, not platform noise.
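A sketch of the discovery filter, assuming candidate brand names or domains have already been extracted from the answer (that extraction step is out of scope here). The stoplist entries are the three examples named above; the function name and the lowercase normalization are assumptions.

```python
from collections import Counter

# System-level stoplist; only the examples named in the text are included.
STOPLIST = {"seo", "amazon", "reddit.com"}


def discovered(candidates: list[str], monitored: list[str],
               stoplist: set[str] = STOPLIST) -> list[dict]:
    """Count candidates that are neither monitored entities nor stoplisted terms."""
    known = {m.lower() for m in monitored}
    counts: Counter = Counter()
    for candidate in candidates:
        key = candidate.lower()
        if key in known or key in stoplist:
            continue
        counts[key] += 1
    return [{"name": name, "frequency": n} for name, n in counts.most_common()]


print(discovered(["Amazon", "Globex", "globex", "Initech", "Acme"], monitored=["Acme"]))
```

Filtering before the write, rather than at read time, keeps platform noise out of the stored payload entirely.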
The parsed payload conforms to a typed shape: { mentions[], citations[], discovered_brands[], discovered_domains[] }. Every downstream view — the leaderboard, the response viewer, the rollup table — reads against this schema, which is why a single parser change propagates consistently across the entire dashboard.
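One way to express that typed shape is with `TypedDict`. The four top-level keys and the mention fields come from the description above; the exact field sets of the citation and discovered entries are assumptions made for illustration.

```python
from typing import TypedDict


class Mention(TypedDict):
    entity_id: str
    entity_kind: str       # "subject" or "competitor"
    count: int
    first_position: int    # character offset of the first occurrence
    matched_via: list[str]


class Citation(TypedDict):
    citation_index: int
    url: str
    domain: str
    entity_id: str         # assumed link back to the monitored entity


class Discovered(TypedDict):
    name: str
    frequency: int


class ParsedPayload(TypedDict):
    mentions: list[Mention]
    citations: list[Citation]
    discovered_brands: list[Discovered]
    discovered_domains: list[Discovered]


def empty_payload() -> ParsedPayload:
    """Payload for an answer in which nothing matched."""
    return {"mentions": [], "citations": [],
            "discovered_brands": [], "discovered_domains": []}
```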
Aggregation
The recomputeWeekRollup function is the materialization step: parsed responses get aggregated into ai_visibility_weekly_rollup rows so dashboard reads stay fast.
The rollup function (recomputeWeekRollup) takes a week_start (Monday, UTC), pulls every success-status response in the week, and groups them by entity × provider. The week boundary is deterministic regardless of the operator's timezone — Mondays at 00:00 UTC — so week-over-week comparisons are stable.
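The week boundary can be computed in a few lines. A minimal sketch, assuming response timestamps are timezone-aware; `week_start_utc` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def week_start_utc(ts: datetime) -> datetime:
    """Monday 00:00 UTC of the week containing ts."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    utc = ts.astimezone(timezone.utc)
    monday = utc - timedelta(days=utc.weekday())  # weekday(): Monday == 0
    return monday.replace(hour=0, minute=0, second=0, microsecond=0)


# Sunday 22:00 at UTC-5 is already Monday 03:00 UTC, so it belongs to the new week.
local = datetime(2026, 5, 3, 22, 0, tzinfo=timezone(timedelta(hours=-5)))
print(week_start_utc(local))  # 2026-05-04 00:00:00+00:00
```

Converting to UTC before finding the Monday is the step that makes the boundary independent of the operator's timezone.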
Each entity (subject or competitor) gets a row per provider plus an 'all providers' row (provider_id = NULL). The 'all' row lets the dashboard show overall citation share across the full panel without recomputing on every read; the per-provider rows let it drill into 'which engine moved this week.' Both are upserted in one batch with a unique constraint on (week_start, entity_id, entity_kind, provider_id) NULLS NOT DISTINCT, so the aggregate row is upsertable in the same conflict-resolution pass as the per-provider rows.
The rollup uses Set<prompt_id> for prompts_total, prompts_with_mention, and prompts_with_citation, so a prompt that ran twice in the same week (re-runs, on-demand backfills) counts once. position_sum is the sum of mention character positions; position_count is the divisor for avg_position. Citation density isn't materialized into the rollup table (yet); it's computed on the fly from the responses table when needed for the report.
mention_share = prompts_with_mention / prompts_total; citation_share = prompts_with_citation / prompts_total; avg_position = position_sum / position_count (NULL if no mentions in the bucket). These are computed once at rollup time and cached in the table, so dashboard reads are O(1) per entity-week-provider tuple.
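The set-based counting and the three formulas can be sketched together. This is an illustration, not the actual `recomputeWeekRollup`: the flattened input shape (one row per response and entity, with `first_position` set to `None` when there was no mention) is an assumption, and the all-providers row is keyed with `None` to mirror `provider_id = NULL`.

```python
from collections import defaultdict


def recompute_week_rollup(rows: list[dict]) -> dict:
    """Aggregate one week's success-status rows into per-provider and all-provider rows."""
    def new_bucket() -> dict:
        return {"total": set(), "mention": set(), "citation": set(),
                "position_sum": 0, "position_count": 0}

    buckets = defaultdict(new_bucket)
    for r in rows:
        # Each row feeds its own provider's bucket and the all-providers bucket (None).
        for provider_id in (r["provider_id"], None):
            b = buckets[(r["entity_id"], provider_id)]
            b["total"].add(r["prompt_id"])  # sets make re-runs count once
            if r["first_position"] is not None:
                b["mention"].add(r["prompt_id"])
                b["position_sum"] += r["first_position"]
                b["position_count"] += 1
            if r["cited"]:
                b["citation"].add(r["prompt_id"])

    rollup = {}
    for key, b in buckets.items():
        total = len(b["total"])
        rollup[key] = {
            "prompts_total": total,
            "mention_share": len(b["mention"]) / total,
            "citation_share": len(b["citation"]) / total,
            "avg_position": (b["position_sum"] / b["position_count"]
                             if b["position_count"] else None),
        }
    return rollup
```

One consequence of using sets in the all-providers row: a prompt counts as mentioned there if any engine mentioned the brand, so the aggregate share is not the average of the per-provider shares.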
What every API call costs
Every provider response writes a cost_estimate_usd value based on the provider's published API pricing applied to the recorded token counts (or, for SerpAPI AI Overviews, the per-search cost). The estimate is conservative — it doesn't include retrieval-layer overhead, only the inference + search-API charge — so the published dashboard cost is a floor, not a ceiling.
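The per-response estimate is a rate-times-usage calculation. A minimal sketch with hypothetical function names; the rates passed in below are placeholders, not any provider's actual pricing, and rounding to four decimals at this step is an assumption.

```python
def estimate_token_cost_usd(input_tokens: int, output_tokens: int,
                            usd_per_million_input: float,
                            usd_per_million_output: float) -> float:
    """Inference cost from recorded token counts and per-million-token rates."""
    cost = (input_tokens / 1_000_000 * usd_per_million_input
            + output_tokens / 1_000_000 * usd_per_million_output)
    return round(cost, 4)


def estimate_search_cost_usd(searches: int, usd_per_search: float) -> float:
    """Flat per-search cost, as for a search-API surface."""
    return round(searches * usd_per_search, 4)


# Placeholder rates for illustration only.
print(estimate_token_cost_usd(1200, 800, usd_per_million_input=2.0, usd_per_million_output=8.0))
```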
Each provider row carries a running monthly_cost_estimate_usd that aggregates response-level costs. The Workspace's settings page exposes a monthly cost ceiling per provider; if a provider's running cost crosses the ceiling, the weekly cron pauses that provider until the next month or a manual ceiling bump. This is how the panel scales without runaway inference spend.
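The ceiling check reduces to a filter applied before each run. In this sketch the `monthly_ceiling_usd` field name and the choice to pause a provider once its cost reaches (not just exceeds) the ceiling are assumptions.

```python
def providers_to_run(providers: list[dict]) -> list[str]:
    """IDs of providers whose running monthly cost is still under their ceiling."""
    return [
        p["id"] for p in providers
        if p["monthly_cost_estimate_usd"] < p["monthly_ceiling_usd"]
    ]


providers = [
    {"id": "openai", "monthly_cost_estimate_usd": 41.20, "monthly_ceiling_usd": 100.0},
    {"id": "claude", "monthly_cost_estimate_usd": 100.0, "monthly_ceiling_usd": 100.0},
]
print(providers_to_run(providers))  # ['openai']
```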
Dividing aggregate cost_estimate_usd by aggregate prompts_with_citation for an engine gives the cost-per-citation captured — the efficiency view. In the Q2 2026 pilot window, Perplexity led on this metric by a wide margin; AI Overviews was second on raw cost-per-response but produced 0 brand citations, making its cost-per-citation undefined (division-by-zero in the most literal sense).
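The zero-citation case is worth handling explicitly rather than letting the division fail. A minimal sketch; the function name is hypothetical and `None` stands in for "undefined".

```python
from typing import Optional


def cost_per_citation(total_cost_usd: float, prompts_with_citation: int) -> Optional[float]:
    """Aggregate cost divided by cited prompts; None when nothing was cited."""
    if prompts_with_citation == 0:
        return None  # undefined rather than infinite or zero
    return total_cost_usd / prompts_with_citation


print(cost_per_citation(3.0, 12))  # 0.25
print(cost_per_citation(1.5, 0))   # None
```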
ai_visibility_runs.stats.total_cost_usd is the run-level rollup of every response in the run. Visible in the run-status UI; useful for “how much did this on-demand backfill cost?” questions that come up when a strategist runs a fresh prompt set to test a content change.
Cost transparency at the response level is what makes the panel viable to scale. Every per-engine number in the Q2 2026 report rolls up from response-level cost_estimate_usd values, which is why the published cost-per-response and cost-per-citation figures are precise to four decimal places and not vendor estimates.
What a strategist sees
The Workspace's admin UI surfaces the parsed-response data through five primary views — each tuned to a different diagnostic workflow.
Weekly citation share + mention share for every monitored entity (subject and competitor) ranked across the panel. Color-coded WoW deltas. The default Mondays-first view; the entry point to every diagnostic workflow.
Brands and domains the engines surfaced that aren't on the monitored list, with frequency counts. Each row has Promote (turn into a tracked competitor with one click) and Dismiss (add to the discovery stoplist) actions. This is how the panel's competitive set grows organically over time.
Drill into any prompt × engine × week to read the engine's actual answer text with mentions highlighted: green for subject, amber for competitor, gray for discovered. The citation list renders as a numbered footer with domain badges. Useful for “why did our citation share drop on this prompt?” — you read the answer and see whether the engine paraphrased the brand without crediting it, named a competitor where it used to name the brand, or got the brand's claim factually wrong.
When a red WoW delta crosses the dashboard's threshold, the Explain-with-AI button summarizes the change against the underlying response data — “Perplexity citation share dropped 14 points this week; the prompts that lost share were all comparison-intent and the engine cited [discovered competitor type] instead.” The AI doesn't generate the underlying data, only the narrative summary of what the data shows.
On-demand panel re-run, scoped to a subset of prompts or the full set. Used after content ships to test whether the new page changed the engine's citation behavior on the relevant prompts before the next weekly cron.
Client-engagement dashboards expose a curated subset of these views — typically the leaderboard, response viewer, and discovered panel scoped to the client's prompt set. The full admin surface is internal-only; the parser, schema, and rollup logic are identical across both contexts.
The methodology, applied
The schema and pipeline documented above generated the public Q2 2026 report — five engines, weekly cadence, aggregate-only findings. Every metric in the report (mention rate, citation rate, average first-mention position, cost per response, cross-engine pairwise overlap) rolls up directly from the columns described here. The methodology is the report's audit trail.
Citation share monitoring is the AEO measurement methodology that runs a defined prompt panel through every major AI engine on a recurring cadence and logs, for each prompt × engine combination, whether the brand was cited, paraphrased without credit, or absent entirely. The aggregate result — “the percentage of category-relevant prompts on which engine X cites the brand” — is citation share, and it's the closest equivalent AEO has to “average rank position” in classic SEO.
Weekly, on a fixed day. The cadence matters because engines update behavior continuously — model refreshes, retrieval-stack tweaks, prompt-tuning changes — and a citation-share trend that's logged month-to-month rather than week-to-week misses the inflection points that explain why a brand's share moved. Same prompts, same engines, same day each week; the consistency is what makes the diff readable. Higher-frequency cadences (daily) buy little additional signal and cost meaningfully more inference budget.
A curated list of 100–500 prompts per engagement, drawn from three sources: (1) the brand's tracked-keyword set translated into natural-language buyer prompts, (2) category-defining synthesis prompts that drive shopping or research intent (“best X for Y,” “X vs Y,” “how does Z work,” “is X worth it for Y”), and (3) brand-recall and competitor prompts that test whether the engine knows what the brand is and what it competes against. The list is reviewed quarterly and rotated when query patterns shift in the category. Prompt-set discipline is the difference between a useful citation-share signal and noise — random one-off prompts run inconsistently produce nothing actionable.
Citation share by engine over time (line chart, weekly granularity, all engines stacked). Citation share by prompt category — “research,” “comparison,” “recommendation,” “brand-recall” — so movement is interpretable. Share-of-voice within a defined competitive set (which competitor brands show up in the same prompt's citation set, and at what frequency). Hallucination flags (prompts where the engine made an inaccurate claim about the brand; these are themselves a signal). And a queue of strategist-flagged prompts that need content briefs based on this week's movement.
Citation share drops on specific prompt categories generate content briefs inside the same retainer. A typical loop: prompts in the “X vs Y” category lost 12 points of Perplexity citation share over four weeks → strategist identifies that the comparison pages on the brand's site were paraphrased without credit because the page leads with marketing prose instead of a directly extractable comparison block → brief goes to the content team for an answer-first rewrite → page ships, crawlers re-pick up, prompts re-run in the next cycle → citation share recovers (or doesn't, and we iterate). The loop closes inside one retainer; clients don't buy “citation share data” as a separate product.
Citation share is to AEO what rank tracking is to classic SEO — the recurring measurement that proves the work moves the surface. The mechanics are different: rank tracking pulls SERP positions from search engines; citation share runs prompts through inference APIs and engine UIs and parses the answers. The discipline is similar: pick a stable prompt set, run it consistently, watch the trend, intervene on movement, prove the intervention worked by the next measurement. Brands that try to manage AEO without citation-share measurement are doing classic SEO without rank tracking — flying on intuition with no proof points.
Yes. Citation-share monitoring is a standard deliverable inside our AI SEO retainer, our AEO engagements, and our engine-specific programs (ChatGPT, Claude, Perplexity, Gemini, Copilot). Not a separate line item — the measurement is the engagement's feedback loop, not a product on its own.
The third-party tools (Profound, AthenaHQ, Otterly, Ahrefs Brand Radar, several others) are useful and improving fast; we evaluate and sometimes use them as part of the stack. The argument for our own Workspace running the measurement is consistency with the rest of the engagement — same prompt set the strategists work with, same dashboard the content team works against, same retainer that ships the fixes. Buying a third-party tool plus a separate agency creates two seams; running it in-house removes them. We'll discuss third-party tooling integration on engagements where the client already has license investment to amortize.
Per-engagement, yes — every client sees their own dashboard. We don't publish category-aggregate or industry-benchmark numbers because the prompt sets vary by engagement, and a published benchmark would create false precision (comparing your citation share to an “industry average” computed on different prompts is meaningless). The honest answer to “what's a good citation share?” is “higher than it was last week against the same prompts and same competitors, in the categories that drive your revenue.” That's what we measure and report.