Measuring AI visibility: AIRS explained.
One number for whether AI assistants name your brand when buyers ask — and the statistical work that makes that number worth reading.
One number, on purpose.
AI Recommendation Share is the fraction of eligible buyer-intent prompts in your category where a model recommends your brand by name. It is expressed as a percentage.
AIRS = recommended / eligible × 100
Why a single number? Because growth teams need a tracking number. SEO had Domain Authority; AI visibility needs an equivalent. Per-platform breakdowns, weighted indexes, sentiment-adjusted shares: they all decompose into something that looks like AIRS plus annotations. The single number forces honesty about confidence, which is the part most teams get wrong.
Recommended, not mentioned.
The denominator is eligible: a prompt that falls in your category and where the model returns a substantive answer. The numerator is recommended: the model named your brand as one of its top suggestions, not merely in passing.
That distinction matters. “Has a CRM that integrates with Salesforce, including HubSpot, Pipedrive, Acme, and others” is a mention. “For mid-market teams, I'd suggest Acme, HubSpot, or Pipedrive” is a recommendation. We classify each response with an AI judge that grades it 0–10, then count everything above a threshold as recommended. The judge score is internal: it disciplines AIRS and is not a number you read on its own.
A mention is not a recommendation. The metric that doesn't distinguish them is decoration.
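The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the production pipeline: the `JudgedResponse` record, the `airs` helper, and the cutoff of 7 are all hypothetical, since the threshold value is not stated here.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the article does not state the actual threshold.
RECOMMEND_THRESHOLD = 7


@dataclass
class JudgedResponse:
    prompt: str
    eligible: bool    # in-category prompt that got a substantive answer
    judge_score: int  # 0-10 grade from the AI judge


def airs(responses, threshold=RECOMMEND_THRESHOLD):
    """Return AIRS as a percentage, or None when no prompt is eligible."""
    eligible = [r for r in responses if r.eligible]
    if not eligible:
        return None
    # "Above a threshold" is read here as strictly greater than.
    recommended = sum(1 for r in eligible if r.judge_score > threshold)
    return recommended / len(eligible) * 100


sample = [
    JudgedResponse("best CRM for mid-market teams", True, 9),
    JudgedResponse("CRMs that integrate with Salesforce", True, 4),
    JudgedResponse("top CRM for startups", True, 8),
    JudgedResponse("CRM pricing comparison", True, 2),
    JudgedResponse("what is a CRM", False, 0),  # not eligible, excluded
]
print(airs(sample))  # 2 recommended out of 4 eligible -> 50.0
```

Ineligible prompts drop out of the denominator entirely, so a mention-only response lowers AIRS while an off-category prompt leaves it unchanged.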
Run it more than once.
Language models are temperature-sampled. A single run is a single draw. The most common mistake we see in DIY tracking is reading the output of one prompt on one day and treating it as truth.
k = 3–5 runs per prompt per model per window
Averaging k runs collapses variance that would otherwise masquerade as a real change, and keeps the denominator honest. If you watch AIRS jump from 28% to 41% on Monday and back to 31% on Tuesday, the problem isn't your strategy, it's your sampling.
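One way to average repeated runs is sketched below. The exact aggregation rule is not specified here, so this assumes each (prompt, model) pair contributes the mean of its k binary outcomes; the function names and the tuple layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def per_prompt_rates(runs):
    """Average the binary outcomes of repeated runs per (prompt, model).

    runs: iterable of (prompt, model, recommended) tuples, one per run.
    """
    grouped = defaultdict(list)
    for prompt, model, recommended in runs:
        grouped[(prompt, model)].append(1.0 if recommended else 0.0)
    return {key: mean(values) for key, values in grouped.items()}


def averaged_airs(runs):
    """AIRS (%) where each prompt/model pair counts once, however many runs."""
    rates = per_prompt_rates(runs)
    if not rates:
        return None
    return mean(list(rates.values())) * 100


# k = 4 runs for each of two prompts on one (placeholder) model.
runs = [
    ("best analytics tool", "model-a", True),
    ("best analytics tool", "model-a", True),
    ("best analytics tool", "model-a", True),
    ("best analytics tool", "model-a", False),
    ("analytics for startups", "model-a", False),
    ("analytics for startups", "model-a", False),
    ("analytics for startups", "model-a", False),
    ("analytics for startups", "model-a", True),
]
print(averaged_airs(runs))  # (0.75 + 0.25) / 2 * 100 -> 50.0
```

Reading any single run from this sample would report either 0% or 100% for a given prompt; the per-prompt average is what smooths that out.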
Read the interval, not the point.
A 24% AIRS at n=8 is not the same signal as 24% at n=200. We bound AIRS with a Wilson score interval, which behaves correctly near 0 and 1 and on small samples.
lower = (p̂ + z²/(2n) − z·√((p̂(1−p̂) + z²/(4n))/n)) / (1 + z²/n)
upper = (p̂ + z²/(2n) + z·√((p̂(1−p̂) + z²/(4n))/n)) / (1 + z²/n)
The width of that interval is your confidence label. We translate it into three buckets, High (confidence ≥ 0.7), Medium (≥ 0.4), and Low (< 0.4), so a number always travels with a header that says whether it's worth acting on. A Low-confidence 60% is less actionable than a High-confidence 35%. Always read the band.
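The two bounds translate directly into code. The sketch below assumes z = 1.96 (a 95% interval), which is not stated here, and the function name is hypothetical. It stops at the interval because the mapping from width to the High, Medium, and Low labels is not given.

```python
import math


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for an observed proportion p_hat over n trials.

    z = 1.96 is an assumed default (95% confidence).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    z2 = z * z
    denom = 1 + z2 / n
    center = p_hat + z2 / (2 * n)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z2 / (4 * n)) / n)
    return (center - margin) / denom, (center + margin) / denom


# The same 24% point estimate at two sample sizes.
for n in (8, 200):
    lower, upper = wilson_interval(0.24, n)
    print(f"n={n}: {lower:.3f} to {upper:.3f} (width {upper - lower:.3f})")
```

At n=8 the band spans roughly 7% to 58%; at n=200 it narrows to roughly 19% to 30%. The point estimate is identical, but the evidence behind it is very different.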
Test the change, don't eyeball it.
Week-over-week, the question isn't “did the average move” — it's “did the model change its mind on the same prompts.” That's a paired comparison, not a two-sample one.
χ² = (|b − c| − 1)² / (b + c)
p-value = 1 − erf(√(χ²/2))
We run McNemar's paired test with the Edwards continuity correction. b is the count of prompts that recommended you last week but not this week. c is the reverse. Alerts fire only when p < 0.05. Otherwise it's noise wearing a costume.
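The two formulas above fit in a short function. This is a sketch under assumptions: the function names are hypothetical, and returning p = 1 when no prompt changed is an illustrative choice, since the statistic is undefined at b + c = 0.

```python
import math

ALERT_ALPHA = 0.05  # alert threshold stated in the text


def mcnemar_edwards(b, c):
    """McNemar's test with the Edwards continuity correction.

    b: prompts recommended last week but not this week
    c: prompts recommended this week but not last week
    Returns (chi2, p_value) for one degree of freedom.
    """
    if b + c == 0:
        return 0.0, 1.0  # no prompt changed; treated as no evidence
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    p_value = 1 - math.erf(math.sqrt(chi2 / 2))
    return chi2, p_value


def should_alert(b, c, alpha=ALERT_ALPHA):
    return mcnemar_edwards(b, c)[1] < alpha


print(mcnemar_edwards(2, 10))  # lopsided flips
print(mcnemar_edwards(5, 8))   # flips in both directions
print(should_alert(2, 10), should_alert(5, 8))
```

With 2 losses against 10 gains, χ² is 49/12 (about 4.08) and p lands just under 0.05, so the alert fires. With 5 against 8, the flips mostly cancel and p is far above the threshold. Prompts that did not change between weeks never enter the statistic.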
The three failure modes.
When AIRS is below where it should be, it's usually one of three patterns:
Outframed. The model knows you exist but anchors the category around someone else. You appear in “alternatives to X” prompts but never on the “best X” question. Fix the category-defining content first.
Authority deficit. The model can't find enough credible sources naming you in the category. Common on Perplexity, which leans hardest on cited URLs. Fix the citation graph: review sites, listicles, expert content, partner pages.
Branded-only. AIRS jumps when the prompt names you, and collapses when it doesn't. You're recognized but not recommended. This is a positioning problem disguised as a visibility problem.
The point of measurement is not false precision. The point is knowing when the narrative has actually moved.
A real example.
A B2B analytics SaaS we track went from 22% to 31% AIRS in three weeks after a comparison-page rewrite. The headline movement looked great. The McNemar paired test said p = 0.12 — not significant. The confidence interval at n=120 was still ±6 points wide. We didn't fire the alert.
Two weeks later, after a second wave of changes, AIRS held at 31% with a tighter band and p = 0.03. That was the real signal. The first move was within noise, and treating it as a win would have meant celebrating noise.