Field notes · product · July 13, 2026 · 7 min read

How the presence score works.

One number for whether AI assistants name your brand when buyers ask — and the statistical work that makes that number worth reading.

iSeer Team

AI visibility intelligence

One number, on purpose.

Presence is the fraction of eligible buyer-intent prompts in your category where a model names your brand at all. It is expressed as a percentage.

presence = mentioned / eligible × 100

Why a single number? Because growth teams need a tracking number. SEO had Domain Authority; AI visibility needs an equivalent. The single number forces honesty about confidence, which is the part most teams get wrong.

Presence first. Quality alongside it.

The denominator is eligible: a prompt that falls in your category and where the model returns a substantive answer. An API error is never counted as an absence — a failed call is missing data, not evidence that you vanished. The numerator is mentioned: the model named your brand in the answer.

That numerator is deliberately binary. Being listed alongside four competitors counts as present, and so does being the single suggested option. We considered weighting presence by how flatteringly you were framed, and rejected it: one number that silently blends how often with how well is a number nobody can act on.
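
As a rough illustration of the eligible/mentioned split, here is a minimal Python sketch. The `Run` record, its field names, and the case-insensitive substring match are assumptions made for the example, not a description of the production pipeline. An API error or an empty answer is modelled as `answer=None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    in_category: bool       # hypothetical flag: prompt belongs to the brand's category
    answer: Optional[str]   # None stands in for an API error or a non-substantive answer


def presence(runs: list[Run], brand: str) -> Optional[float]:
    """Percentage of eligible runs whose answer names the brand at all."""
    # Failed or empty calls are dropped from the denominator, never counted as absences.
    eligible = [r for r in runs if r.in_category and r.answer]
    if not eligible:
        return None  # no data is not the same as 0%
    # Binary numerator: one of five names or the sole pick both count once.
    mentioned = sum(brand.lower() in r.answer.lower() for r in eligible)
    return 100 * mentioned / len(eligible)
```

With two substantive in-category answers, one of which names the brand, plus one failed call, this returns 50.0 rather than 33.3, because the failed call never enters the denominator.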

Framing quality is measured, but it is reported next to presence rather than baked into it. An AI judge scores each mention 1–10 on how well you were positioned, and we surface the average across mentioned runs. So a brand that is named constantly but always as the budget afterthought shows up as high presence with low quality: two facts, not one blended score that hides both.
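
A small sketch of reporting the two numbers side by side, in Python. The input shape is an assumption for illustration: one entry per eligible run, holding the judge's 1–10 score when the brand was named and `None` when it was not.

```python
from statistics import mean
from typing import Optional


def report(scores: list[Optional[int]]) -> dict:
    """Return presence and quality as separate fields, never as one blended score."""
    judged = [s for s in scores if s is not None]  # runs where the brand was named
    if not scores:
        return {"presence": None, "quality": None}
    return {
        "presence": 100 * len(judged) / len(scores),
        # Quality averages over mentioned runs only; unmentioned runs do not drag it down.
        "quality": mean(judged) if judged else None,
    }
```

For example, `report([3, 2, 3, None, 2])` gives a presence of 80.0 and a quality of 2.5: named often, framed poorly.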

How often they name you, and how well they frame you, are different questions. A metric that blends them answers neither.

Run it more than once.

Language models are temperature-sampled. A single run is a single draw. Reading one prompt on one day and treating it as truth is the easiest way to fool yourself — the answer you screenshot may not be the answer your buyer gets an hour later.

k = 5 runs per question, per model, per weekly window

Five runs each on ChatGPT and Claude, every week. Questions whose answers have gone stable get sampled slightly less often (a floor of four) so the budget goes to the questions that are actually moving. Pooling those runs collapses the variance that would otherwise masquerade as a real change.
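
To make the variance point concrete, here is a short Python sketch. It assumes runs are independent draws, which is a simplification, and it takes the "stable" flag as given, since the rule for deciding stability is not described here. The function and constant names are hypothetical.

```python
from math import sqrt

K_DEFAULT = 5  # runs per question, per model, per weekly window
K_FLOOR = 4    # reduced rate for questions whose answers have gone stable


def runs_for(stable: bool) -> int:
    """Number of runs to schedule for one question on one model this window."""
    return K_FLOOR if stable else K_DEFAULT


def standard_error(p: float, n: int) -> float:
    """Standard error of a mention rate p estimated from n independent runs."""
    return sqrt(p * (1 - p) / n)


single = standard_error(0.5, 1)                # one run, one draw
pooled = standard_error(0.5, runs_for(False))  # five runs on one model
```

For a brand named about half the time, a single run carries a standard error of 0.5, while five pooled runs bring it down to about 0.22. Pooling across more questions in the same window shrinks it further.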

Read the interval, not the point.

A presence rate of 25% from 8 runs is not the same signal as 25% from 200. We bound presence with a Wilson score interval, which behaves correctly near 0 and 1 and on small samples.

lower = (p̂ + z²/(2n) − z·√((p̂(1−p̂) + z²/(4n))/n)) / (1 + z²/n)
upper = (p̂ + z²/(2n) + z·√((p̂(1−p̂) + z²/(4n))/n)) / (1 + z²/n)

The gate is blunt on purpose: below 12 valid runs in the window, we do not show you a range at all. Not a wide range, not a hedged one — none. A confidence interval computed from a handful of samples is a decoration that makes a guess look like a measurement, and we would rather show you nothing than that.
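
A minimal Python sketch of the interval and the gate together. The 12-run minimum comes from the text. The choice of z = 1.96 (a 95% level) is an assumption, since the confidence level is not stated here, and the function name is hypothetical.

```python
from math import sqrt
from typing import Optional

MIN_RUNS = 12  # below this many valid runs, no interval is shown at all
Z_95 = 1.96    # assumed confidence level; not specified in the text


def wilson_interval(mentioned: int, n: int, z: float = Z_95) -> Optional[tuple[float, float]]:
    """Wilson score interval for mentioned/n, or None when the sample is too small."""
    if n < MIN_RUNS:
        return None  # deliberately no range, not a wide one
    p = mentioned / n
    centre = p + z * z / (2 * n)
    margin = z * sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    denom = 1 + z * z / n
    return (centre - margin) / denom, (centre + margin) / denom
```

Under these assumptions, 6 mentions in 25 runs and 48 mentions in 200 runs are both 24%, but the first comes back as roughly 11% to 43% and the second as roughly 19% to 30%. Two mentions in 8 runs returns `None`.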

Test the change, don't eyeball it.

Week-over-week, the question isn't “did the average move” — it's “did the model change its mind on the same prompts.” That's a paired comparison, not a two-sample one.

χ² = (|b − c| − 1)² / (b + c)
p-value = 1 − erf(√(χ²/2))

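We implement McNemar's paired test with Edwards' continuity correction. Here b is the count of prompts that named you last week but not this week, and c is the count that named you this week but not last week. Prompts that gave the same result in both weeks do not enter the statistic. The paired test is the right shape for the question, because the same prompt is being re-asked rather than a fresh sample drawn.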
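
The two formulas translate almost directly into Python. This is a sketch using only the standard library. The function name and the handling of the case with no discordant prompts are assumptions.

```python
from math import erf, sqrt


def mcnemar_edwards(b: int, c: int) -> tuple[float, float]:
    """McNemar's chi-square with continuity correction, plus its p-value (1 degree of freedom).

    b: prompts that named the brand last week but not this week
    c: prompts that named the brand this week but not last week
    """
    if b + c == 0:
        return 0.0, 1.0  # no prompt changed its answer, so there is nothing to test
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    p_value = 1 - erf(sqrt(chi2 / 2))
    return chi2, p_value
```

For instance, with b = 9 and c = 2 the statistic is 36/11, about 3.27, and the p-value lands near 0.07: suggestive, but not below a conventional 0.05 threshold.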

The alert we actually send is deliberately narrower than that test. An email goes out on one condition: a question you were established on — named in at least 60% of runs last window — drops to zero mentions this window, with at least eight valid runs on each side of the comparison. Nothing else emails you.
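
The alert condition is simple enough to write as a single predicate. A Python sketch, with hypothetical names, taking per-question counts for the previous and current window:

```python
MIN_RUNS_PER_SIDE = 8     # valid runs required in each window
ESTABLISHED_PERCENT = 60  # "established" means named in at least 60% of last window's runs


def should_alert(prev_mentions: int, prev_runs: int,
                 curr_mentions: int, curr_runs: int) -> bool:
    """True only when an established question drops to zero mentions."""
    if prev_runs < MIN_RUNS_PER_SIDE or curr_runs < MIN_RUNS_PER_SIDE:
        return False  # too little data on one side of the comparison
    # Integer comparison avoids floating-point edge cases at exactly 60%.
    established = prev_mentions * 100 >= ESTABLISHED_PERCENT * prev_runs
    return established and curr_mentions == 0
```

A question that went from 6 of 10 to 0 of 10 triggers the email. One that went from 6 of 10 to 1 of 10 does not, and neither does one that went from 5 of 10 to 0 of 10.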

That is a high bar, and it is meant to be. An alerting system that cries wolf on ordinary sampling noise trains you to ignore it, at which point it is worse than having none. We would rather miss a marginal drop than spend your attention on a coin flip.

The three failure modes.

When presence is below where it should be, three patterns are worth checking first. These are diagnostic starting points, not measured frequencies — we are not claiming to know how often each one is the culprit:

Outframed. The model knows you exist but anchors the category around someone else. You appear in “alternatives to X” prompts but never on the “best X” question. Fix the category-defining content first.

Authority deficit. The model can't find enough credible sources naming you in the category. Fix the citation graph: review sites, listicles, expert content, partner pages. The models synthesise what everyone else wrote about you, not what you wrote about yourself.

Branded-only. Presence jumps when the prompt names you, and collapses when it doesn't. You're recognised but not reached for. This is a positioning problem disguised as a visibility problem.

The point of measurement is not false precision. The point is knowing when the narrative has actually moved.

Why the statistics matter.

Here is a real example. In our July 2026 seed of twenty categories, we measured LastPass at 73 out of 100 on ChatGPT and 6 on Claude: the same brand, the same buyer questions, the same week. Klaviyo came back at 0 on ChatGPT and 54 on Claude. If you had checked a single model, you would have walked away with a confident, precisely wrong picture in either direction.

Note what we are and are not claiming. We measured the gap; we did not measure the reason for it. We can show you that two models disagree sharply about a brand and by how much. We cannot tell you what is happening inside the weights, and anyone who says they can is guessing. Keeping those two statements apart is the whole discipline.

It is also why a zero gets treated as a suspect rather than a finding. A broken API key produces exactly the same number as genuine absence, so every batch we run starts by measuring a brand that is unambiguously well-known in its category. If that control comes back at zero, we throw the run away instead of publishing it.
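
The control check can be sketched as a guard in front of publishing. In this Python example the batch is assumed to be a plain mapping from brand name to presence score, and the control brand is passed in by the caller. Both are illustrative choices, not the actual data model.

```python
from typing import Optional


def validated(batch: dict[str, float], control_brand: str) -> Optional[dict[str, float]]:
    """Return the batch's scores only if the control brand scored above zero."""
    # A missing or zero control score suggests a broken pipeline, not real absence.
    if batch.get(control_brand, 0.0) <= 0.0:
        return None  # discard the whole run instead of publishing it
    # The control is a health check, so it is left out of the published scores.
    return {brand: score for brand, score in batch.items() if brand != control_brand}
```

If the control comes back at zero, every other zero in that batch is uninterpretable, which is why the whole run is dropped rather than just the control row.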

Try it on your brand

See your presence score, free.

30 seconds. No signup. Buyer-intent prompts across ChatGPT and Claude, with confidence bands and failure-mode classification.
