Calculator · 017
A/B Test Significance Calculator
Measure the statistical confidence behind an A/B result — and decide whether the difference is real enough to ship or still too noisy.
Formula
Two-proportion z-test on A vs B conversion rates
Understanding statistical significance
Reference material — the calculator above stays the primary tool.
What significance measures
Statistical significance measures how unlikely a measured difference between two variants would be if random variation alone were at work. The higher the confidence, the harder the difference is to explain by chance. It answers the only question that should gate a rollout: is this difference trustworthy enough to act on?
This calculator runs a two-proportion z-test — the standard method for comparing two conversion rates — and reports the resulting confidence directly.
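As a rough illustration of the test named above, here is a minimal Python sketch of a two-tailed two-proportion z-test. The function name and its parameters are hypothetical, and treating "confidence" as 1 minus the p-value is an assumption; the calculator's own internals may differ.

```python
import math

def two_proportion_confidence(conv_a, n_a, conv_b, n_b):
    """Two-tailed two-proportion z-test; returns (z, confidence).

    Assumption: 'confidence' is taken to be 1 - p_value, which may not
    match how the calculator itself defines it.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No variation at all (0% or 100% conversion in both arms).
        return 0.0, 0.0
    z = (rate_b - rate_a) / se
    # Two-tailed p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, 1 - p_value

# Hypothetical data: 200/1000 conversions for A, 250/1000 for B.
z, confidence = two_proportion_confidence(200, 1000, 250, 1000)
print(f"z = {z:.2f}, confidence = {confidence:.1%}")
```

With these made-up counts the z-score is about 2.68, which puts the confidence above 99%.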
How to read your result
Higher confidence is better. Results are graded against the conventional 95% decision bar:
Low: under 90%. The difference is well within what chance could produce.
Average: 90% to just under 95%. Suggestive but not yet decision-grade.
Strong: 95% or above. Reliable enough to ship.
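The three bands can be expressed as a small lookup. This is only a sketch: the function name is hypothetical, and the thresholds are the ones listed above, with exactly 90% read as Average and exactly 95% as Strong.

```python
def confidence_tier(confidence_pct):
    """Map a confidence percentage (0-100) to the tier names used above."""
    if confidence_pct >= 95:
        return "Strong"
    if confidence_pct >= 90:
        return "Average"
    return "Low"

for value in (72.0, 93.4, 95.0):
    print(value, confidence_tier(value))
```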
Significance is not lift
The two measure different things. A variant can post a large observed lift yet sit below 95% confidence because the sample is thin, or post a modest lift that is highly significant on a large sample. Significance tells you whether to believe the result; the conversion lift tool tells you how big it is. Ship only when both look good.
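To make the distinction concrete, the sketch below runs the same observed lift through a two-tailed two-proportion z-test at two sample sizes. The data are invented for illustration, the helper names are hypothetical, and confidence is again assumed to be 1 minus the p-value.

```python
import math

def confidence(conv_a, n_a, conv_b, n_b):
    """Two-tailed confidence (assumed here to be 1 - p-value) for B vs A."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 1 - math.erfc(abs(z) / math.sqrt(2))

def relative_lift(conv_a, n_a, conv_b, n_b):
    """Observed relative lift of B over A."""
    rate_a = conv_a / n_a
    return (conv_b / n_b - rate_a) / rate_a

# Hypothetical data: both experiments show 10% vs 12%, a 20% relative lift.
small = (50, 500, 60, 500)
large = (500, 5000, 600, 5000)
for label, data in (("small", small), ("large", large)):
    print(label,
          f"lift={relative_lift(*data):.0%}",
          f"confidence={confidence(*data):.1%}")
```

The lift is identical in both runs, but the small sample lands well under 90% confidence while the large one clears 95%.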
Why not stop the moment it crosses 95%
Peeking and stopping at the first significant reading inflates false positives, because confidence fluctuates as data arrives and every extra look is another chance to cross the bar by luck. Decide the sample size in advance, run to it, then read significance once. The sample-size and duration tools set that plan so this verdict is honest.
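The effect of peeking can be checked with a small simulation. The sketch below runs A/A tests, where both arms share the same true conversion rate, so every significant result is a false positive. It compares stopping at the first look that crosses 95% with reading once at the planned sample size. All parameters (rate, sample size, look interval, number of simulated tests, seed) are arbitrary choices for illustration.

```python
import math
import random

def is_significant(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Two-tailed two-proportion z-test at roughly 95% confidence."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return False
    return abs(conv_b / n_b - conv_a / n_a) / se >= z_crit

def false_positive_rates(rate=0.10, n_per_arm=2000, look_every=100,
                         n_tests=400, seed=0):
    """Simulate A/A tests and return (peeking_rate, fixed_rate)."""
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(n_tests):
        conv_a = conv_b = 0
        crossed_at_some_look = False
        for n in range(1, n_per_arm + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            # A peeker checks at every interim look and stops on a crossing.
            if n % look_every == 0 and is_significant(conv_a, n, conv_b, n):
                crossed_at_some_look = True
        peeking_hits += crossed_at_some_look
        # The disciplined reader checks only once, at the planned size.
        fixed_hits += is_significant(conv_a, n_per_arm, conv_b, n_per_arm)
    return peeking_hits / n_tests, fixed_hits / n_tests

peeking, fixed = false_positive_rates()
print(f"stop at first crossing: {peeking:.1%}, read once at the end: {fixed:.1%}")
```

Because the final look is one of the looks a peeker sees, the peeking rate can never be lower than the fixed rate; in practice it tends to be several times higher.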
One-tailed vs two-tailed
This calculator uses a two-tailed test, which asks whether the variants differ in either direction — the conservative, default choice for most experiments. A one-tailed test assumes you only care about improvement and reaches significance sooner, at the cost of missing a variant that quietly performs worse.
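The difference between the two tests shows up when the same z-score is converted both ways. This is a sketch with a hypothetical function name; the one-tailed version assumes the hypothesis is "B beats A", and confidence is assumed to be 1 minus the p-value.

```python
import math

def tail_confidences(z):
    """Return (one_tailed, two_tailed) confidence for a z-score.

    Assumption: the one-tailed test only looks for B > A (positive z).
    """
    one_tailed_p = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z)
    two_tailed_p = math.erfc(abs(z) / math.sqrt(2))    # P(|Z| >= |z|)
    return 1 - one_tailed_p, 1 - two_tailed_p

for z in (1.8, -2.5):
    one, two = tail_confidences(z)
    print(f"z={z:+.1f}  one-tailed={one:.1%}  two-tailed={two:.1%}")
```

At z = 1.8 the one-tailed reading is already past 95% while the two-tailed reading is not. At z = -2.5, where B is clearly worse, the two-tailed test flags the difference and the one-tailed test reports almost no confidence at all.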