A/B Test Sample Size Calculator
Estimate the users per variant a test needs to detect your target effect — and decide whether the test is worth running at your current traffic.
Minimum detectable effect: the smallest absolute change in conversion rate you want the test to reliably detect.
Formula
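Sample size per variant ≈ 16 × p × (1 − p) / d²
Here p is the baseline conversion rate and d is the minimum detectable effect, both written as proportions (5% is 0.05; a 1-point change is 0.01).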
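The shortcut can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the calculator's actual code: the function and parameter names are hypothetical, and both inputs are taken as proportions.

```python
import math


def sample_size_per_variant(baseline_rate: float, min_detectable_effect: float) -> int:
    """Users per variant from the 16 * p * (1 - p) / d^2 shortcut.

    Both arguments are proportions (hypothetical parameter names):
    baseline_rate=0.05 means 5%, min_detectable_effect=0.01 means 1 point.
    """
    if not 0 < baseline_rate < 1:
        raise ValueError("baseline_rate must be between 0 and 1")
    if min_detectable_effect <= 0:
        raise ValueError("min_detectable_effect must be positive")
    raw = 16 * baseline_rate * (1 - baseline_rate) / min_detectable_effect ** 2
    # Round away floating-point noise, then round up to whole users.
    return math.ceil(round(raw, 6))


print(sample_size_per_variant(0.05, 0.01))  # 7600
```

With a 5% baseline and a 1-point minimum detectable effect, the shortcut asks for 7,600 users in each variant.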
Understanding A/B test sample size
Reference material — the calculator above stays the primary tool.
What sample size tells you
Sample size is the number of users each variant needs before a test can reliably tell a real effect from noise. It is set before the test runs, from two things: your baseline conversion rate and the smallest change worth detecting.
Running with too small a sample is one of the most common testing mistakes: it produces confident-looking results that vanish on rollout. Sizing the test first is what makes the eventual verdict trustworthy.
How to read your result
Here, fewer users is better: a smaller requirement means an easier, faster test. The rating compares your result against common testing sizes:
Low: far above typical sizes; the effect or baseline makes a reliable test expensive.
Average: near the common benchmark; runnable with planning.
Strong: below typical sizes; the test is cheap to run reliably.
The inverse-square trap
The single most important property of this formula is that sample size scales with the inverse square of the detectable effect. Halving the effect you want to detect, from a 1.0-point change to a 0.5-point change, does not double the sample; it quadruples it. This is why ambitious tests chasing small gains so often run out of traffic before they conclude.
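A short sketch makes the inverse-square scaling concrete. The 5% baseline and the helper name are illustrative assumptions; the arithmetic is just the shortcut formula, left unrounded so the ratio is easy to see.

```python
def shortcut_sample_size(p: float, d: float) -> float:
    """Unrounded 16 * p * (1 - p) / d^2, with p and d as proportions."""
    return 16 * p * (1 - p) / d ** 2


baseline = 0.05  # assumed 5% baseline, for illustration only

# Each step halves the detectable effect.
for effect in (0.02, 0.01, 0.005):
    n = shortcut_sample_size(baseline, effect)
    print(f"effect {effect:.3f} -> about {n:,.0f} users per variant")

ratio = shortcut_sample_size(baseline, 0.005) / shortcut_sample_size(baseline, 0.01)
print(f"halving the effect multiplies the sample by {ratio:.1f}")
```

At this baseline the requirement goes from about 1,900 to 7,600 to 30,400 users per variant as the effect is halved twice.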
Levers that change the requirement
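Two inputs move the number. A larger minimum detectable effect shrinks the sample sharply, so test bold changes likely to move the needle, not cosmetic ones. The baseline rate matters too, but its direction depends on how the effect is framed. For a fixed absolute effect, the requirement grows as the baseline approaches 50%, because p × (1 − p) peaks there. A higher baseline lowers the requirement only when the target is a relative lift, since the same relative lift is then a larger absolute change. If neither can move, the lever shifts to time: run longer to accumulate the users.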
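To see how the time lever interacts with the effect size, the sketch below converts a per-variant requirement into a rough run length. It assumes, purely for illustration, that all daily traffic enters the test and is split evenly across variants; the function name and the 1,000 users per day figure are hypothetical.

```python
import math


def days_to_reach_sample(users_per_variant: int, daily_users: int, variants: int = 2) -> int:
    """Whole days needed to collect the sample, assuming an even traffic split."""
    total_users = users_per_variant * variants
    return math.ceil(total_users / daily_users)


# 7,600 per variant: 5% baseline, 1-point effect under the shortcut.
print(days_to_reach_sample(7600, 1000))  # 16
# 1,900 per variant: same baseline, 2-point effect.
print(days_to_reach_sample(1900, 1000))  # 4
```

Doubling the detectable effect cuts the run from 16 days to 4 at the same traffic, which is the trade the two levers offer.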
This is an estimate, not a guarantee
The factor of 16 bundles the standard assumptions of 95% confidence and 80% power; a full power calculation can give a somewhat different number. Treat the result as the planning floor for users per variant, and pair it with the duration and significance tools to turn it into a timeline and a decision.
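One common way to arrive at a factor near 16, assumed here and not stated by this page, is 2 × (z for a two-sided 95% confidence level + z for 80% power)². The sketch below computes that multiplier with Python's standard library; the function name is hypothetical.

```python
from statistics import NormalDist


def shortcut_multiplier(confidence: float = 0.95, power: float = 0.80) -> float:
    """2 * (z_alpha/2 + z_beta)^2 for a two-sided test (assumed derivation)."""
    alpha = 1 - confidence
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the chosen power
    return 2 * (z_alpha + z_beta) ** 2


print(round(shortcut_multiplier(), 2))            # about 15.7, rounded up to 16
print(round(shortcut_multiplier(0.95, 0.90), 2))  # stricter power, larger multiplier
```

Under these assumptions, raising power from 80% to 90% pushes the multiplier from roughly 16 to roughly 21, which is one way a full power calculation can differ from the shortcut.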