A/B Test Sample Size Calculator
Calculate how many users you need for statistically valid A/B tests, with context on business trade-offs and practical considerations.
Test Parameters
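• Minimum detectable effect (20% relative lift): reasonable effect size for most business metrics
• Statistical power (80%): standard choice that balances detection ability with sample size
• Significance level (5%): standard, widely accepted balance
• Test type (two-tailed): tests for any difference (recommended)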
Results
The calculator reports the required sample size per group (users per variant), the days needed, and the total users.
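The "days needed" figure depends on how much traffic enters the experiment. As a rough sketch, the duration can be estimated as total users divided by enrolled daily traffic. The function name `days_needed` and the parameters `daily_visitors` and `traffic_share` are hypothetical, introduced here only for illustration, and the inputs below are made-up numbers rather than calculator output.

```python
from math import ceil


def days_needed(users_per_variant, daily_visitors, variants=2, traffic_share=1.0):
    """Estimate total users and test duration from the per-variant sample size."""
    if daily_visitors <= 0 or not 0 < traffic_share <= 1:
        raise ValueError("daily_visitors must be positive and traffic_share in (0, 1]")
    total_users = users_per_variant * variants
    # Only the share of traffic enrolled in the experiment counts toward the sample.
    enrolled_per_day = daily_visitors * traffic_share
    # Round up: a partial day still has to be run in full.
    return total_users, ceil(total_users / enrolled_per_day)


# Illustrative inputs only: 5,000 users per variant, 1,000 eligible visitors per day.
total, days = days_needed(users_per_variant=5_000, daily_visitors=1_000)
print(f"{total:,} total users, about {days} days")
```

In practice it is common to round the duration up to whole weeks so that weekday and weekend behavior are both represented.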
What This Means
With the required number of users per group, you'll have an 80% probability of detecting a 20% relative improvement
Your false positive rate will be 5% (declaring a winner when there's no real difference)
Expected new conversion rate: 6.0% (up from 5.0%)
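To make the "80% probability of detecting" statement concrete, the sketch below approximates the power of a two-tailed two-proportion z-test for several group sizes, using the same normal approximation as the sample size formula. The function name `approximate_power` is an assumption made for this example, and the group sizes are arbitrary.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed two-proportion z-test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between the two observed rates.
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    # Probability that the observed difference clears the critical value.
    # The tiny chance of rejecting in the wrong direction is ignored.
    return std_normal.cdf(abs(p2 - p1) / se - z_alpha)


baseline = 0.05
expected = baseline * (1 + 0.20)  # 20% relative improvement: 5.0% -> 6.0%
for n in (2_000, 4_000, 8_000, 16_000):
    power = approximate_power(baseline, expected, n)
    print(f"n = {n:>6,} per group -> power ~ {power:.0%}")
```

Power rises with group size, so running with fewer users than required means a real 20% lift is more likely to be missed.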
Business Context & Trade-offs
⚡ When to Run With Less Confidence
• Reversible changes (UI tweaks, copy changes)
• High opportunity cost of waiting
• Clear business logic supports the change
• Multiple metrics show directional improvement
🛡️ When to Demand High Confidence
• Irreversible changes (pricing, algorithms)
• High implementation cost
• Risk of negative brand impact
• Regulatory or compliance implications
⚠️ Important Considerations
• This calculator assumes a fixed sample size test. Don't peek at results and stop early unless you use a sequential testing method, because repeated looks inflate the false positive rate.
• Real-world effects are often smaller than expected. Be conservative with your minimum detectable effect.
• Consider running a pilot test to validate assumptions about baseline rates and variance.
• Multiple testing (looking at many metrics) increases false positive risk. Adjust the significance level accordingly, for example with a Bonferroni correction.
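One simple way to adjust for multiple metrics is a Bonferroni correction, which divides the significance level by the number of metrics tested. The sketch below shows how that tighter threshold raises the required sample size under the calculator's formula. Bonferroni is only one of several possible corrections and is not necessarily what this calculator uses. The helper names `bonferroni_alpha` and `sample_size_per_group` are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for a two-tailed test of two proportions."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2)


def bonferroni_alpha(alpha, n_metrics):
    """Per-metric significance level that keeps the family-wise rate at alpha."""
    if n_metrics < 1:
        raise ValueError("n_metrics must be at least 1")
    return alpha / n_metrics


# Same 5% -> 6% scenario, evaluated for an increasing number of metrics.
for n_metrics in (1, 3, 5, 10):
    adjusted = bonferroni_alpha(0.05, n_metrics)
    n = sample_size_per_group(0.05, 0.06, alpha=adjusted)
    print(f"{n_metrics:>2} metrics: alpha = {adjusted:.4f}, {n:,} users per group")
```

Bonferroni is conservative when metrics are correlated, so the extra sample it demands is an upper bound rather than an exact requirement.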
Under the Hood
Sample size per group (n):

$$n = \frac{(Z_{\alpha} + Z_{\beta})^2 \left[\, p_1(1 - p_1) + p_2(1 - p_2) \,\right]}{(p_2 - p_1)^2}$$

Where:
• $p_1$ = baseline conversion rate = 5.00%
• $p_2$ = expected conversion rate = 6.00%
• $Z_{\alpha}$ = z-score for α/2 = 1.960
• $Z_{\beta}$ = z-score for power = 0.842
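The formula above translates directly into a few lines of code. The following is a minimal sketch using only the Python standard library. The function name `sample_size_per_group` and the `two_tailed` flag are assumptions made for this example, not the calculator's actual implementation.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p1, p2, alpha=0.05, power=0.80, two_tailed=True):
    """Users needed per variant to detect a change from rate p1 to rate p2."""
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("p1 and p2 must be distinct rates strictly between 0 and 1")
    std_normal = NormalDist()
    # Z_alpha: critical value for the significance level (alpha/2 when two-tailed).
    z_alpha = std_normal.inv_cdf(1 - (alpha / 2 if two_tailed else alpha))
    # Z_beta: critical value for the desired power.
    z_beta = std_normal.inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2
    return ceil(n)  # round up, since users come in whole numbers


baseline = 0.05
relative_lift = 0.20
n = sample_size_per_group(baseline, baseline * (1 + relative_lift))
print(f"{n:,} users per group, {2 * n:,} in total")
```

Rounding up with `ceil` keeps the achieved power at or above the target. A one-tailed test (`two_tailed=False`) needs fewer users but can only detect a change in one direction.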
Quick Verification
Common sample sizes for reference (two-tailed test, 80% power, 5% significance):
5% → 6% (+20% lift): ~8,155 per group
10% → 11% (+10% lift): ~14,751 per group
20% → 22% (+10% lift): ~6,507 per group
Note: Higher baseline rates require smaller sample sizes for the same relative lift.
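The note follows from the sample size formula. As a short worked derivation, write the expected rate as $p_2 = p_1(1 + r)$, where $r$ is the relative lift (a symbol introduced here for illustration). Then $p_2 - p_1 = r\,p_1$, and for small $r$ the two variance terms are nearly equal:

$$n = \frac{(Z_{\alpha} + Z_{\beta})^2 \left[\, p_1(1 - p_1) + p_2(1 - p_2) \,\right]}{r^2 p_1^2} \approx \frac{2\,(Z_{\alpha} + Z_{\beta})^2 (1 - p_1)}{r^2\, p_1}$$

For a fixed relative lift $r$, the required sample size is roughly proportional to $(1 - p_1)/p_1$, which shrinks as the baseline rate $p_1$ grows. It also scales with $1/r^2$, so halving the detectable lift roughly quadruples the users needed.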