A/B Test Sample Size Calculator

Calculate how many users you need for statistically valid A/B tests, with context on business trade-offs and practical considerations.

Test Parameters

Baseline Conversion Rate (current rate)

5.0%

Minimum Detectable Effect (relative improvement)

+20%

Reasonable effect size for most business metrics

Statistical Power (1 - β)

80% (standard choice: balances detection ability against sample size)

Significance Level (α)

5% (standard: the widely accepted default)

Test Type

Two-tailed: tests for any difference (recommended)

Daily Traffic per Group

Results

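Required Sample Size per Group

about 8,155 users per variant

days needed: users per variant ÷ daily traffic per group, rounded up

about 16,310 total users (both groups combined)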
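
The "days needed" and "total users" figures follow from simple arithmetic on the per-group sample size. The sketch below is one way to compute them; the function name `estimate_duration` and the daily traffic of 500 users per group are hypothetical, chosen only for illustration.

```python
from math import ceil


def estimate_duration(users_per_group, daily_traffic_per_group, groups=2):
    """Return (days needed, total users) for a fixed sample size test."""
    # Round up: a partial day still has to be run to reach the target.
    days = ceil(users_per_group / daily_traffic_per_group)
    total_users = users_per_group * groups
    return days, total_users


# Hypothetical traffic: 500 users enter each variant per day.
days, total = estimate_duration(users_per_group=8155, daily_traffic_per_group=500)
print(days, total)  # 17 16310
```

In practice it is common to round the duration up to whole weeks so that every weekday is represented equally, though that is a judgment call rather than part of the formula.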

What This Means

✓ With about 8,155 users per group, you'll have an 80% probability of detecting a 20% relative improvement

✓ Your false positive rate will be 5% (declaring a winner when there's no real difference)

✓ Expected new conversion rate: 6.0% (up from 5.0%)
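
The 80% figure can be sanity-checked by running the calculation in reverse: given a sample size, estimate the power. The sketch below uses the same normal approximation as the sample size formula and assumes roughly 8,155 users per group, which is what that formula gives for these parameters. The function name `achieved_power` is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def achieved_power(p1, p2, n_per_group, alpha=0.05):
    """Approximate two-tailed power for comparing two proportions."""
    # Standard error of the difference, using each group's own variance.
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability that the test statistic clears the critical value in the
    # direction of the true effect (the opposite tail is negligible here).
    return NormalDist().cdf(abs(p2 - p1) / se - z_alpha)


print(round(achieved_power(0.05, 0.06, 8155), 3))  # close to 0.80
print(round(achieved_power(0.05, 0.06, 4000), 3))  # roughly a coin flip
```

Running with about half the required sample drops the chance of detecting a real 20% lift to roughly 50%, which is why stopping early is costly.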

Business Context & Trade-offs

⚡ When to Run With Less Confidence

• Reversible changes (UI tweaks, copy changes)
• High opportunity cost of waiting
• Clear business logic supports the change
• Multiple metrics show directional improvement

🛡️ When to Demand High Confidence

• Irreversible changes (pricing, algorithms)
• High implementation cost
• Risk of negative brand impact
• Regulatory or compliance implications

⚠️ Important Considerations

• This calculator assumes a fixed sample size test. Don't peek at results early without using sequential testing methods.
• Real-world effects are often smaller than expected. Be conservative with your minimum detectable effect.
• Consider running a pilot test to validate assumptions about baseline rates and variance.
• Multiple testing (looking at many metrics) increases false positive risk. Adjust the significance level accordingly, for example with a Bonferroni correction.
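
To make the multiple testing point concrete, the sketch below shows how the chance of at least one false positive grows with the number of metrics, and what a Bonferroni correction would use as the per-metric significance level. It assumes the metrics are independent, which real metrics rarely are, so treat the numbers as a rough upper guide. Both function names are hypothetical.

```python
def family_wise_error_rate(alpha, num_metrics):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** num_metrics


def bonferroni_alpha(alpha, num_metrics):
    """Per-metric significance level that keeps the overall rate at or below alpha."""
    return alpha / num_metrics


for m in (1, 5, 10):
    fwer = family_wise_error_rate(0.05, m)
    print(f"{m} metrics: {fwer:.1%} overall risk, "
          f"Bonferroni alpha = {bonferroni_alpha(0.05, m):.3f}")
```

With ten metrics each tested at 5%, the overall false positive risk is about 40%. A stricter per-metric threshold restores the 5% overall rate but increases the sample size each metric needs.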

Under the Hood

Sample size per group:

n = (Z_α + Z_β)² × [p₁(1 - p₁) + p₂(1 - p₂)] / (p₂ - p₁)²

Where:

• p₁ = baseline conversion rate = 5.00%

• p₂ = expected conversion rate = p₁ × (1 + relative improvement) = 6.00%

• Z_α = z-score for α/2 (two-tailed test, α = 5%) = 1.960

• Z_β = z-score for power (1 - β = 80%) = 0.842
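
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual source: the function name `sample_size_per_group` and its parameter names are assumptions, and the z-scores come from the standard library's `statistics.NormalDist`.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(baseline, relative_mde, power=0.80, alpha=0.05,
                          two_tailed=True):
    """Users needed per variant under the normal approximation."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # expected rate after the lift
    tail = alpha / 2 if two_tailed else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)  # about 1.960 for alpha = 5%, two-tailed
    z_beta = NormalDist().inv_cdf(power)      # about 0.842 for 80% power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2
    return ceil(n)  # round up: a fraction of a user cannot be sampled


print(sample_size_per_group(0.05, 0.20))  # 8155
```

Because (p₂ - p₁)² sits in the denominator, halving the minimum detectable effect roughly quadruples the required sample size.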

Quick Verification

Common sample sizes for reference (two-tailed test, 80% power, 5% significance):

5% → 6% (+20% lift): ~8,155 per group

10% → 11% (+10% lift): ~14,749 per group

20% → 22% (+10% lift): ~6,507 per group

Note: For the same relative lift, higher baseline rates require smaller sample sizes, because the absolute difference (p₂ - p₁) grows with the baseline.
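
The effect of the baseline rate can be seen by holding the relative lift fixed and sweeping the baseline. The sketch below applies the formula from "Under the Hood" with a two-tailed test, 80% power, and 5% significance. The function name `sample_size` is hypothetical, and exact figures shift by a few users depending on how the z-scores are rounded.

```python
from math import ceil
from statistics import NormalDist


def sample_size(p1, relative_lift, power=0.80, alpha=0.05):
    """Per-group sample size for a two-tailed test of two proportions."""
    p2 = p1 * (1 + relative_lift)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2)


# Same +10% relative lift at three different baselines.
for baseline in (0.05, 0.10, 0.20):
    n = sample_size(baseline, 0.10)
    print(f"{baseline:.0%} baseline, +10% lift: {n:,} per group")
```

The required sample shrinks as the baseline rises, which is why low-conversion funnels often need either much longer tests or a larger minimum detectable effect.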