
I am trying to automate A/B-test results using GCP Datalab. I am using both https://abtestguide.com/calc/ and https://www.surveymonkey.com/mp/ab-testing-significance-calculator/ as references. I can easily calculate significance levels, Z-scores, uplift, etc.; however, I cannot seem to get the statistical power right. I am using TTestIndPower from statsmodels.stats.power as follows:

import numpy as np
from statsmodels.stats.power import TTestIndPower

effect_size = (mu_B - mu_A) / ((n_A * np.sqrt(var_A) + n_B * np.sqrt(var_B)) / (n_A + n_B))
nobs1 = n_A
ratio = n_B / n_A
alpha = 0.05

power = TTestIndPower().solve_power(effect_size = effect_size, 
                                    nobs1 = nobs1, 
                                    ratio = ratio,
                                    power = None,
                                    alpha = alpha,
                                    alternative='two-sided')

The effect size appears to be the problem: using the power value from the examples I can get every other input right, apart from effect_size. Statsmodels defines effect size as the "difference between the two means divided by the standard deviation".
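As a sanity check on that definition, here is a minimal sketch of Cohen's d using the simple pooled standard deviation and the sample numbers from the post below (this is my own check, not the websites' formula):

```python
import numpy as np

# Conversion rates from the example: 1600/80000 and 1696/80000
mu_A, mu_B = 0.02, 0.0212

# Bernoulli variances p * (1 - p)
var_A = mu_A * (1 - mu_A)
var_B = mu_B * (1 - mu_B)

# Cohen's d: difference in means over the (simple) pooled standard deviation
d = (mu_B - mu_A) / np.sqrt((var_A + var_B) / 2)
print(d)  # roughly 0.00845
```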

# Sample sizes
n_A = 80000
n_B = 80000

# Means 
mu_A = 1600 / n_A
mu_B = 1696 / n_B

# Variances
var_A = mu_A * (1-mu_A)
var_B = mu_B * (1-mu_B)

# effect size as difference between the two means divided by pooled sd
(mu_B - mu_A) / ((n_A * np.sqrt(var_A) + n_B * np.sqrt(var_B)) / (n_A + n_B))

# or ..
(mu_B - mu_A) / np.sqrt((var_A + var_B) /2)

This should result in a power of 67.41%, but instead I get 39.4%.
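For reference, a self-contained sketch that reproduces the 39.4% figure (my own reconstruction of the setup above, using the simple pooled standard deviation as the effect size):

```python
import numpy as np
from statsmodels.stats.power import TTestIndPower

n_A = n_B = 80000
mu_A, mu_B = 1600 / n_A, 1696 / n_B
var_A, var_B = mu_A * (1 - mu_A), mu_B * (1 - mu_B)

# Difference in means over the simple pooled standard deviation
effect_size = (mu_B - mu_A) / np.sqrt((var_A + var_B) / 2)

power = TTestIndPower().solve_power(effect_size=effect_size,
                                    nobs1=n_A,
                                    ratio=n_B / n_A,
                                    power=None,
                                    alpha=0.05,
                                    alternative='two-sided')
print(power)  # approximately 0.394, not the 67.41% the calculators report
```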

vincentp
    I have no idea what those web site calculators are doing. If I use https://www.stat.ubc.ca/~rollin/stats/ssize/n2.html and put the means 0.02 and 0.0212 and std=0.14, then I get a power of 0.4, same as statsmodels at rounding. – Josef Jun 18 '19 at 20:20
  • Also the websites state "The test result is not significant." when the p-value is smaller than 0.05 even though they use a significance level of 0.05. – Josef Jun 18 '19 at 20:49
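Josef's cross-check against the UBC calculator can be reproduced with statsmodels as well (assuming, as he does, a common std of 0.14 for both groups):

```python
from statsmodels.stats.power import TTestIndPower

# Means 0.02 and 0.0212 with a shared std of 0.14, n = 80000 per group
d = (0.0212 - 0.02) / 0.14

power = TTestIndPower().solve_power(effect_size=d, nobs1=80000, ratio=1.0,
                                    power=None, alpha=0.05,
                                    alternative='two-sided')
print(power)  # about 0.40, matching both the UBC calculator and statsmodels
```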
