I am trying to automate the calculation of A/B-test results using GCP Datalab, with https://abtestguide.com/calc/ and https://www.surveymonkey.com/mp/ab-testing-significance-calculator/ as reference calculators. I can easily calculate significance levels, Z-scores, uplift, etc.; however, I cannot get the statistical power to match. I am using TTestIndPower from statsmodels.stats.power as follows:
import numpy as np
from statsmodels.stats.power import TTestIndPower
effect_size = (mu_B - mu_A) / ((n_A * np.sqrt(var_A) + n_B * np.sqrt(var_B)) / (n_A + n_B))
nobs1 = n_A
ratio = n_B / n_A
alpha = 0.05
power = TTestIndPower().solve_power(effect_size=effect_size,
                                    nobs1=nobs1,
                                    ratio=ratio,
                                    power=None,
                                    alpha=alpha,
                                    alternative='two-sided')
The effect size appears to be the problem: when I plug the power from the examples into solve_power, I can recover every other input correctly, except effect_size. The statsmodels documentation describes the effect size as the "difference between the two means divided by the standard deviation".
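One common reading of that definition is Cohen's d with a pooled standard deviation. A minimal sketch under that assumption follows; the helper name `cohens_d` is hypothetical and not part of statsmodels, and it is not confirmed here that the online calculators use the same definition:

```python
import math


def cohens_d(mean_a, mean_b, var_a, var_b, n_a, n_b):
    """Standardized difference of two means, assuming a pooled standard deviation."""
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    return (mean_b - mean_a) / math.sqrt(pooled_var)
```

With equal group sizes this reduces to dividing by the square root of the average of the two variances.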
# Sample sizes
n_A = 80000
n_B = 80000
# Means
mu_A = 1600 / n_A
mu_B = 1696 / n_B
# Variances
var_A = mu_A * (1-mu_A)
var_B = mu_B * (1-mu_B)
# effect size as the difference of the two means divided by the pooled sd
(mu_B - mu_A) / ((n_A * np.sqrt(var_A) + n_B * np.sqrt(var_B)) / (n_A + n_B))
# or ..
(mu_B - mu_A) / np.sqrt((var_A + var_B) /2)
This should result in a power of 67.41%, but I get 39.4% instead.
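For reference, the steps above can be put together in definition order as one self-contained script. This is only a sketch of the calculation described in the question, using the second effect-size variant and assuming numpy and statsmodels are installed; it reproduces the roughly 39.4% figure and does not by itself explain the gap to 67.41%:

```python
import numpy as np
from statsmodels.stats.power import TTestIndPower

# Sample sizes
n_A = 80000
n_B = 80000

# Conversion rates (means of the 0/1 outcomes)
mu_A = 1600 / n_A
mu_B = 1696 / n_B

# Bernoulli variances
var_A = mu_A * (1 - mu_A)
var_B = mu_B * (1 - mu_B)

# Effect size: difference of the means divided by the root of the averaged variances
effect_size = (mu_B - mu_A) / np.sqrt((var_A + var_B) / 2)

# Leaving power=None asks solve_power to solve for the power
power = TTestIndPower().solve_power(effect_size=effect_size,
                                    nobs1=n_A,
                                    ratio=n_B / n_A,
                                    power=None,
                                    alpha=0.05,
                                    alternative='two-sided')

print(f"effect size: {effect_size:.5f}, power: {power:.1%}")
```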