
I am using the power.prop.test function for A/B testing. Given the number of impressions per group in an A/B test, what lift would be statistically significant?

The control group would have a proportion of 0.004.

If I run the following code:

power.prop.test(
  n = 6289195,
  p1 = 0.004,
  power = 0.8,
  sig.level = 0.05,
  tol = .Machine$double.eps^.8)

the result is this:

Two-sample comparison of proportions power calculation

          n = 6289195
         p1 = 0.004
         p2 = 0.004100341
  sig.level = 0.05
      power = 0.8
alternative = two.sided

NOTE: n is number in each group

So is this saying that the minimum lift is about 0.0001 (0.004100341 - 0.004), which is equivalent to 0.01% in absolute terms?

It seems very low, which is why I am asking.
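
For reference, the difference above can be read either as an absolute lift or as a lift relative to the control rate. The snippet below is a minimal sketch in base R that uses only the two proportions printed in the output above; the variable names `abs_lift` and `rel_lift` are illustrative and not part of `power.prop.test`.

```r
p1 <- 0.004          # control proportion from the question
p2 <- 0.004100341    # p2 reported by power.prop.test above

abs_lift <- p2 - p1            # absolute difference in proportions
rel_lift <- (p2 - p1) / p1     # difference relative to the control rate

cat(sprintf("absolute lift: %.4f percentage points\n", 100 * abs_lift))
cat(sprintf("relative lift: %.2f%%\n", 100 * rel_lift))
# absolute lift: 0.0100 percentage points
# relative lift: 2.51%
```

So the 0.01% figure is an absolute difference; relative to the 0.4% baseline it corresponds to a lift of roughly 2.5%.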

nak5120
  • Thanks, so you wouldn't take the difference? I'm looking at this as a conversion rate. So group one had a conversion rate of 0.4% and then the test group is 0.41% making the lift 0.01%. – nak5120 Aug 27 '18 at 21:44
  • That seems to make sense. Your sample is huge, so that means you should be able to detect a very small increase in rate. Looking at [this example](http://rpubs.com/dcastroj/306681) seems to confirm. – Mako212 Aug 27 '18 at 21:44
  • Why do you think it seems weird? With such a large number of observations per group (about 6.3 million each), even a small difference like this can be detected as statistically significant. – AntoniosK Aug 27 '18 at 21:44
  • Just look at `power.prop.test(n = 100000, p1 = .5, power = .90)` for different values of `n`. For `n = 10`, `p2 = 1.048`; for `n = 100000`, `p2 = .507`; and so on. – Mako212 Aug 27 '18 at 21:48
  • Quick recommendation: if you are doing, or planning to do, lots of A/B tests, have a look at the `pwr` package. At some point you might want to try different splits between test and control (not the typical 50-50 split), so the function `pwr.2p2n.test` would be really useful. – AntoniosK Aug 27 '18 at 21:51
  • @Mako212 `p2=1.048` for `n=10` is sooo wrong! I was happy to see that there's a warning as well :D Didn't expect the function to return something higher than 100%... – AntoniosK Aug 27 '18 at 21:57
  • Is this not correct then? – nak5120 Aug 27 '18 at 21:58
  • @AntoniosK oops, bad example, I forgot that I had warnings turned off for another bit of code – Mako212 Aug 27 '18 at 21:59
  • @nak5120 this is correct. We're trying to explain that it's not reasonable to expect a larger difference (i.e. larger minimum detectable lift). The more observations you have, the more confident you are that a small observed difference is statistically significant. Right? That specific example that returns `p2=1.048` also returns a warning. – AntoniosK Aug 27 '18 at 22:02
  • @nak5120 it should be fine, there's a note in `?power.prop.test` that explains that not all conditions can be satisfied " `power.prop.test(n=30, p1=0.90, p2=NULL, power=0.8, strict=TRUE)` there is no proportion p2 between p1 = 0.9 and 1, as you'd need a sample size of at least n = 74 to yield the desired power for (p1,p2) = (0.9, 1)." – Mako212 Aug 27 '18 at 22:02
  • Ok thanks! So can you confirm the minimum lift in this case would be 0.01% for it to be statistically significant? – nak5120 Aug 27 '18 at 22:03
  • @Mako212 can you confirm above? Thanks! – nak5120 Aug 28 '18 at 13:59
  • @AntoniosK can you confirm above? Thanks! – nak5120 Aug 28 '18 at 14:03
  • Yes, that's correct. Any uplift like that or bigger will be classified as statistically significant. So this is the minimum detectable uplift/difference. – AntoniosK Aug 28 '18 at 14:05
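
The point made in the comments, that a larger sample lets you detect a smaller difference, can be checked by solving for `p2` at several per-group sizes. This is a rough sketch only: it reuses `p1`, `power`, `sig.level`, and `tol` from the question, while the smaller values in `ns` and the data frame name `mde` are assumptions chosen for illustration.

```r
p1 <- 0.004                       # control proportion from the question
ns <- c(1e4, 1e5, 1e6, 6289195)   # per-group sizes; all but the last are illustrative

# Solve for p2 at each n, reusing the question's power, sig.level, and tol.
# The tight tol matters here: the default is coarser than the differences involved.
p2 <- sapply(ns, function(n) {
  power.prop.test(n = n, p1 = p1, power = 0.8, sig.level = 0.05,
                  tol = .Machine$double.eps^0.8)$p2
})

# Collect the minimum detectable p2 with its absolute and relative lift.
mde <- data.frame(n = ns, p2 = p2,
                  abs_lift = p2 - p1,
                  rel_lift = (p2 - p1) / p1)
print(mde)
```

The last row should reproduce the `p2` from the question, and the earlier rows show how much larger the detectable difference becomes with fewer impressions per group.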
