
I have a list of 7337 customers (selected because they only had one booking from March-August 2018). We are going to contact them and are trying to test the impact of these activities on their sales. The idea is that contacting them will cause them to book more and increase the sales of this largely inactive group.

I have to set up an A/B test and am currently stuck on the sample size calculation.

Here's my sample data (attached to this question).

The first column is the customer ID and the second column is each customer's total sales over 2 weeks in January (I used 2 weeks because customers in this group purchase very infrequently).

The metric I settled on was revenue per customer (RPC = total revenue / total customers), so that I can take into account both the number of orders and the average order value of the group.

The RPC for this group is $149,482.70 / 7,337 ≈ $20.4.
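Because RPC divides total revenue by all 7,337 customers, including those with no sales in the window, it is simply the mean of the per-customer sales column. A minimal sketch, assuming the data sit in a data frame with hypothetical columns `customer_id` and `sales` (the toy values are made up, not the real dataset):

```r
# Toy data standing in for the attached file (hypothetical column names and values)
df <- data.frame(
  customer_id = 1:5,
  sales       = c(0, 0, 0, 50, 52)
)

# RPC = total revenue / total customers, which is the same as mean(sales)
rpc <- sum(df$sales) / nrow(df)
stopifnot(isTRUE(all.equal(rpc, mean(df$sales))))

# sd() uses the n - 1 denominator, like Excel's STDEV.S
sd_toy <- sd(df$sales)

# The figures reported in the question
rpc_reported <- 149482.7 / 7337
print(round(rpc_reported, 2))
```

Customers with zero sales must stay in the column; dropping them would inflate RPC and shrink the apparent SD.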

I'd like to be able to detect at least a 5% increase in this metric at 80% power and 5% significance level. First I calculated the effect size.

Standard deviation of the data set = 153.9
Effect size = (1.05*20.4 - 20.4)/153.9 = 0.0066
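The effect size used here is Cohen's d: the absolute difference to detect divided by the standard deviation. A short sketch reproducing it from the numbers above:

```r
baseline_rpc <- 20.4   # $ per customer, from the question
sd_sales     <- 153.9  # per-customer SD (Excel STDEV.S)
lift         <- 0.05   # minimum relative increase to detect

delta <- lift * baseline_rpc  # absolute difference in $ per customer
d     <- delta / sd_sales     # Cohen's d
print(round(d, 4))
```

The tiny d follows directly from the SD being roughly 7.5 times the mean (153.9 / 20.4), so a $1.02 lift is small relative to the customer-to-customer spread.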

I then used the pwr package in R to calculate the sample size.

library(pwr)
pwr.t.test(d = 0.0066, sig.level = 0.05, power = 0.80, type = "two.sample")

 Two-sample t test power calculation 

          n = 360371.048
          d = 0.0066
  sig.level = 0.05
      power = 0.8
alternative = two.sided
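As a cross-check, the normal approximation for a two-sided, two-sample test, n = 2(z_{1-α/2} + z_{1-β})² / d², gives nearly the same number. The sketch below uses only base R (`qnorm` and `stats::power.t.test`, which takes `delta` and `sd` rather than `d`; setting `sd = 1` makes `delta` equal to Cohen's d). It is a sanity check, not a replacement for the `pwr` call:

```r
d     <- 0.0066
alpha <- 0.05
power <- 0.80

# Normal approximation to the per-group sample size (two-sided test)
n_approx <- 2 * (qnorm(1 - alpha / 2) + qnorm(power))^2 / d^2

# Base R t-test power calculation; with sd = 1, delta is Cohen's d
n_t <- power.t.test(delta = d, sd = 1, sig.level = alpha, power = power,
                    type = "two.sample", alternative = "two.sided")$n

print(round(c(normal = n_approx, t = n_t)))
```

At this sample size the t and normal answers differ by only a customer or two, since the t distribution is essentially normal.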

However, the sample size I get is 360,371 per group (pwr.t.test reports n for each sample), which is far larger than my entire population of 7,337.
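Since `pwr.t.test` reports `n` per group, a 50/50 split needs roughly twice that many customers in total. A quick comparison with the 7,337 available:

```r
n_per_group <- 360371.048  # from the pwr.t.test output above
population  <- 7337

n_total <- 2 * ceiling(n_per_group)  # both arms combined, rounded up
ratio   <- n_total / population      # how many times the available population
print(c(n_total = n_total, ratio = round(ratio)))
```

In other words, the 5% target would need close to a hundred times as many customers as this group contains.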

Does this mean I cannot run my test with sufficient power? The only way I can find to lower the sample size without compromising on significance or power is to raise the minimum detectable effect to a 50% increase, which gives n = 3,582 per group.
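The calculation can also be run in reverse: fix the group size at what the population allows and solve for the smallest detectable difference. A sketch, assuming an even split of the 7,337 customers (3,668 per arm, leaving one out) and the same SD of 153.9, using base R's `power.t.test`:

```r
baseline_rpc <- 20.4
sd_sales     <- 153.9
n_per_arm    <- floor(7337 / 2)  # 3668 customers per group

# The 50% scenario from the question: d = 0.5 * 20.4 / 153.9
d_50 <- 0.5 * baseline_rpc / sd_sales
n_50 <- power.t.test(delta = d_50, sd = 1, sig.level = 0.05, power = 0.80)$n

# Minimum detectable difference (in $) at the available group size
mde <- power.t.test(n = n_per_arm, sd = sd_sales, sig.level = 0.05,
                    power = 0.80)$delta
mde_pct <- 100 * mde / baseline_rpc  # expressed as % lift over baseline RPC

print(round(c(n_50 = n_50, mde = mde, mde_pct = mde_pct), 1))
```

With every customer included and this SD, the smallest lift detectable at 80% power comes out at roughly 49%, consistent with the 50% figure above. The small gap between `n_50` and 3,582 comes from rounding d before calling `pwr.t.test`.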

That is a large impact, and I'm not sure it is reasonable to expect.

Does this mean I can't run an A/B test here to measure impact?

datababie
  • A few things that might help: 1) Use a one-tailed test. If you only expect an increase, and only want to detect that, you can be less strict. 2) The SD sounds very high compared to the mean. What is going on there? Could you have gotten something wrong? – ShaharA Feb 24 '19 at 15:01
  • 1) Thanks, that's a good point about the one-tailed test. 2) I attached the dataset to this question. I am using =STDEV.S(B2:B7338) in Excel on Column B to get the standard deviation. How else would you do it? FYI this is a group that rarely has any sales - does that explain why the STD is so high? – datababie Mar 04 '19 at 05:25
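Both comment points can be checked quickly in base R. A one-sided test (`alternative = "one.sided"` in `power.t.test`; the `pwr` equivalent is `alternative = "greater"`) lowers the required n, and a column that is mostly zeros with a few large orders naturally has an SD many times its mean. The toy spend vector below is made up for illustration and is not taken from the dataset:

```r
d <- 0.0066

n_two <- power.t.test(delta = d, sd = 1, sig.level = 0.05, power = 0.80,
                      alternative = "two.sided")$n
n_one <- power.t.test(delta = d, sd = 1, sig.level = 0.05, power = 0.80,
                      alternative = "one.sided")$n
print(round(n_one / n_two, 2))  # share of the two-sided n still needed

# Made-up zero-heavy spend: 98 customers with $0, 2 customers with $1000
x <- c(rep(0, 98), rep(1000, 2))
print(c(mean = mean(x), sd = round(sd(x), 1)))  # sd() uses n - 1, like STDEV.S
```

The one-sided test trims the requirement by about a fifth, which still leaves it far above 7,337 per group; the toy vector shows that an SD around seven times the mean is expected for sparse purchase data rather than a sign of a calculation error.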
