A/B testing with a very low (<0.1%) baseline rate metric (e.g. click-through rate)

Question

I am trying to implement an A/B test (online validation) for an ML model that has a highly imbalanced positive event rate. For example, the model detects spam and only 1 out of 1000 samples is spam, or the baseline click-through rate is very low (<0.1%).

I know one issue is that I will need very large samples in both the control and treatment cohorts. Are there other issues that I need to be aware of? Will the statistical properties break down? What are the ways to counter them?

Thanks.

score 0 · Answer 1 · answered Aug 08 '21 at 22:55

You can use a sample size calculator, such as the one linked below, to get a sense of the volumes needed. How large a difference are you expecting? For example, detecting a statistically significant 1% improvement requires far more samples than detecting a 30% improvement.

https://www.statsig.com/calculator
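
To make the point about volumes concrete, here is a minimal sketch of the standard two-proportion sample size approximation (two-sided z-test). It is not necessarily what the linked calculator does. The function name `samples_per_group`, the defaults of 5% significance and 80% power, and reading "1% improvement" and "30% improvement" as relative lifts over a 0.1% baseline are all my assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist


def samples_per_group(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate samples needed per cohort for a two-sided two-proportion z-test.

    Hypothetical helper for illustration; alpha and power defaults are assumptions.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)  # treatment rate under the assumed lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2                          # pooled rate under the null
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Baseline of 0.1%, as in the question (assumed relative lifts of 1% and 30%)
n_small_lift = samples_per_group(0.001, 0.01)
n_large_lift = samples_per_group(0.001, 0.30)
print(f"1% relative lift:  {n_small_lift:,} per group")
print(f"30% relative lift: {n_large_lift:,} per group")
```

Under these assumptions, the 1% lift needs on the order of a hundred million samples per cohort, while the 30% lift needs on the order of a couple hundred thousand, so it is worth deciding up front what size of effect you actually care about.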

Vineeth

While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. – Tyler2P Aug 16 '21 at 15:22
