I am trying to implement A/B testing (online validation) for an ML model that has a highly imbalanced positive event rate. For example, the model detects spam and only 1 out of 1000 samples is spam, or the baseline click-through rate is very low (<0.1%).

I know one issue is that I will need very large samples in both the control and treatment cohorts. Are there other issues that I need to be aware of? Will the statistical properties break down? What are the ways to counter them?

Thanks.

1 Answer

You can use a sample size calculator like the one linked below to get a sense of the volumes needed. How large a difference are you expecting? For example, detecting a statistically significant 1% improvement requires far more samples than detecting a 30% improvement.

https://www.statsig.com/calculator
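
To make the effect of the expected lift concrete, here is a rough sketch of the textbook normal-approximation sample size formula for comparing two proportions. This is not necessarily what the linked calculator does internally. The function name `samples_per_group` is made up for illustration, and the two-sided significance level of 0.05, the power of 0.8, and the 0.1% baseline rate are assumptions you should replace with your own values.

```python
from math import ceil, sqrt
from statistics import NormalDist


def samples_per_group(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate samples needed in EACH cohort for a two-sided
    two-proportion z-test (normal approximation, equal cohort sizes).

    baseline_rate: positive event rate in control, e.g. 0.001 for 0.1%
    relative_lift: relative change to detect, e.g. 0.30 for a 30% lift
    alpha, power:  assumed defaults, not taken from the original post
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2                          # pooled rate under the null
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Compare a small and a large relative lift at a 0.1% baseline rate.
    for lift in (0.01, 0.30):
        n = samples_per_group(0.001, lift)
        print(f"relative lift {lift:.0%}: about {n:,} samples per cohort")
```

Because the required size scales roughly with the inverse square of the absolute difference between the two rates, a low baseline rate combined with a small relative lift pushes the numbers up very quickly.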

Vineeth
  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. – Tyler2P Aug 16 '21 at 15:22