I have plenty of experience designing, running and evaluating two-way split tests (A/B Tests). Those are by far the most common in digital marketing, where I do most of my work.
However, I'm wondering if anything about the methodology needs to change when more variants are introduced into an experiment (creating, say, a 3-way test (A/B/C Test)).
My instinct tells me I should just run n-1 evaluations against the control group.
If I run a 3-way split test, for example, instinct says I should find significance and power twice:
- Treatment A vs Control
- Treatment B vs Control
So, in that case, I'm finding out which, if any, treatment performed better than the control (1-tailed test, alt: treatment - control > 0, the basic marketing hypothesis).
But, I'm doubting my instinct. It's occurred to me that running a third test contrasting Treatment A with Treatment B could yield confusing results.
For example, what if there's not enough evidence to reject a null that treatment B = treatment A?
That would lead to a goofy conclusion like this:
Treatment A = Control
Treatment B > Control
Treatment B = Treatment A
If treatments A and B are likely only different due to random chance, how could only one of them outperform the control?
And that's making me wonder if there's a more statistically sound way to evaluate split tests with more than one treatment variable. Is there?