4

I did two basic A-B-C tests on my website with something like

    if(mt_rand(0,2) == 0){
        //THROW IN RE HERE
    }elseif(mt_rand(0,2) == 1){
        //THROW IN LR HERE
    }else{
        //THROW IN LB HERE
    }

I was expecting the three conditions to occur equally often (33.3% of all pageviews). However, the impressions (as measured by Google Adsense) show very different distributions. Interestingly, both tests (two charts below) show similar patterns: LB occurs most, then RE and then LR.

The sample sizes run to many thousands of pageviews, so the probability that differences this large arise by chance is effectively zero.

Am I misunderstanding mt_rand()? Does anybody know if it's been properly tested? How else could these weird patterns show up?

[Image: two charts of Adsense impressions per condition (RE, LR, LB), one chart per test]

RubenGeert
  • How many servers do you use to host your website? – Andrej Aug 29 '16 at 18:52
  • I think 1. I don't really have a clue. Could that make any difference? – RubenGeert Aug 29 '16 at 18:54
  • The number of servers used _could_ definitely make a difference if the load balancing method is not uniform. You'd potentially be getting uneven variance from one server. The Google Analytics tracking could also be biased (for example, by not reporting results in certain cases). – Sherif Aug 29 '16 at 18:57

2 Answers

4

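You're calling mt_rand() twice. The first call picks from 0, 1 and 2; if it returns 0, you throw in RE. If not (i.e. it's 1 or 2), the elseif draws a new number, again from 0, 1 and 2, and throws in LR only if that second draw is 1; any other result (0 or 2) falls through to LB. So RE appears with probability 1/3, LR with 2/3 × 1/3 = 2/9 (about 22%) and LB with 2/3 × 2/3 = 4/9 (about 44%), which matches the order you see: LB most, then RE, then LR. Draw the number once and branch on it: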

    $number = mt_rand(0,2);
    switch ($number){
     case 0:
       //do re
       break;
     case 1:
       //do lr
       break;
     case 2:
       //do lb
       break;
    }

Or this might do the job as well

    if(mt_rand(0,2) == 0){
        //THROW IN RE HERE
    }elseif(mt_rand(0,1) == 1){ //we've stripped RE out, no longer deciding from 3 options
        //THROW IN LR HERE
    }else{
        //THROW IN LB HERE
    }
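To see the difference between the original chain and the single-draw fix above, a quick simulation can tally how often each variant comes up. This is only a minimal sketch: the function names `pickBuggy`, `pickFixed` and `tally` are hypothetical helpers made up for illustration, and only `mt_rand()` comes from the original code.

```php
<?php
// Hypothetical helper names; only mt_rand() is taken from the original post.

// The original chain: a fresh random draw in every condition.
function pickBuggy(): string {
    if (mt_rand(0, 2) == 0) {
        return 'RE';
    } elseif (mt_rand(0, 2) == 1) {
        return 'LR';
    }
    return 'LB';
}

// The fix: a single draw that is used to select the variant.
function pickFixed(): string {
    $variants = ['RE', 'LR', 'LB'];
    return $variants[mt_rand(0, 2)];
}

// Call $pick $n times and count how often each variant is returned.
function tally(callable $pick, int $n): array {
    $counts = ['RE' => 0, 'LR' => 0, 'LB' => 0];
    for ($i = 0; $i < $n; $i++) {
        $counts[$pick()]++;
    }
    return $counts;
}

$n = 90000;
print_r(tally('pickBuggy', $n)); // expect roughly 1/3, 2/9 and 4/9 of $n
print_r(tally('pickFixed', $n)); // expect roughly 1/3 of $n each
```

With 90,000 draws the buggy chain should land near 30,000 / 20,000 / 40,000 for RE / LR / LB, while the single-draw version should stay close to 30,000 for each.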
Honza
  • ...of course... The problem is that I'm often drawing a random number more than once, which wasn't my intention. Your first suggestion is exactly what I was aiming for. It also explains why my previous A-B test did work properly. – RubenGeert Aug 29 '16 at 19:31
3

I'm not sure exactly how you're collecting the data through Google Adsense. Are you relying on Google Analytics by passing in some custom var? If so, there could definitely be other factors causing the bias that have nothing to do with PHP.

To check that mt_rand() itself is uniformly distributed, we can run a test like this in PHP.

$test = [0,0,0];
for($i = 0; $i < 100000; $i++) {
    $rand = mt_rand(0,2);
    $test[$rand]++;
}
var_dump($test);

Which should give you results like this...

array(3) {
  [0]=>
  int(33288)
  [1]=>
  int(33394)
  [2]=>
  int(33318)
}

This indicates the 33% uniform distribution you're looking for over 100K iterations.
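To put a number on "close enough to uniform", one option is a chi-square goodness-of-fit check on those counts. The sketch below is an illustration rather than part of the original test: the variable names are made up, and the 5% significance level is an assumed choice (5.991 is the standard critical value for 2 degrees of freedom at that level).

```php
<?php
// Counts taken from the var_dump() output above.
$observed = [33288, 33394, 33318];

// Under a uniform distribution every bucket expects the same count.
$expected = array_sum($observed) / count($observed);

// Chi-square statistic: sum of (observed - expected)^2 / expected.
$chi2 = 0.0;
foreach ($observed as $count) {
    $chi2 += ($count - $expected) ** 2 / $expected;
}

// Assumed significance level of 5%; critical value for 3 - 1 = 2 degrees of freedom.
$critical = 5.991;

printf(
    "chi2 = %.3f, uniformity %s at the 5%% level\n",
    $chi2,
    $chi2 < $critical ? 'not rejected' : 'rejected'
);
```

For the counts shown, the statistic comes out at about 0.18, far below the critical value, so the sample gives no reason to doubt uniformity. The same check on counts split roughly 1/3, 2/9 and 4/9 would be rejected by a wide margin.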

It's important to note that mt_rand() is a PRNG (Pseudo Random Number Generator) and not a CSPRNG (Cryptographically Secure Pseudo Random Number Generator). That means it's not suited for cryptographic purposes, but it works fine for other PRNG needs. It's based on the Mersenne Twister, which is faster than libc rand(). I don't think any of the issues you're finding in your data are likely to be a direct result of PHP's implementation of mt_rand().

Sherif