I have a question that I'm fairly sure I know the answer to, but I'd like some validation. Someone recently asked me how to take a random sample of a certain percentage of web traffic. They want to treat 5% of traffic differently by presenting a different experience. What I proposed, at a basic level, is something like the code below.
    double rand = Math.random() * 100;
    if (rand < 5) {
        // treat differently
    }
To make sure I wasn't making some ridiculous assumption, I decided to test it, using Thread.sleep() to simulate a sporadic, uneven inflow of requests:
    long runtime = 120000;
    double requests = 0;
    double hits = 0;
    double rand = Math.random() * 100;
    long starttime = System.currentTimeMillis();
    while (starttime + runtime > System.currentTimeMillis()) {
        requests++;
        if (rand < 5) {
            hits++;
        }
        rand = Math.random() * 100;
        try {
            Thread.sleep((long) rand * 100);
        } catch (InterruptedException e) {
        }
    }
    System.out.println(hits);
    System.out.println(requests);
    System.out.println(hits / requests);
Without the Thread.sleep, regardless of runtime, I get results similar to the following:
2902723.0
5.8084512E7
0.04997413079755237
With the Thread.sleep, though, the hit rate varies quite a bit from run to run. My assumption is that what I'm seeing is something like a mathematical limit: the run without Thread.sleep lands so close to 5% because it is, for practical purposes, approaching an "infinite" number of requests. I'm also assuming that if we ran this in production long term, our hit rate would eventually converge to 5% as well. Am I off base, or is my thinking valid? Thanks in advance.
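To make the sample-size idea concrete, here is a minimal sketch that drops the sleep entirely and varies only the number of requests. The sleep in the test above averages roughly 5 seconds, so a 120-second run only handles about 24 requests, which is why n=24 is included. The class name SampleRateDemo, the helpers hitRate and standardError, and the fixed seed are all hypothetical names chosen for illustration, and the "expected spread" is just the usual binomial standard error sqrt(p(1-p)/n), under the assumption that each request is an independent draw.

```java
import java.util.Random;

public class SampleRateDemo {

    // Fraction of `requests` whose random draw falls below `percent` (hypothetical helper).
    static double hitRate(long requests, double percent, Random rng) {
        long hits = 0;
        for (long i = 0; i < requests; i++) {
            // Same check as in the question: uniform value in [0, 100) compared to the threshold.
            if (rng.nextDouble() * 100 < percent) {
                hits++;
            }
        }
        return (double) hits / requests;
    }

    // Binomial standard error of the observed rate, assuming independent requests.
    static double standardError(long requests, double percent) {
        double p = percent / 100.0;
        return Math.sqrt(p * (1 - p) / requests);
    }

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed seed only so that runs are repeatable
        long[] sampleSizes = {24, 1_000, 100_000, 10_000_000};
        for (long n : sampleSizes) {
            System.out.printf("n=%d observed=%.5f expected spread=+/-%.5f%n",
                    n, hitRate(n, 5, rng), standardError(n, 5));
        }
    }
}
```

If this reasoning holds, the spread at n=24 is about 4.4 percentage points around the 5% target, while at tens of millions of requests it shrinks to a few thousandths of a percentage point. That would suggest the variation comes from how few requests fit into the run rather than from the timing itself.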