Sampling A Percentage Of Web Traffic

Question

I'm fairly sure I know the answer to this, but I'd like some validation. Someone recently asked me how they could take a random sample of a certain percentage of web traffic. They want to treat 5% of traffic differently by presenting a different experience. At a basic level, I proposed something like the code below.

double rand = Math.random()*100;
if(rand < 5){
    //treat differently
}
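
For reference, here is the same check as a small self-contained sketch. The class name `TrafficSampler` and the method `inSample` are hypothetical names I made up for illustration, and I've used `ThreadLocalRandom` on the assumption that the check would run on several request threads at once.

```java
import java.util.concurrent.ThreadLocalRandom;

public class TrafficSampler {
    // Hypothetical helper: returns true for roughly `percent` percent of calls.
    // nextDouble(100.0) is uniform on [0, 100), so 0 never hits and 100 always hits.
    public static boolean inSample(double percent) {
        return ThreadLocalRandom.current().nextDouble(100.0) < percent;
    }

    public static void main(String[] args) {
        if (inSample(5.0)) {
            System.out.println("treat differently");
        } else {
            System.out.println("default experience");
        }
    }
}
```

Each call is an independent draw, so the same visitor can land in either group on different requests.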

So to make sure I wasn't making some ridiculous assumption, I decided to test it out, using Thread.sleep() to simulate a sporadic, uneven inflow of requests, as follows:

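long runtime = 120000;              // total test duration: 2 minutes, in ms
double requests = 0;                // simulated requests seen so far
double hits = 0;                    // requests that fell into the 5% sample
double rand = Math.random()*100;    // uniform in [0, 100)
long starttime = System.currentTimeMillis();

while(starttime + runtime > System.currentTimeMillis()){
    requests++;
    if(rand < 5){
        hits++;
    }
    rand = Math.random()*100;
    try{
        // The cast binds tighter than the multiplication, so this is
        // ((long)rand) * 100: a sleep of 0 to 9900 ms in 100 ms steps.
        Thread.sleep((long)rand * 100);
    }catch(InterruptedException e){
        // Restore the interrupt flag and stop instead of swallowing it.
        Thread.currentThread().interrupt();
        break;
    }
}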

System.out.println(hits);
System.out.println(requests);
System.out.println(hits/requests);

Without the Thread.sleep, regardless of runtime, I get results similar to the following:

2902723.0
5.8084512E7
0.04997413079755237

With the Thread.sleep, though, the hit rate varies quite a bit from run to run. My assumption is that what I'm seeing is something like a mathematical limit: the run without Thread.sleep lands so close to 5% because, for practical purposes, it reaches an "infinite" number of requests. I'm also assuming that if we ran this in production long term, our hit rate would eventually approach 5% as well. Am I off base, or is my thinking valid? Thanks in advance.
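
To make that assumption concrete, here is a rough sketch that drops the sleeping and measures the hit rate at several fixed request counts. It relies on the standard result that the observed rate for a true rate p over n independent requests has a standard error of sqrt(p(1-p)/n). The class name `HitRateSpread`, the seed, and the request counts are arbitrary choices for illustration. The smallest count, 25, is picked because the sleep in my test averages about five seconds, so a two-minute run only gets around two dozen requests.

```java
import java.util.Random;

public class HitRateSpread {
    // Fraction of `requests` simulated requests that fall into the `percent` sample.
    public static double hitRate(Random rng, int requests, double percent) {
        int hits = 0;
        for (int i = 0; i < requests; i++) {
            if (rng.nextDouble() * 100 < percent) {
                hits++;
            }
        }
        return (double) hits / requests;
    }

    // Standard error of the observed rate for true rate p over n independent requests.
    public static double standardError(double p, int n) {
        return Math.sqrt(p * (1 - p) / n);
    }

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed seed (arbitrary) so runs are repeatable
        for (int n : new int[] {25, 1_000, 100_000, 10_000_000}) {
            System.out.printf("n=%d observed=%.4f typical spread=+/-%.4f%n",
                    n, hitRate(rng, n, 5.0), standardError(0.05, n));
        }
    }
}
```

At 25 requests the standard error is roughly 4.4 percentage points, so observed rates anywhere from 0% to 8% or more are unremarkable. At tens of millions of requests it shrinks to a few thousandths of a percentage point, which fits the 0.04997 result above.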

This isn't clear; what are you trying to discover with the above code? If it's just a test that `rand < 5` obtains 5%, then what is the purpose of the random sleeping? — Oliver Charlesworth, Jun 14 '13 at 17:03
It should certainly be around 5%. What results are you getting with `Thread.sleep()`? — Reinstate Monica -- notmaynard, Jun 14 '13 at 17:06
Also note that you're sleeping for between 1 and 100 milliseconds, so running for two minutes will get between 1200 and 120000 requests, far fewer than the 58 million. And after re-reading your question, yes, you're right in that it's basically a limit — sort of like "as requests approaches infinity, percent approaches 5". — Reinstate Monica -- notmaynard, Jun 14 '13 at 17:09
@iamnotmaynard, with sleep I've recently had it vary between 8 and 2 percent. Thanks for the quick validation in your most recent comment. — butallmj, Jun 14 '13 at 17:29
