
When I run the following program, a typical console output is:

Mean percentage points for weighting 0 is: 57.935590153643616
Mean percentage points for weighting 1 is: 42.06440984635654

Why are these printed means not much closer to 60 and 40?

```java
import java.util.Random;
import java.util.stream.DoubleStream;

public class WeightedRandoms { // class name is arbitrary

    public static void main(String[] args) {
        Random rand = new Random();

        int numCycles = 5000;

        double[] weightings = {60.0, 40.0};
        double[] weightedRandoms = new double[weightings.length];
        double[] totPercentagePoints = new double[weightings.length];

        for (int j = 0; j < numCycles; j++) {

            // Multiply each weighting by a random integer in 1..10
            // (+1 so that the weighting is never multiplied by 0).
            for (int k = 0; k < weightings.length; k++) {
                weightedRandoms[k] = (rand.nextInt(10) + 1) * weightings[k];
            }

            // Add each weighted random's share of this cycle's total, in percentage points.
            double cycleTotal = DoubleStream.of(weightedRandoms).sum();
            for (int k = 0; k < weightings.length; k++) {
                totPercentagePoints[k] += weightedRandoms[k] / cycleTotal * 100;
            }
        }

        for (int i = 0; i < weightings.length; i++) {
            System.out.println("Mean percentage points for weighting " + i + " is: " + totPercentagePoints[i] / numCycles);
        }
    }
}
```
danger mouse
  • Why would you expect them to be? What is the purpose of the code? – jr593 Sep 25 '17 at 13:54
  • Hi, I was hoping not to have to reveal the purpose. As for why I was expecting the means to be closer, it's this: if 5000 random number pairs are generated, with each number in the pair being between 1 and 10 (inclusive) e.g. 3-7, 5-2, 1-10, 8-8, etc., and if these are then seen as ratios (30:70, 71:29, 9:91, 50:50, etc.), I would expect the mean ratio to be 50:50. If, then, each side of the ratio is multiplied by a weighting (in this case 60 and 40 respectively), I would expect the mean ratio to be 60:40. Does this logic make sense? Something's clearly amiss. – danger mouse Sep 25 '17 at 14:12
  • "If, then, each side of the ratio is multiplied by a weighting (in this case 60 and 40 respectively), I would expect the mean ratio to be 60:40." Apologies, I should have written "If each side of the number pair is multiplied by a weighting (in this case 60 and 40 respectively), I would expect the resulting ratios to be 50:50 multiplied by 60 and 40, giving 3000:2000, which reduces to 60:40." – danger mouse Sep 25 '17 at 14:27
  • I think I understand about probability distributions: I have just re-run the program with numCycles = 1000000 and I get an output of 58.4 and 41.6. – danger mouse Sep 25 '17 at 14:45
  • The expected value of something of the form `X/(X+Y)` is not the ratio of the expected values. Expectation is a linear operator, but forming ratios is nonlinear. I am getting the same sort of observed ratios in R that you are getting in Java. There is no problem with your code, but your intuitions as to what you expect to see are off. – John Coleman Sep 25 '17 at 15:30
  • Thanks @John Coleman. It's interesting, then, that the result is not further from 60:40 than it is. If anyone has a formula for calculating the expected results, based on two (or more) weightings, I'd be keen to know it. – danger mouse Sep 25 '17 at 15:45
  • Maybe post it on Mathematics Stack Exchange. `X` is of the form `0.6*U(1,10)` and `Y` is of the form `0.4*U(1,10)`. In calculating `E[X/(X+Y)]`, a complicating factor is that the numerator and denominator are correlated. I don't know of any nice formula, though there are ways to approximate it (e.g. http://www.stat.cmu.edu/~hseltman/files/ratio.pdf). – John Coleman Sep 25 '17 at 15:48
  • Thanks. I won't take this further for now; if others want to, feel free. – danger mouse Sep 25 '17 at 16:03
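The comments' point that `E[X/(X+Y)]` is not `E[X]/(E[X]+E[Y])` can be checked exactly with rational arithmetic. The sketch below is illustrative only: the function names `naive_share`, `exact_share` and `taylor_share` are hypothetical, and the Taylor formula is a standard second-order (delta-method) approximation for the mean of a ratio, swapped in here as one example of the "ways to approximate it" mentioned above, not a formula taken from the thread. It also confirms the asker's 50:50 intuition for the unweighted case, which holds exactly by symmetry.

```python
from fractions import Fraction
from itertools import product


def naive_share(wx, wy, n=10):
    """Ratio of expectations E[X] / (E[X] + E[Y]): the '60:40' intuition."""
    mean_u = Fraction(n + 1, 2)  # mean of the discrete uniform on 1..n
    return wx * mean_u / (wx * mean_u + wy * mean_u)


def exact_share(wx, wy, n=10):
    """Exact E[X / (X + Y)] for X = wx*U1, Y = wy*U2, U1 and U2 independent on 1..n."""
    total = sum(Fraction(wx * u, wx * u + wy * v)
                for u, v in product(range(1, n + 1), repeat=2))
    return total / n**2  # every (u, v) pair has probability 1/n^2


def taylor_share(wx, wy, n=10):
    """Second-order Taylor (delta-method) approximation of E[N / D], N = X, D = X + Y."""
    mean_u = Fraction(n + 1, 2)
    var_u = Fraction(n * n - 1, 12)  # variance of the discrete uniform on 1..n
    mu_n = wx * mean_u
    mu_d = (wx + wy) * mean_u
    var_d = (wx**2 + wy**2) * var_u  # X and Y are independent
    cov_nd = wx**2 * var_u           # Cov(X, X + Y) = Var(X): numerator and denominator are correlated
    return mu_n / mu_d - cov_nd / mu_d**2 + mu_n * var_d / mu_d**3


if __name__ == "__main__":
    print("unweighted exact:", exact_share(1, 1))           # 1/2 by symmetry
    print("ratio of means:  ", float(naive_share(60, 40)))  # 0.6
    print("exact:           ", float(exact_share(60, 40)))
    print("Taylor approx:   ", float(taylor_share(60, 40)))
```

For weightings 60 and 40 this gives exactly 3/5 for the ratio of means, about 0.5836 for the exact expectation, and 807/1375 (about 0.5869) for the Taylor approximation, so the approximation lands between the naive 60:40 guess and the true value.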

1 Answer


You are estimating `100*E[X/(X+Y)]` and `100*E[Y/(X+Y)]`, where `X = 60*U(1,10)` and `Y = 40*U(1,10)` (U(1,10) being the discrete uniform distribution on 1, ..., 10, with the two draws independent). Since there are only 10*10 = 100 equally likely ways to generate two such uniform variables, you can evaluate the expression for each pair and then compute these expectations directly. In Python, define:

```python
def f(x,y): return 60*x/(60*x + 40*y)
```

and then:

```python
>>> sum(f(x,y) for x in range(1,11) for y in range(1,11))
58.36068355253924
```

Note that the factor of 100 you multiply by (to get percentage points) exactly cancels the factor of 1/100 you would need when computing the expectation, since each of the 100 pairs has probability 1/100.

Similarly, if you define:

```python
def g(x,y): return 40*y/(60*x + 40*y)
```

Then:

```python
>>> sum(g(x,y) for x in range(1,11) for y in range(1,11))
41.639316447460756
```

These match what you are observing.
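The same enumeration extends to the "two (or more) weightings" case raised in the comments. The following is a minimal sketch, assuming (as in the question's program) that each weighting is multiplied by an independent draw from 1..n; the function name `expected_percentages` and the parameter `n` are hypothetical, chosen for illustration.

```python
from itertools import product


def expected_percentages(weights, n=10):
    """Exact mean percentage points for each weighting.

    Assumes each weighting is multiplied by an independent draw from the
    discrete uniform distribution on 1..n, as in the question's program.
    """
    k = len(weights)
    totals = [0.0] * k
    outcomes = 0
    for draws in product(range(1, n + 1), repeat=k):  # all n**k equally likely outcomes
        weighted = [w * d for w, d in zip(weights, draws)]
        denom = sum(weighted)
        for i in range(k):
            totals[i] += weighted[i] / denom
        outcomes += 1
    # Each outcome has probability 1 / n**k; multiply by 100 for percentage points.
    return [100 * t / outcomes for t in totals]


if __name__ == "__main__":
    for i, p in enumerate(expected_percentages([60, 40])):
        print("Mean percentage points for weighting", i, "is:", p)
```

Because the loop visits n**k outcomes, exact enumeration is only practical for a handful of weightings; beyond that, a simulation like the one in the question is the practical route.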

John Coleman