Possible uneven distribution in Java's Random class or bad implementation?

Question

I'm tinkering around with a cellular automaton and my movement detection function is acting really strangely. I'm 80% sure it's my implementation but I have no idea where the issue is. Could someone take a look and enlighten me since I've spent the better part of 7H trying to get it to work and it won't:

private int[] cellularSearch(short xPos, short yPos)
{
 // the center position is our current position; the others are potentially free positions
 byte[][] positions = new byte[][]{{0,0,0},{0,1,0},{0,0,0}};          
 int[] result = new int[2];
 byte strike=0;
 int dice0=0, dice1=0;


  while(strike<9)
  {
    dice0 =  r.nextInt(3)-1;
    result[0] = xPos + dice0;

    if((result[0] >= 0)
    && (result[0] < img.getWidth()))
    {
        dice1 = r.nextInt(3)-1;
        result[1] = yPos + dice1;

        if((result[1] >= 0)
        && (result[1] < img.getHeight()))
        {
           if((positions[dice1+1][dice0+1] != 1))       // if this isn't our own cell and wasn't tried before
           {
                if(img.getRGB(result[0], result[1]) == Color.white.getRGB())     // if the new cell is free
                {
                    return result;
                }
           }

           positions[dice1+1][dice0+1]=1;       // we need to use +1 to create a correlation between the linkage in the matrix and the actual positions around our cell
           strike++;
        }
    }
  }
}

The code works and it correctly identifies when a pixel is white and returns the position for it. My problem is the distribution of the results. Given that I'm using Random both for the row and the column, I was expecting a near equal distribution over all possible locations, but what happens is that this code seems to prefer the cell right above the coordinates being fed in (it hits it ~3x as much as the other ones) and the one right below the coordinates (it hits it ~2x as much as the others).

When I start my program and all my pixels slowly move towards the top of the window on EVERY run (vs true randomness with my old lengthy code which was 3x as long), so there's gotta be an error in there somewhere. Could someone please lend a hand?

Thank you in advance!

EDIT: Thank you everyone for the effort! Sorry for the non-compiling code but I extracted the main purpose of the function while cutting out a ton of commented code (my other approaches to implementing this function). Locally the code has the return statement and it runs. I'll slowly go through all your answers in the next few hours (gonna have dinner soon).

EDIT2: I tried what @DodgyCodeException and @tevemadar suggested and made a list with all the 8 positions, then shuffle them each time I enter the function, and then iterate through them, trying each one in part. Still the position exactly above and below the current cell are selected most. I'm baffled. This is my old super-spaghetti code that I've written for this function and it worked perfectly with no errors, equal distribution, and (oddly enough) it's the most efficient implementation that I've tried out of everything mentioned here. After I'm done with lunch and filing some paperwork I'll thoroughly study it (it's been ~ 2 years since I wrote it) to see why it works so well. If anyone still has ideas, I'm fully open.

 boolean allRan=false;
 int lastDice=0, anteLastDice=0, dice = r.nextInt(3)+1;

 //the initial dice usage is for selecting the row on which we'll operate:
 //dice = 1 or 3 -> we operate above or under our current cell; dice = 2 -> we operate on the same row
while(!allRan)
{
 if((dice==1) || (dice==3))
    {
    int i= r.nextInt(3);

    if(((xPos-1+i) < img.getWidth())
    && ((xPos-1+i) >= 0))
       {
       if(((yPos-1) >= 0)
       && (img.getRGB(xPos-1+i, yPos-1) == Color.white.getRGB())
       && (dice==1))
          {
          result[0] = xPos-1+i;
          result[1] = yPos-1;
          above++;

          endTime = (int) System.currentTimeMillis();
          section4Runtime += (double) (endTime - startTime) / 1000;
          return result;
          }
       else if(((yPos+1) < img.getHeight())
       && (img.getRGB(xPos-1+i, yPos+1) == Color.white.getRGB())
       && (dice==3))
          {
          result[0] = xPos-1+i;
          result[1] = yPos+1;
          below++;

          endTime = (int) System.currentTimeMillis();
          section4Runtime += (double) (endTime - startTime) / 1000;
          return result;
          }
       }

    // if this section is reached, it means that: the initial dice roll didn't find a free cell, or the position was out of bounds, or the dice rolled 2
    // in this section we do a dice reroll (while remembering and avoiding our previous values) so that we cover all dice rolls
    if(dice==1)
        {
         if(lastDice==0)
            {
            lastDice=dice;
            dice += r.nextInt(2)+1;     // we incrmeent randomly towards 2 or 3.
            }
         else
            {
             if(lastDice==2)
                 {
                  if(anteLastDice==0)
                    {
                     anteLastDice= lastDice;
                     lastDice=dice;
                     dice=3;
                    }
                  else
                    {
                     allRan=true;
                    }
                 }
             else if(lastDice==3)
                 {
                  if(anteLastDice==0)
                    {
                     anteLastDice= lastDice;
                     lastDice=dice;
                     dice=2;
                    }
                  else
                    {
                     allRan=true;
                    }
                 }
            }
        }
        else        // dice is 3
        {
            if(lastDice==0)
            {
             lastDice=dice;
             dice -= r.nextInt(2)+1;     // we decrement randomly towards 2 or 1.
            }
         else
            {
             if(lastDice==2)
                 {
                  if(anteLastDice==0)
                    {
                     anteLastDice= lastDice;
                     lastDice=dice; 
                     dice=1;
                    }
                  else
                    {
                     allRan=true;
                    }
                 }
             else if(lastDice==1)
                 {
                  if(anteLastDice==0)
                    {
                     anteLastDice= lastDice;
                     lastDice=dice;
                     dice=2;
                    }
                  else
                    {
                     allRan=true;
                    }
                 }
            }
        }
    }

 if(dice==2)
    {
    int i=0;
    i += r.nextInt(2)==0?-1:1;

    if(((xPos+i) < img.getWidth())
    && ((xPos+i) >= 0)
    && (img.getRGB(xPos+i, yPos) == Color.white.getRGB()))
       {
       result[0] = xPos+i;
       result[1] = yPos;
       leveled++;

       endTime = (int) System.currentTimeMillis();
       section4Runtime += (double) (endTime - startTime) / 1000;
       return result;
       }

    // same as above: a dice reroll (with constrictions)
    if(lastDice==0)
        {
        lastDice=dice;
        dice+= r.nextInt(2)==0?-1:1;        // randomly chose if you decrement by 1 or increment by 1
        }
    else
        {
         if(lastDice==1)
            {
             if(anteLastDice==0)
                {
                 anteLastDice= lastDice;
                 lastDice=dice;
                 dice =3;   
                }
             else
                {
                 allRan=true; 
                }
            }
         else if(lastDice==3)
            {
             if(anteLastDice==0)
                {
                 anteLastDice= lastDice;
                 lastDice=dice;
                 dice =1;
                }   
             else
                {
                 allRan=true;
                }
            }

        }
    }
}

return result;

After much thought, I eventually figured it out. All the ideas that we all had were violating a fundamental "rule" of the first implementation that I was using: the first implementation was trying a random position on one of the 3 lines, then moving on to the next lines (without coming back to try the other positions on that line). Example: if the algo selected the line above, it would randomly try the top-left corner to see if it's free; if it wasn't then it would try the same line as the current cell and the line below (again, just with 1 of their possible positions) without coming back. All our ideas were iterating through all possibilities around the cell, which meant that it was inevitable to have the top and bottom line have more hits than the middle (since top and bottom have 3 possible points each while middle has only 2). Also, when there were holes in the field, the cells most likely to fill it up were the ones that were moving diagonally (which in the end is up or down) or those directly moving up or down, since those moving sideways only had the options left/ right. The only mystery that will remain unsolved is why (using our proposed implementations) the model would generally use the point exactly above our current cell. I have no idea why it loves going straight up most of the time with that implementation. Nevertheless, the new algorithm (which reflects the old one, but is much lighter) is:

boolean[] lines = new boolean[]{false, false, false};
byte checks =0;

while(checks < 3)       // just 3 tries in total
{
    dice = r.nextInt(3);

    if(lines[dice]== false)
    {
        lines[dice] = true;             // just 1 try per line
        // calculated here since we reuse dice below
        result[1] = yPos - 1 + dice;    // will be above if dice==0; will be below if dice==2; same line if dice==1

        if((dice == 0) || (dice == 2))        // top/bottom line
            {dice = r.nextInt(3)-1;}
        else if(dice == 1)        // middle line
            {dice = r.nextInt(2)==0?-1:1;}        // we exclude the middle point since that's our current position

        result[0] = xPos + dice;    // logic is calculated above and just applied here
        checks++;
    }

    if((result[0] >= 0)
    && (result[0] < img.getWidth())
    && (result[1] >= 0)
    && (result[1] < img.getHeight()))
    {
        if (img.getRGB(result[0], result[1]) == Color.white.getRGB())     // if the new cell is free
        {
            return result;
        }
    }
}
result[0] = -1;     // in case we get here, reset the value so it's not used

This brings the code down from 167 lines to 33 lines (and makes it MUCH more readable). I have no idea who to select as the best solution. Please suggest if you have any ideas.

Do you use java.util.Random? What seed do you initialize `r` with? — otisonoza, Nov 28 '17 at 14:45
While I'm not that familiar with the internals of Java's default Random class, it does provide alternative random number generators through the SecureRandom class. They're a bit heavier in use, but maybe they can help? — Jeroen Steenbeeke, Nov 28 '17 at 14:46
Try moving the `dice1 = r.nextInt(3)-1;` outside the if block just below `dice0 = r.nextInt(3)-1;` — 11thdimension, Nov 28 '17 at 15:08
if you start in the center, xPos is probably img.getWidth()/2 ... but xPos always adds, never subtracts, so it can only go in one of 4 corners... or am I missing something? — dube, Nov 28 '17 at 15:33
Your code doesn't compile. `missing return statement` -- since it's possible to exit the `while` loop without hitting the `return` inside. — slim, Nov 28 '17 at 15:36
How many times did you run the code to determine it was unevenly distributed? — dube, Nov 28 '17 at 15:38
btw it looks as if you're using an `Image` (not sure what package) as your holder of cellular automaton state. I'd suggest decoupling the model from the rendering; have your own state object (it could be a 2D array) and render it to an image when required. — slim, Nov 28 '17 at 16:48
@otisonoza: am using that library; not initializing with anything. The code on my machine is as you see above (forgot just the return statement). — tudor balus, Nov 28 '17 at 17:52
@11thdimension: I'm using this function as I'm iterating through an array of ~ 300 000 objects that represent cellular automatons. After I'm done processing the array I immediately put it on an image and then display it. I'm looking at shaving off 50-100 ms wherever I can -> no point in an extra dice if the first is out of bounds :). — tudor balus, Nov 28 '17 at 17:54
@dube: Well since the image I'm mapping the automata to gets updated as fast as the JVM can run the commands, I have roughly a few hundred thousand runs (a few ~1 minute runs which print out 100s-1000s of stats). I could drop you the stats if you're really curious. — tudor balus, Nov 28 '17 at 17:56
@slim: check out what I said to 11thdimension. I'm already doing pretty much what you said (unless I don't get what you're saying xD). — tudor balus, Nov 28 '17 at 17:57
The reason why I'm suggesting that, is because your conditional logic may be interfering with the randomness. There's nothing wrong with Random class, so only thing to examine is your own code. You can't add invoke `Rnadom.nextInt()` in multiple conditional branches and expect the combined result to remain unbiased. — 11thdimension, Nov 28 '17 at 18:18
I fail to see why having it inside the condition is an error. I'm actually aiming at having two random numbers generated, with the second one being generated if the first one is good. Can you elaborate? — tudor balus, Nov 28 '17 at 20:04
Where do you initialize the variable `r`? Is it just once at the very start of your application, or inside a method call where you initialize it every time you call the method? And how (which one of the overloaded constructors of class `Random` do you call)? — DodgyCodeException, Nov 28 '17 at 21:37
@11thdimension it's hard to see because of all the nested ifs and the over-broad scope of the `dice` variables -- but if you refactor it down, you see that it never uses last iteration's roll for this iteration. — slim, Nov 29 '17 at 10:10
@DodgyCodeException: every time I enter this function. I'm using the normal: Random r = new Random(); — tudor balus, Nov 29 '17 at 11:05
Can you try putting `Random r = new Random();` at the start of your `main` method, and passing r as an argument, and not calling `Random()` every time? — DodgyCodeException, Nov 29 '17 at 12:18
@DodgyCodeException: my code spans mainly in two files and has roughly 1000 lines of code. Making r be passed around seems like a big overhead to me. Also, please check my comment to your answer. — tudor balus, Nov 29 '17 at 15:03
You regard passing r around as a big overhead? Do you know what the overhead of calling the Random() constructor is? If you don't want to pass arguments around, you could use a `public static final RNG = new Random()` in your main class and just use `Main.RNG` everywhere. Or just use [ThreadLocalRandom.current()](https://docs.oracle.com/javase/9/docs/api/java/util/concurrent/ThreadLocalRandom.html#current--). — DodgyCodeException, Nov 29 '17 at 15:41
Good point! I concede. I'll make r a 'public static final' variable to not instantiate it so many times. That aside, what method to use to best solve my dilemma? Should I get my old method and work around it? (I put it in the main post above) — tudor balus, Nov 30 '17 at 08:00

DodgyCodeException · Answer 1 · 2017-11-29T12:30:37.430

2

The distribution of java.util.Random is not that uneven. You can confirm with the following code:

public static void main(String[] args) throws Exception {
    final int N = 3;
    Random r = new Random();
    int[] counts = new int[N];
    for (int i = 0; i <= 100_000; i++) {
        counts[r.nextInt(N)]++;
    }
    System.out.println(Arrays.toString(counts));
}

UPDATE:

As you've said, the above code produces fairly evenly distributed values. However, add the following line at the beginning of the loop:

        if (i % 6 == 0)
            r = new Random(0);

And then you get [16667, 33334, 50000]. One value occurs twice as frequently, and another 3 times as frequently, as the first. This sets the random number generator to a newly created one with a constant seed. It simulates your code, in which you say you create a new Random() on entry to your function (albeit without a seed argument) and then your function calls nextInt() six times - this if (i % 6 == 0) statement ensures a new RNG is also created every 6 iterations.

Check your code and make sure you are only ever creating a RNG once in your whole program.

edited Nov 29 '17 at 12:30

answered Nov 28 '17 at 14:57

DodgyCodeException

5,963
3
21
42

I ran it several times and the average result is ~: [33264, 33396, 33341] which is pretty reasonable. The average results that I'm getting with the distribution of my points is: 1: 4 | 2: 36 | 3: 1 | 4: 7 | 5: 2 | 6: 13 | 7: 1 | 8: 3 You have 8 possible points around a square and I mapped them from left-right, top-down. As you can see points 2 and 6 are BY FAR the highest ones being chosen. These values are the % average of the population that is moving on which cell they land. The %s don't add up to 100% since the 300 000 don't all have room to move (some next to eachother). – tudor balus Nov 28 '17 at 18:09
Interesting thought. In my code I instantiate r only once, but I call r.nextInt() twice though. As to the scope of the variable: I'm using random in other areas of the code, but each has its own variable and declaration/ instantiation (just once of course). – tudor balus Nov 29 '17 at 15:05
Does the *UPDATE* mean that if you use `new Random()` every time you call it, it won't be distributed equally? – basZero Sep 13 '22 at 09:30

slim · Answer 2 · 2017-11-28T16:39:33.947

First, I have to admit I can't see what your algorithm is supposed to be doing -- it's not clear to me why you roll the each die when you do, other times using the existing value.

For a clear, easy to follow algorithm, I'd suggest scoping your dice variables inside the loop, rolling both at the same time, and making them final so that you know that each iteration has exactly one two-die roll:

while(strike < 9) {
    final int roll1 = r.nextInt(3) - 1;
    final int roll2 = r.nextInt(3) - 1;

    strike += handleRoll(roll1,roll2);
}

You can prove the distribution to yourself by writing a simple counter for your handleRoll(), before later substituting your real code.

int[] counts = int[6];
void handleRoll(int roll1, int roll2) {
     counts[1 + roll1] ++;
     counts[4 + roll2] ++;
     return 1;
}

(Increase the required strike count to get large enough samples to reason about)

Make sure you use the same instance of Random throughout the program -- don't keep making new ones.

(You could tidy this up a bit by creating a Coordinate class and a factory that creates random ones)

I simplified your code like this:

made a series of extract-method refactorings to tidy away detail
changed your rolls to use the range 0 to 2 instead of -1 to +1 -- since you use them in two places, and in one of those you add one again!
used x and y and only create result when needed
used final for the rolls and the resulting x and y, scoping them to the inside of the loop
turned nested ifs into an && logic
changed some odd type choices. The positions grid seems made for boolean. There's seldom any value in using short in Java.

So:

private int[] cellularSearch(int xPos, int yPos) {
     boolean[][] positions =
            new boolean[][] { { false, false, false }, 
                              { false, true, false }, 
                              { false, false, false } };
    int strike = 0;

    while (strike < 9) {
        final int dice0 = r.nextInt(3);
        final int dice1 = r.nextInt(3);

        final int x = xPos + dice0 - 1;
        final int y = yPos + dice1 - 1;

        if (isInXrange(x) && isInYRange(y)) {
            if (!alreadyTried(positions, dice1, dice0) && isWhite(x, y)) {
                return new int[] { x, y };
            }

            markAsTried(positions, dice1, dice0);
            strike++;
        }
    }
    return null; // or whatever you intend to happen here
}

private boolean isInXrange(int x) {
    return (x >= 0) && (x < img.getWidth());
}

private boolean isInYRange(int y) {
    return (y >= 0) && (y < img.getHeight());
}

private boolean alreadyTried(boolean[][] positions, final int dice1, final int dice0) {
    return positions[dice1 + 1][dice0 + 1];
}

private static void markAsTried(boolean[][] positions, int dice1, int dice0) {
    positions[dice1][dice0] = true;
}

private boolean isWhite(final int x, final int y) {
    return img.getRGB(x, y) == Color.white.getRGB();
}

I think this is equivalent to your code, with one exception -- yours doesn't roll the second die if the first roll takes you outside the width of the image. You could re-add this as a performance improvement later if you like.

But it exposes some issues. It looks as if the intent is to try every cell (you have a 3x3 grid, and you've chosen 9 "strikes") - but it doesn't increment strike when x,y is outside the image. It does increment strike when the position has been tried before. So you can exit the loop having not tried every cell.

I don't see a specific way that this causes the weighting you've described -- but it looks like the sort of thing that could lead to unexpected results.

(Anyway - since the code you've given doesn't compile, you didn't observe it with the code you've given us)

If the intention is to check every cell, it might be better to shuffle a list of cells to try, then test them in order:

 List<Coords> coordsToTry = new ArrayList<>();
 for(int x=0; x<2; x++) {
     for(int y=0; y<2; y++) {
         coordsToTry.add(new Coords( x, y));
     }
 }
 Collections.shuffle(coordsToTry);

 for(Coords coords : coordsToTry) {
    if(isWhite(coords)) {
        return coords;
    }
 }
 return null; // or whatever is meant to happen when nothing found

while your code is way cleaner than mine, I fear that the extra function calls will slow down the runtime. I'm handling a large dataset (300 000 automatons) and this function is fed in the coordinates of each and every one of them as to compute if they can move. After all the calculations have been done for the whole list, I paint it on a BufferedImage and then show it. I'm trying to make everything as fast as possible (why the second dice is rolled only if the first is good). Also, the rule is that the cell can only try move in its immediate vicinity (the strike incrementing thing). — tudor balus, Nov 28 '17 at 18:07
Extract-method doesn't slow down Java - if anything it speeds it up by giving the JVM more scope to optimise. In general, don't optimise for speed until you're satisfied with correctness. Then only optimise what profiling proves to be wasting time. — slim, Nov 28 '17 at 22:38
My point about the strikes logic is that it looks as if you want to give up after testing each of the nine cells. But if it picks 0,0 nine times in a row, it will give up without testing any of the other cells. — slim, Nov 28 '17 at 22:47
https://stackoverflow.com/questions/25265303/can-method-extraction-negatively-impact-code-performance — slim, Nov 28 '17 at 23:01

score 1 · Answer 3 · answered Nov 28 '17 at 15:02

1

java.util.Random is a pseudorandom number generator (definition on wikipedia) and needs to be seeded. From the docs:

If two instances of Random are created with the same seed, and the same sequence of method calls is made for each, they will generate and return identical sequences of numbers. In order to guarantee this property, particular algorithms are specified for the class Random.

If you want to be sure to get good random numbers, use SecureRandom, which is guaranteed to produce non-deterministic output

answered Nov 28 '17 at 15:02

dube

4,898
2
23
41

1

The zero-arg constructor "sets the seed of the random number generator to a value very likely to be distinct from any other invocation of this constructor." -- if OP was seeding it with a fixed value, they'd know it. – slim Nov 28 '17 at 15:08
1

@slim Well it's more likely the OPs error than java.util.Random generating non-random numbers, so assuming what he did or didn't won't solve the issue. – dube Nov 28 '17 at 15:12
2

... and if they *did* seed with the same value every time, you'd still expect a uniform distribution over the life of the `Random` object. – slim Nov 28 '17 at 15:28
2

It might be a loop of 10 runs where he uses the same Random, which he statically seeds at the beginning of it's test. Yes it's unlikely, but again, we don't know how he tested it and OP should check it or post a complete test case – dube Nov 28 '17 at 15:47
2

In which case the answer would not be "use SecureRandom", but "use the same instance of `Random` throughout." `SecureRandom` is no better for statistical and simulation purposes than `Random`. – slim Nov 28 '17 at 16:36
1

Check my comments above. I iterate thousands of times for hundreds of thousands of positions.I doubt it's a "limitation" of the Random library though since I've been using it similarly before with no odd behaviour. – tudor balus Nov 28 '17 at 20:01

tevemadar · Answer 4 · 2017-11-28T16:08:16.143

1

Since you are interested in the combined distribution of the two 'dices', on top of @DodgyCodeException's suggestion, you can check statistics like

public static void main(String[] args) {
    Random r=new Random();
    int stat[]=new int[9];
    for(int i=0;i<9000;i++)
        stat[r.nextInt(3)+r.nextInt(3)*3]++;
    for (int i : stat)
        System.out.println(i);
}

However it is pretty even too.

There is a minor difference between generating random numbers from a power-of-two-range and otherwise, so if you really want to do some magic, you can use the fact that you are actually picking a position out of 8 possibilities (since the middle one is ruled out at the beginning). Something like

final int xhelper[]=new int[]{-1, 0, 1,-1, 1,-1, 0, 1};
final int yhelper[]=new int[]{-1,-1,-1, 0, 0, 1, 1, 1};
...
int dir=r.nextInt(8);
int dice0=xhelper[dir];
int dice1=yhelper[dir];

But in fact I do not think it makes a difference.

edited Nov 28 '17 at 16:08

answered Nov 28 '17 at 16:02

tevemadar

12,389
3
21
49

That's an interesting approach. Yes, it may be similar but it's not identical. I'll give it a try tomorrow if we don't find any other solution for my original code. Thank you! – tudor balus Nov 28 '17 at 20:02
Tried it just now. I think yours is a little easier to read, but it virtually does exactly the same thing: all the automatons tend upwards still. I'm starting to be convinced that SOMEHOW I'm misusing the random library. – tudor balus Nov 28 '17 at 21:02
1

I have a hunch that it's not the random library you're misusing, but some weird error in your algorithm. To take the randomness completely out of the equation, why not try with custom "random" values that simply cycle through a circular list of numbers, where every number occurs exactly once. If you get similar results, then you know it's not a misuse of the Random class, but something else. – DodgyCodeException Nov 28 '17 at 21:32
2

@tudorbalus One thing which can 'push' a generation-based construct (like a cell-automata) towards a single 'more probable' direction is when generations get mixed in a single storage (instead of having separate ones for old and new gen). It does not even have to be intentional, overlooking something in the buffer-swap may result in having two the two variables referring to a single buffer. – tevemadar Nov 28 '17 at 21:54

Possible uneven distribution in Java's Random class or bad implementation?

4 Answers4