5

Simplified (i.e., leaving concurrency out), Random.next(int bits) looks like this:

protected int next(int bits) {
    seed = (seed * multiplier + addend) & mask;
    return (int) (seed >>> (48 - bits));
}

where masking gets used to reduce the seed to 48 bits. Why is it better than just

protected int next(int bits) {
    seed = seed * multiplier + addend;
    return (int) (seed >>> (64 - bits));
}

? I've read quite a lot about random numbers, but see no reason for this.
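To make the comparison concrete, here is a minimal runnable sketch of both variants. The constants are the ones java.util.Random uses (multiplier 0x5DEECE66D, addend 0xB, 48-bit mask). The class and method names are made up for illustration, and the unmasked variant is only the proposal from this question, not anything in the JDK.

```java
// Sketch only: class and method names are illustrative, not JDK API.
final class LcgVariants {
    static final long MULTIPLIER = 0x5DEECE66DL; // constant used by java.util.Random
    static final long ADDEND = 0xBL;             // constant used by java.util.Random
    static final long MASK = (1L << 48) - 1;     // keeps the low 48 bits

    private long seed;

    LcgVariants(long initialSeed) {
        // java.util.Random scrambles the user-supplied seed the same way.
        this.seed = (initialSeed ^ MULTIPLIER) & MASK;
    }

    // Variant 1: 48-bit state, as in java.util.Random (minus the atomics).
    int nextMasked(int bits) {
        seed = (seed * MULTIPLIER + ADDEND) & MASK;
        return (int) (seed >>> (48 - bits));
    }

    // Variant 2: the proposal from the question, using the full 64-bit state.
    int nextUnmasked(int bits) {
        seed = seed * MULTIPLIER + ADDEND; // overflow wraps, i.e. arithmetic mod 2^64
        return (int) (seed >>> (64 - bits));
    }

    public static void main(String[] args) {
        LcgVariants masked = new LcgVariants(42L);
        java.util.Random reference = new java.util.Random(42L);
        // Random.nextInt() is next(32), so the masked variant should agree with it.
        System.out.println(masked.nextMasked(32) == reference.nextInt());

        LcgVariants unmasked = new LcgVariants(42L);
        System.out.println(unmasked.nextUnmasked(32));
    }
}
```

With the mask, the returned value comes from state bits 16 to 47; without it, from bits 32 to 63.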

maaartinus

4 Answers

5

The reason is that the lower bits tend to have a shorter period (at least with the algorithm Java uses).

From Wikipedia - Linear Congruential Generator:

As shown above, LCG's do not always use all of the bits in the values they produce. The Java implementation produces 48 bits with each iteration but only returns the 32 most significant bits from these values. This is because the higher-order bits have longer periods than the lower order bits (see below). LCG's that use this technique produce much better values than those that do not.

edit:

After further reading (conveniently, on Wikipedia): the values of a, c, and m must satisfy these conditions for the generator to have the full period:

  1. c and m must be relatively prime

  2. a-1 is divisible by all prime factors of m

  3. a-1 is a multiple of 4 if m is a multiple of 4

The only one that I can clearly tell is still satisfied is #3. #1 and #2 need to be checked, and I have a feeling that one (or both) of these fail.
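When m is a power of two (and at least 4), the three conditions reduce to simple bit tests, so they can be checked mechanically. The sketch below is only an illustration: the class and method names are hypothetical, and the constants are assumed to be the ones java.util.Random uses (a = 0x5DEECE66D, c = 0xB).

```java
// Sketch only: checks the three full-period conditions for a modulus m = 2^k, k >= 2.
final class FullPeriodCheck {
    static final long MULTIPLIER = 0x5DEECE66DL; // a, as used by java.util.Random
    static final long ADDEND = 0xBL;             // c, as used by java.util.Random

    static boolean hasFullPeriodForPowerOfTwo(long a, long c) {
        // 1. c and m = 2^k are relatively prime exactly when c is odd.
        boolean cCoprimeToM = (c & 1L) == 1L;
        // 2. The only prime factor of m is 2, so a - 1 must be even.
        boolean primeFactorsDivide = ((a - 1) & 1L) == 0L;
        // 3. m is a multiple of 4, so a - 1 must be a multiple of 4.
        boolean multipleOfFour = ((a - 1) & 3L) == 0L;
        return cCoprimeToM && primeFactorsDivide && multipleOfFour;
    }

    public static void main(String[] args) {
        System.out.println(hasFullPeriodForPowerOfTwo(MULTIPLIER, ADDEND));
    }
}
```

None of the tests depends on the exponent k, so the outcome is the same for m = 2^48 and m = 2^64.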

Michelle Tilley
helloworld922
  • I know that the higher bits are indeed better, but this is actually a reason for *not* using the mask. The chosen multiplier and addend lead to the maximal period, which means that the period without masking would be 2**64 instead of 2**48. – maaartinus Apr 08 '11 at 00:01
  • @maartinus: you can't just change 48 to 64; it changes the properties of the generator. – Jason S Apr 08 '11 at 00:26
  • @Jason S Sure it does, but does it worsen anything? – maaartinus Apr 08 '11 at 00:33
  • If you don't change the other coefficients it ruins the generator. I'm not sure what the period of the resultant generator would be, but I suspect you might get a period of less than 2^48 -- it would take some careful mathematical analysis to show what it is. – Jason S Apr 08 '11 at 00:39
  • 1
    @maaartinus: As mentioned in the [article cited](http://en.wikipedia.org/wiki/Linear_congruential_generator), the period of the Java LCG is _at most_ m = 2**48. @helloworld922: +1 This LCG has indeed been [tested](http://www.math.utah.edu/~beebe/java/random/README) and the low order bits have the expected poorer quality. – trashgod Apr 08 '11 at 00:40
  • @helloworld922 Conditions 1 and 2 are trivially satisfied as `m` is a power of two and both `c` and `a` are odd. @trashgod The period obviously can't be greater than two powered to the bitlength, but with the conditions given there it's equal. – maaartinus Apr 08 '11 at 00:54
  • OK, so that means the period is 2^64, but it says nothing about the statistical properties of the various bits of the random number generator state. – Jason S Apr 08 '11 at 01:15
  • @Jason S Right, but there's a good reason to believe that it would return better quality numbers: The low order bits are obviously worse and with the mask you use bits 16 to 47, without it you'd use bits 32 to 63. – maaartinus Apr 14 '11 at 17:44
2

From the docs at the top of java.util.Random:

The algorithm is described in The Art of Computer Programming, Volume 2 by Donald Knuth in Section 3.2.1. It is a 48-bit seed, linear congruential formula.

So the entire algorithm is designed to operate on 48-bit seeds, not 64-bit ones. I guess you can take it up with Mr. Knuth ;p

jberg
  • I see, but the formula works for any bitlength, and the book is very old. The 48-bit seed may have been chosen by Knuth thinking about 8-bit computers. – maaartinus Apr 07 '11 at 23:58
  • 'the book is very old' -- heh, math doesn't change. – Jason S Apr 08 '11 at 00:26
  • @Jason S Math doesn't change *much*, but its usage does as the computers get faster. – maaartinus Apr 08 '11 at 00:43
  • There are other values that need to be changed in the algorithm, you cannot simply swap in a different number of bits. The a and c values in the algorithm need to be adjusted. – jberg Apr 08 '11 at 00:55
  • No, `a` and `c` may stay unchanged, see my answer to helloworld922. – maaartinus Apr 08 '11 at 01:00
0

From Wikipedia (the quote alluded to by the quote that @helloworld922 posted):

A further problem of LCGs is that the lower-order bits of the generated sequence have a far shorter period than the sequence as a whole if m is set to a power of 2. In general, the nth least significant digit in the base b representation of the output sequence, where b^k = m for some integer k, repeats with at most period b^n.

And furthermore, it continues (my italics):

The low-order bits of LCGs when m is a power of 2 should never be relied on for any degree of randomness whatsoever. Indeed, simply substituting 2^n for the modulus term reveals that the low order bits go through very short cycles. In particular, any full-cycle LCG when m is a power of 2 will produce alternately odd and even results.
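The short cycles of the low-order bits can be observed directly. The following is a minimal sketch, assuming the multiplier and addend of java.util.Random; the class and method names are hypothetical. It counts how many steps it takes for the low k bits of the state to return to their starting value.

```java
// Sketch only: measures the cycle length of the low k bits of the LCG state.
final class LowBitPeriod {
    static final long MULTIPLIER = 0x5DEECE66DL; // as used by java.util.Random
    static final long ADDEND = 0xBL;             // as used by java.util.Random

    // Counts steps until the low k bits of the state are zero again.
    // The low k bits depend only on the low k bits of the previous state,
    // so this is the period of the LCG reduced modulo 2^k.
    static long periodOfLowBits(int k) {
        long lowMask = (1L << k) - 1;
        long state = 0L;
        long steps = 0L;
        do {
            state = state * MULTIPLIER + ADDEND; // wraps mod 2^64; low bits are unaffected
            steps++;
        } while ((state & lowMask) != 0L);
        return steps;
    }

    public static void main(String[] args) {
        for (int k : new int[] {1, 2, 4, 8, 16}) {
            System.out.println("low " + k + " bits: period " + periodOfLowBits(k));
        }
    }
}
```

For k = 1 the period is 2, which is the alternation between odd and even values mentioned in the quote. Whether the state is masked to 48 bits or left at 64 bits makes no difference to these low bits.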

In the end, the reason is probably historical: the folks at Sun wanted something to work reliably, and the Knuth formula gave 32 significant bits. Note that the java.util.Random API says this (my italics):

If two instances of Random are created with the same seed, and the same sequence of method calls is made for each, they will generate and return identical sequences of numbers. In order to guarantee this property, particular algorithms are specified for the class Random. Java implementations must use all the algorithms shown here for the class Random, for the sake of absolute portability of Java code. However, subclasses of class Random are permitted to use other algorithms, so long as they adhere to the general contracts for all the methods.

So we're stuck with it as a reference implementation. However that doesn't mean you can't use another generator (and subclass Random or create a new class):

from the same Wikipedia page:

MMIX by Donald Knuth m=2^64 a=6364136223846793005 c=1442695040888963407

There's a 64-bit formula for you.
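As a rough sketch of what such a standalone generator could look like (the class name and method layout are hypothetical, only the two constants come from the table quoted above):

```java
// Sketch only: a standalone 64-bit LCG using the MMIX constants quoted above.
// It is not a drop-in replacement for java.util.Random and is not thread-safe.
final class Mmix64Lcg {
    static final long MULTIPLIER = 6364136223846793005L; // a
    static final long ADDEND = 1442695040888963407L;     // c

    private long state;

    Mmix64Lcg(long seed) {
        this.state = seed;
    }

    // Returns the top `bits` bits (1..32) of the next state.
    int next(int bits) {
        state = state * MULTIPLIER + ADDEND; // long overflow gives arithmetic mod 2^64
        return (int) (state >>> (64 - bits));
    }

    // Builds a 64-bit value from two 32-bit outputs, avoiding the weak low bits.
    long nextLong() {
        return ((long) next(32) << 32) + next(32);
    }

    public static void main(String[] args) {
        Mmix64Lcg rng = new Mmix64Lcg(12345L);
        System.out.println(rng.next(32));
        System.out.println(rng.nextLong());
    }
}
```

Taking only the high bits of each step follows the same reasoning as in java.util.Random: the low-order bits of a power-of-two LCG have short periods.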

Random numbers are tricky (as Knuth notes) and depending on your needs, you might be fine with just calling java.util.Random twice and concatenating the bits if you need a 64-bit number. If you really care about the statistical properties, use something like Mersenne Twister, or if you care about information leakage / unpredictability use java.security.SecureRandom.

Jason S
  • You're wrong with your warning concerning the signedness. As long as no arithmetic with *more than 64* bits is involved, the signedness is irrelevant also for multiplication. For a 64-bit number, there's `Random.nextLong()`. – maaartinus Apr 08 '11 at 00:58
  • Ah -- my mistake, you are correct. (proof: (a-M)*(b-M) mod M = a*b where a and b are the unsigned numbers and (a-M) and (b-M) are the signed numbers with M = 2^64) I'll delete that paragraph. – Jason S Apr 08 '11 at 01:07
0

It doesn't look like there was a good reason for doing this. Applying the mask is a conservative approach using a proven design. Leaving it out most probably leads to a better generator; however, without knowing the math well, it's a risky step.

Another small advantage of masking is a speed gain on 8-bit architectures, since it uses 6 bytes instead of 8.

maaartinus