
I have to port some Python code to Scala for research purposes. I am using the Apache Commons Math 3 library and am having difficulty with its MersenneTwister.

In Python:

import random

SEED = 1234567890

PRIMARY_RNG = random.Random()
PRIMARY_RNG.seed(SEED)
n = PRIMARY_RNG.randrange((2**31) - 1)  # 1977150888

In Scala:

import org.apache.commons.math3.random.MersenneTwister

val Seed = 1234567890
val PrimaryRNG = new MersenneTwister(Seed)
val n = PrimaryRNG.nextInt(Int.MaxValue) // 1328851649

What am I missing here? Both are Mersenne Twisters,
and Int.MaxValue = 2147483647 = (2**31) - 1.

WiR3D
  • Genuinely curious: why is MersenneTwister better than the UUID generation in java.util? – Aug 04 '14 at 20:37
  • Good question. In the Python source they say it's the best; honestly, though, it's out of the scope of the question, since I am replicating results. But I'm also curious. – WiR3D Aug 04 '14 at 20:59
  • Have you also tried Colt, just to see the difference between these implementations? Also, since it's a random number generator (unless I misunderstood), why should the values not be different? – Aug 04 '14 at 21:04
  • Because it's a pseudorandom number generator: if you feed it the same seed, the first call will always return the same number. Also, I am using Apache Math3 for a number of other functions, and including two math libraries is a bit of a waste; as far as I know, Math3 is pretty comprehensive. – WiR3D Aug 04 '14 at 21:22
  • Looking through the C code that Python uses [here](http://svn.python.org/projects/python/trunk/Modules/_randommodule.c) and the Apache Math code [here](https://apache.googlesource.com/commons-math/+/48b1e6a0fb7d6dc3097a7edda3065d3c04684d20/src/main/java/org/apache/commons/math3/random/MersenneTwister.java), it seems that the Python equivalent of `MersenneTwister.next(bits)` is the `genrand_int32(...)` function. The two look practically identical; however, the rest of the implementations differ somewhat. – SamYonnou Aug 04 '14 at 22:30
  • @SamYonnou now if only that was reversed. – WiR3D Aug 04 '14 at 22:47

3 Answers


Apache Commons Math apparently uses an integer as the base source of randomness, though I'm not quite sure how it extracts it, while Python uses the double generated by a C version of the algorithm.

There may also be differences in how the seed values are processed, but since they don't even read out the bits in the same way, one wouldn't expect them to be comparable even if the underlying pseudorandom generator is the same.

Rex Kerr

As I already posted in the comments, the core algorithm for getting the next integer is the same in Python and Apache Math (source code here, here, and here). Tracing through the code, the main difference seems to be in how the two versions seed the generator. The Python version converts the given seed into an array and seeds from that array, while the Apache Math version has a separate algorithm for seeding from a single number. So, to make the Apache Math nextInt(...) method behave the same way as Python's randrange(...), seed the Apache Math generator with an array.

(I don't know Scala so the following code is in Java)

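import org.apache.commons.math3.random.MersenneTwister;

MersenneTwister rng = new MersenneTwister();
rng.setSeed(new int[] {1234567890}); // seed through the array routine, as Python does
System.out.println(rng.nextInt(Integer.MAX_VALUE)); // 1977150888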
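The seeding difference can be reproduced without either library. Below is a minimal, hypothetical MT19937 port of the reference mt19937ar.c routines (the class and method names are illustrative, not part of any library): `initGenrand` is the single-number seeding, which Apache Math's `setSeed(int)` appears to mirror for a non-negative seed such as 1234567890, and `initByArray` is the array seeding that CPython uses for an integer seed and that `setSeed(int[])` implements. Each printed value is the first 32-bit output shifted right by one, i.e. the `next(31)` value that `nextInt(Integer.MAX_VALUE)` then reduces.

```java
// A minimal MT19937 port of the reference mt19937ar.c, used only to compare
// the two seeding routines. Hypothetical class; not part of any library.
class Mt19937 {
    private static final int N = 624;
    private static final int M = 397;
    private static final int MATRIX_A = 0x9908b0df;
    private static final int UPPER_MASK = 0x80000000;
    private static final int LOWER_MASK = 0x7fffffff;

    private final int[] mt = new int[N];
    private int mti;

    // init_genrand: seed from a single 32-bit number.
    // Java int arithmetic wraps mod 2^32, matching the C code's masking.
    private void initGenrand(int s) {
        mt[0] = s;
        for (mti = 1; mti < N; mti++) {
            mt[mti] = 1812433253 * (mt[mti - 1] ^ (mt[mti - 1] >>> 30)) + mti;
        }
    }

    // init_by_array: seed from an array of 32-bit words.
    private void initByArray(int[] key) {
        initGenrand(19650218);
        int i = 1;
        int j = 0;
        for (int k = Math.max(N, key.length); k > 0; k--) {
            mt[i] = (mt[i] ^ ((mt[i - 1] ^ (mt[i - 1] >>> 30)) * 1664525)) + key[j] + j;
            i++;
            j++;
            if (i >= N) { mt[0] = mt[N - 1]; i = 1; }
            if (j >= key.length) { j = 0; }
        }
        for (int k = N - 1; k > 0; k--) {
            mt[i] = (mt[i] ^ ((mt[i - 1] ^ (mt[i - 1] >>> 30)) * 1566083941)) - i;
            i++;
            if (i >= N) { mt[0] = mt[N - 1]; i = 1; }
        }
        mt[0] = 0x80000000; // MSB set, so the state is never all zero
    }

    static Mt19937 seededWithInt(int seed) {
        Mt19937 r = new Mt19937();
        r.initGenrand(seed);
        return r;
    }

    static Mt19937 seededWithArray(int... key) {
        Mt19937 r = new Mt19937();
        r.initByArray(key);
        return r;
    }

    // genrand_int32: regenerate the state every N outputs, then temper.
    int nextWord() {
        if (mti >= N) {
            for (int kk = 0; kk < N; kk++) {
                int y = (mt[kk] & UPPER_MASK) | (mt[(kk + 1) % N] & LOWER_MASK);
                mt[kk] = mt[(kk + M) % N] ^ (y >>> 1) ^ ((y & 1) != 0 ? MATRIX_A : 0);
            }
            mti = 0;
        }
        int y = mt[mti++];
        y ^= y >>> 11;
        y ^= (y << 7) & 0x9d2c5680;
        y ^= (y << 15) & 0xefc60000;
        y ^= y >>> 18;
        return y;
    }

    public static void main(String[] args) {
        int seed = 1234567890;
        // next(31)-style values: the first 32-bit output shifted right by one.
        System.out.println("int seeding:   " + (seededWithInt(seed).nextWord() >>> 1));
        System.out.println("array seeding: " + (seededWithArray(seed).nextWord() >>> 1));
    }
}
```

The same seed value goes into both routines, yet the two streams share nothing, which is the mismatch the question runs into.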

Note also that the other methods, such as Python's random() vs. Apache Math's nextDouble(), are implemented completely differently, so this seeding mechanism will probably only make nextInt(...) and randrange(...) return the same results.
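To see why the integer path lines up once the seeding matches, here is a hedged sketch of the two integer readouts, written against a plain `IntSupplier` of raw 32-bit words instead of either library (the class and helper names are hypothetical). `pythonRandBelow` follows my reading of Python 3's `randrange(n)`, which goes through `_randbelow(n)` and `getrandbits(k)`; Python 2.7's `randrange(n)` instead computes `int(random() * n)` for n below 2**53, so it reads the words differently. `commonsNextInt` follows my reading of Apache Math's `BitsStreamGenerator.nextInt(int)`. For n = 2**31 - 1 both keep the top 31 bits of the first word (barring a rare redraw), while for other n they diverge.

```java
import java.util.function.IntSupplier;

// Hypothetical side-by-side of the two integer readouts, fed from the same
// stream of raw 32-bit words (standing in for genrand_int32()).
class RandReadout {
    // Python 3: randrange(n) -> _randbelow(n) -> getrandbits(k) with
    // k = n.bit_length(); for k <= 32 that is one word shifted right by
    // (32 - k), redrawn while the result is >= n.
    static int pythonRandBelow(IntSupplier word, int n) {
        int k = 32 - Integer.numberOfLeadingZeros(n); // n.bit_length()
        while (true) {
            int r = word.getAsInt() >>> (32 - k);
            if (r < n) {
                return r;
            }
        }
    }

    // Apache Math: nextInt(n) draws next(31) = word >>> 1. Powers of two are
    // scaled; otherwise the value is reduced mod n and redrawn when it falls
    // in the incomplete last block (detected through int overflow).
    static int commonsNextInt(IntSupplier word, int n) {
        if ((n & -n) == n) {
            return (int) ((n * (long) (word.getAsInt() >>> 1)) >> 31);
        }
        int bits;
        int val;
        do {
            bits = word.getAsInt() >>> 1;
            val = bits % n;
        } while (bits - val + (n - 1) < 0);
        return val;
    }

    // Replays a fixed list of words.
    static IntSupplier words(int... ws) {
        int[] next = {0};
        return () -> ws[next[0]++];
    }

    public static void main(String[] args) {
        // n = 2^31 - 1: both keep the top 31 bits of the first word.
        System.out.println(pythonRandBelow(words(0x80000003), Integer.MAX_VALUE)); // 1073741825
        System.out.println(commonsNextInt(words(0x80000003), Integer.MAX_VALUE));  // 1073741825
        // n = 10: Python keeps the top 4 bits (redrawing 10..15), Apache Math
        // takes the top 31 bits mod 10, so the same words give different values.
        System.out.println(pythonRandBelow(words(0xA0000000, 0x30000000), 10)); // 3
        System.out.println(commonsNextInt(words(0xA0000000, 0x30000000), 10));  // 0
    }
}
```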

SamYonnou

In case anyone needs to do this, I came up with a working version based on the CPython implementation here.

Note: the way random.seed() handles a string seed changed between Python 2 and 3. The pythonStringHash function here matches the Python 2 behavior, or random.seed(s, version=1) in Python 3.

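import org.apache.commons.math3.random.MersenneTwister;
import org.apache.commons.math3.random.RandomGenerator;

// Python 2's str hash (64-bit build, hash randomization off), which is also
// what random.seed(s, version=1) computes in Python 3.
private static long pythonStringHash(String s) {
  if (s.isEmpty()) {
    return 0;
  }
  char[] chars = s.toCharArray();
  long x = chars[0] << 7;
  for (char c : chars) {
    x = (1000003 * x) ^ c; // long overflow wraps mod 2^64, like the C code
  }
  x ^= chars.length;
  // CPython reserves -1 as an error marker, so a hash of -1 becomes -2
  return x == -1 ? -2 : x;
}

// Python treats the hash as an unsigned 64-bit number, splits it into 32-bit
// words (least significant first) and passes them to init_by_array;
// Apache Math's setSeed(int[]) runs the same array-seeding routine.
private static void pythonSeed(MersenneTwister random, long seed) {
  int[] intArray;
  if (Long.numberOfLeadingZeros(seed) >= 32) {
    intArray = new int[] { (int) seed };                      // fits in one word
  } else {
    intArray = new int[] { (int) seed, (int) (seed >>> 32) }; // low word, high word
  }
  random.setSeed(intArray);
}

public static RandomGenerator pythonSeededRandom(String seed) {
  MersenneTwister random = new MersenneTwister();
  pythonSeed(random, pythonStringHash(seed));
  return random;
}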

From there, pythonSeededRandom("foo").nextDouble() should be equal to random.seed("foo"); random.random().
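Whether nextDouble() lines up with random() depends on how each one assembles a double from 32-bit words, which is what the earlier note about random() vs. nextDouble() is getting at. CPython's random() combines 27 bits of one word with 26 bits of the next; my reading of Apache Math's `BitsStreamGenerator.nextDouble()` is that it combines two 26-bit draws into 52 bits (treat that as an assumption and check the source of your version). The hypothetical sketch below feeds the same words to both constructions; they generally differ in the low-order bits, and `pythonRandom` is one way to rebuild random() exactly from raw 32-bit output.

```java
import java.util.function.IntSupplier;

// Hypothetical comparison of two ways to build a double in [0, 1) from raw
// 32-bit words (standing in for genrand_int32()).
class PythonDouble {
    // CPython's random(): 27 bits from the first word, 26 from the second,
    // combined as (a * 2^26 + b) / 2^53.
    static double pythonRandom(IntSupplier word) {
        long a = word.getAsInt() >>> 5;
        long b = word.getAsInt() >>> 6;
        return (a * 67108864.0 + b) * (1.0 / 9007199254740992.0);
    }

    // Assumed Apache Math nextDouble(): two next(26) draws (word >>> 6)
    // combined into 52 bits and scaled by 2^-52.
    static double fiftyTwoBitDouble(IntSupplier word) {
        long high = ((long) (word.getAsInt() >>> 6)) << 26;
        long low = word.getAsInt() >>> 6;
        return (high | low) * 0x1.0p-52;
    }

    // Replays a fixed list of words.
    static IntSupplier words(int... ws) {
        int[] next = {0};
        return () -> ws[next[0]++];
    }

    public static void main(String[] args) {
        // The same two all-ones words give 1 - 2^-53 vs 1 - 2^-52.
        System.out.println(pythonRandom(words(-1, -1)) == 1.0 - 0x1.0p-53);      // true
        System.out.println(fiftyTwoBitDouble(words(-1, -1)) == 1.0 - 0x1.0p-52); // true
    }
}
```

With the generator returned by pythonSeededRandom, something like `PythonDouble.pythonRandom(rng::nextInt)` should supply it with raw words, since the no-argument RandomGenerator.nextInt() returns all 32 bits.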