My question is the exact opposite of this one.
This is an excerpt from my test file:
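import ast

# Each file holds the repr() of a generator state as returned by getstate()
with open('seed1234', 'r') as f1:
    s1 = ast.literal_eval(f1.read())  # parses literals only, unlike eval
with open('seed7883', 'r') as f2:
    s2 = ast.literal_eval(f2.read())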
####
test_sampler1.random_inst.setstate(s1)
out1 = test_sampler1.run()
self.assertEqual(out1,self.out1_regress) # this is fine and passes
test_sampler2.random_inst.setstate(s2)
out2 = test_sampler2.run()
self.assertEqual(out2,self.out2_regress) # this FAILS
Some info: test_sampler1 and test_sampler2 are two objects of a class that performs some stochastic sampling. The class has an attribute random_inst, which is an object of type random.Random(). The file seed1234 contains a TestSampler's random_inst's state, as returned by random.getstate(), when it was given a seed of 1234, and you can guess what seed7883 is. What I did was create a TestSampler in the terminal, give it a random seed of 1234, acquire the state with rand_inst.getstate() and save it to a file. I then recreate the regression test and I always get the same output.
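For reference, the save-and-restore round trip described above can be sketched roughly as follows. This is a minimal illustration, not the actual test code: the helper names save_state and load_state and the temporary file location are assumptions made for the example.

```python
import ast
import os
import random
import tempfile


def save_state(rng, path):
    """Write the generator state to a file as a Python literal (hypothetical helper)."""
    with open(path, 'w') as f:
        f.write(repr(rng.getstate()))


def load_state(rng, path):
    """Read a state written by save_state and apply it to rng (hypothetical helper)."""
    with open(path) as f:
        rng.setstate(ast.literal_eval(f.read()))


# Seed a generator, store its state, then draw a few reference numbers.
rng = random.Random(1234)
path = os.path.join(tempfile.mkdtemp(), 'seed1234')
save_state(rng, path)
expected = [rng.random() for _ in range(5)]

# A fresh generator restored from the file should replay the same numbers.
restored = random.Random()
load_state(restored, path)
replayed = [restored.random() for _ in range(5)]
```

If expected and replayed match but the sampler output still differs, the generator state itself is probably not the culprit and something else feeding the sampler is.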
HOWEVER

The same procedure as above doesn't work for test_sampler2 - whatever I do, I do not get the same random sequence of numbers. I am using python's random module and I am not importing it anywhere else, but I do use numpy in some places (but not numpy.random).
The only difference between test_sampler1 and test_sampler2 is that they are created from 2 different files. I know this is a big deal and it is totally dependent on the code I wrote, but I also can't simply paste ~800 lines of code here. I am merely looking for some general idea of what I might be messing up...
What might be scrambling the state of test_sampler2's random number generator?
Solution
There were 2 separate issues with my code:
1
My script is a command line script, and after I refactored it to use python's optparse library I found out that I was setting the seed for my sampler using something like seed = sys.argv[1], which meant that I was setting the seed to be a str, not an int - seed can take any hashable object, and I found that out the hard way. This explains why I would get 2 different sequences with the same seed: one when I ran my script from the command line with something like python sample 1234 (seed is '1234', a str), and another from my unit_tests.py file, where I would create an object instance like test_sampler1 = TestSampler(seed=1234).
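The effect is easy to reproduce in isolation. The sketch below is only an illustration under assumptions: the --seed option name is hypothetical and not taken from my script, and it simply shows that a str seed and the equivalent int seed initialise random.Random differently, and that asking optparse for an int avoids the mismatch.

```python
import optparse
import random

# The same "seed" given as an int and as a str, as sys.argv[1] would deliver it.
from_int = random.Random(1234).random()
from_str = random.Random("1234").random()

# Converting the command line string first restores agreement with the int seed.
from_converted = random.Random(int("1234")).random()

# With optparse, type="int" makes the parser do the conversion.
# "--seed" is a hypothetical option name used only for this example.
parser = optparse.OptionParser()
parser.add_option("--seed", type="int", dest="seed")
opts, _ = parser.parse_args(["--seed", "1234"])
from_parsed = random.Random(opts.seed).random()
```

Here from_int, from_converted and from_parsed agree, while from_str comes from a differently initialised generator.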
2
I have a function for discrete distribution sampling which I borrowed from here (look at the accepted answer). The code there was missing something fundamental: it was still non-deterministic in the sense that its result depends on the order of its inputs. Give it the same values and probabilities arrays, but transformed by a permutation (say values ['a','b'] with probs [0.1,0.9] versus values ['b','a'] with probabilities [0.9,0.1]), and set the seed, and you will get the same random sample from the PRNG, say 0.05. But since the intervals for your probabilities are different, in one case you'll get a b and in the other an a. To fix it, I just zipped the values and probabilities together, sorted by probability and tadaa - I now always get the same probability intervals.
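A minimal sketch of that fix, assuming a standard cumulative-interval sampler (this is not the borrowed code, and the function name sample_discrete is made up for the example). One addition beyond what I described: ties in probability are broken by sorting on the value as well, which assumes the values can be compared with each other.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def sample_discrete(values, probs, rng):
    """Draw one value; the result does not depend on the input order."""
    # Sort (probability, value) pairs so the intervals are always laid out
    # the same way, whatever order the caller passed them in.
    pairs = sorted(zip(probs, values))
    cumulative = list(accumulate(p for p, _ in pairs))
    # Scale by the total so slightly unnormalised probabilities still work.
    u = rng.random() * cumulative[-1]
    # Guard against u landing exactly on the upper edge through rounding.
    idx = min(bisect_right(cumulative, u), len(pairs) - 1)
    return pairs[idx][1]


# Same seed, same distribution, inputs given in two different orders.
rng_a = random.Random(1234)
rng_b = random.Random(1234)
draws_a = [sample_discrete(['a', 'b'], [0.1, 0.9], rng_a) for _ in range(50)]
draws_b = [sample_discrete(['b', 'a'], [0.9, 0.1], rng_b) for _ in range(50)]
```

With the sort in place, draws_a and draws_b are identical. Without it, any draw whose random number falls where the two interval layouts disagree would return different values.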
After fixing both issues, the code worked as expected, i.e. out2 started behaving deterministically.