My question is the exact opposite of this one.

This is an excerpt from my test file

f1 = open('seed1234','r')
f2 = open('seed7883','r')
s1 = eval(f1.read())
s2 = eval(f2.read())
f1.close()
f2.close()
####
test_sampler1.random_inst.setstate(s1)
out1 = test_sampler1.run()
self.assertEqual(out1,self.out1_regress) # this is fine and passes

test_sampler2.random_inst.setstate(s2)
out2 = test_sampler2.run()
self.assertEqual(out2,self.out2_regress) # this FAILS

Some info -

test_sampler1 and test_sampler2 are 2 objects of a class that performs some stochastic sampling. The class has an attribute random_inst, which is an instance of random.Random. The file seed1234 contains the state of a TestSampler's random_inst, as returned by random_inst.getstate() when it was given a seed of 1234, and you can guess what seed7883 is. What I did was create a TestSampler in the terminal, give it a random seed of 1234, acquire the state with random_inst.getstate() and save it to a file. I then restore that state in the regression test and I always get the same output.

HOWEVER

The same procedure as above doesn't work for test_sampler2: whatever I do, I do not get the same random sequence of numbers. I am using Python's random module and I am not importing it anywhere else, but I do use numpy in some places (but not numpy.random).

The only difference between test_sampler1 and test_sampler2 is that they are created from 2 different files. I know this is a big deal and that it depends entirely on the code I wrote, but I also can't simply paste ~800 lines of code here. I am merely looking for some general idea of what I might be messing up...

What might be scrambling the state of test_sampler2's random number generator?

Solution

There were 2 separate issues with my code:

1

My script is a command line script, and after I refactored it to use Python's optparse library I found out that I was setting the seed for my sampler using something like seed = sys.argv[1], which meant that I was setting the seed to a str, not an int. seed can take any hashable object, and I found that out the hard way. This explains why I would get 2 different sequences for what looked like the same seed: one when I ran my script from the command line with something like python sample 1234 (so the seed is the str '1234'), and another from my unit_tests.py file, where I would create an object instance like test_sampler1 = TestSampler(seed=1234).
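A minimal sketch of issue 1, assuming Python 3 (the question itself used 2.6, but the effect is the same): seeding random.Random with the string "1234" and with the integer 1234 gives two unrelated sequences. The helper name first_draws is made up for this illustration.

```python
import random


def first_draws(seed, n=3):
    """Return the first n draws of a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


from_cli = first_draws("1234")     # what seed = sys.argv[1] passes in (a str)
from_test = first_draws(1234)      # what TestSampler(seed=1234) passes in (an int)
fixed = first_draws(int("1234"))   # converting the argument before seeding

print(from_cli == from_test)   # the str seed and the int seed disagree
print(fixed == from_test)      # after int(...) both paths agree
```

Converting the command line argument with int(...) before seeding (or declaring the option with type="int" in optparse) keeps both code paths on the same seed.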

2

I have a function for discrete distribution sampling which I borrowed from here (look at the accepted answer). The code there was missing something fundamental: its output still depended on the order of its inputs. Give it the same values and probabilities arrays transformed by a permutation (say values ['a','b'] with probs [0.1,0.9] versus values ['b','a'] with probs [0.9,0.1]) and, with the seed set, the PRNG produces the same random sample both times, say 0.05. But since the probability intervals are laid out differently, in one case you'll get an a and in the other a b. To fix it, I zipped the values and probabilities together and sorted by probability, and tadaa - I now always get the same probability intervals.
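The order dependence can be sketched as follows. This is not the borrowed function itself, only an illustration under assumptions: Python 3, hypothetical function names, and values that can be compared with each other so that sorting the full (probability, value) pairs breaks ties consistently.

```python
import bisect
import itertools
import random


def pick_in_given_order(values, probs, u):
    """Map a uniform draw u in [0, 1) to a value, using the input order as-is."""
    cumulative = list(itertools.accumulate(probs))
    index = bisect.bisect_right(cumulative, u)
    # Guard against cumulative sums that fall slightly short of 1.0.
    return values[min(index, len(values) - 1)]


def pick_in_canonical_order(values, probs, u):
    """Same mapping, but on pairs sorted by (probability, value) first."""
    pairs = sorted(zip(probs, values))
    sorted_probs = [p for p, _ in pairs]
    sorted_values = [v for _, v in pairs]
    return pick_in_given_order(sorted_values, sorted_probs, u)


# The same draw lands in different intervals depending on input order.
print(pick_in_given_order(["a", "b"], [0.1, 0.9], 0.05))      # a
print(pick_in_given_order(["b", "a"], [0.9, 0.1], 0.05))      # b

# After sorting, both permutations share one interval layout.
u = random.Random(1234).random()
same_1 = pick_in_canonical_order(["a", "b"], [0.1, 0.9], u)
same_2 = pick_in_canonical_order(["b", "a"], [0.9, 0.1], u)
print(same_1 == same_2)                                        # True
```

If the values cannot be compared with each other, sorting on some stable, arbitrary key assigned to each value would serve the same purpose.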

After fixing both issues the code worked as expected, i.e. out2 started behaving deterministically.

baibo
  • I would accept feedback on the downvote as well... What is wrong with this question? – baibo Dec 10 '13 at 00:25
  • Why not use `pickle.dump(random_inst, f1)` and `random_inst = pickle.load(f1)` instead of `eval`? – John La Rooy Dec 10 '13 at 00:26
  • So you are saying that loading the same seed file over and over gives a different sequence? What version of Python are you using? – John La Rooy Dec 10 '13 at 00:28
  • 2.6.8 . I'll try with `pickle` then. – baibo Dec 10 '13 at 00:31
  • You can also just use `random_inst.seed(7883)` etc. – John La Rooy Dec 10 '13 at 00:35
  • Same story with `pickle`, and on top of that, if I use `random_inst.seed(7883)` even my first test fails - setting the seed alone isn't enough, but I thought that setting the state would be... – baibo Dec 10 '13 at 00:42
  • Where does `out2_regress` come from? How was it generated? Are you sure it's correct? – user2357112 Dec 10 '13 at 00:52
  • What I did was I created a `TestSampler` in the terminal, gave it a random seed of 1234 and recorded the output - that is where it comes from. I have inspected it and it does seem like it is correct. On top of that, I actually **do** get the same output if I do it from the terminal - the code only fails in the tests... – baibo Dec 10 '13 at 00:56
  • About #2, you should really sort on the full list of `(value, probability)` pairs (or `(probability, value)` pairs). That is, you need to impose a total order on the pairs to get the same results across all permutations. If you just sort on probability then, e.g., the 24 permutations of `[(.25, 1.), (.25, 2.), (.25, 3.), (.25, 4.)]` on input would lead to 24 different sequences of results. – Tim Peters Dec 11 '13 at 22:17
  • Thanks, I didn't think about that! The `values` in my case are objects from 5 different classes that aren't really comparable to each other, so then I might as well just assign them an arbitrary order and sort by that. – baibo Dec 12 '13 at 11:31

1 Answer

The only thing (apart from an internal Python bug) that can change the state of a random.Random instance is calling methods on that instance. So the problem lies in something you haven't shown us. Here's a little test program:

from random import Random

r1 = Random()
r2 = Random()

for _ in range(100):
    r1.random()
for _ in range(200):
    r2.random()

r1state = r1.getstate()
r2state = r2.getstate()

with open("r1state", "w") as f:
    print >> f, r1state
with open("r2state", "w") as f:
    print >> f, r2state


for _ in range(100):
    with open("r1state") as f:
        r1.setstate(eval(f.read()))
    with open("r2state") as f:
        r2.setstate(eval(f.read()))
    assert r1state == r1.getstate()
    assert r2state == r2.getstate()

I haven't run that all day, but I bet I could and never see a failing assert ;-)

BTW, it's certainly more common to use pickle for this kind of thing, but it's not going to solve your real problem. The problem is not in getting or setting the state. The problem is that something you haven't yet found is calling methods on your random.Random instance(s).
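For reference, a pickle round trip of the generator state could look like the sketch below. It assumes Python 3 and keeps the pickled bytes in memory instead of a file; the variable names are only illustrative.

```python
import pickle
import random

rng = random.Random(7883)
for _ in range(100):
    rng.random()                      # advance the generator a bit

saved = pickle.dumps(rng.getstate())  # serialize the state tuple
expected = [rng.random() for _ in range(5)]

restored = random.Random()            # fresh, differently seeded instance
restored.setstate(pickle.loads(saved))
replayed = [restored.random() for _ in range(5)]

print(expected == replayed)           # True
```

As the answer says, this only changes how the state is stored; it does not protect against other code calling methods on the instance afterwards.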

While it's a major pain in the butt to do so, you could try adding print statements to random.py to find out what's doing it. There are cleverer ways to do that, but better to keep it dirt simple so that you don't end up actually debugging the debugging code.
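One possible variant that avoids editing random.py is a subclass that records who asks for random numbers. This is only a sketch under assumptions, not something the answer prescribes: it targets Python 3 on CPython, and the class name TracingRandom and its calls attribute are hypothetical. It overrides both random() and getrandbits() so that the integer-based methods keep using the same code path as a plain random.Random.

```python
import random
import traceback


class TracingRandom(random.Random):
    """random.Random that records where each draw was requested from."""

    def __init__(self, seed=None):
        self.calls = []
        super().__init__(seed)

    def _record(self, name):
        # Stack, oldest first: ..., caller, random/getrandbits, _record.
        # limit=3 keeps the last three frames, so index 0 is the caller.
        frame = traceback.extract_stack(limit=3)[0]
        self.calls.append((name, frame.filename, frame.lineno, frame.name))

    def random(self):
        self._record("random")
        return super().random()

    def getrandbits(self, k):
        self._record("getrandbits")
        return super().getrandbits(k)


plain = random.Random(1234)
traced = TracingRandom(1234)
plain_draws = [plain.random() for _ in range(5)]
traced_draws = [traced.random() for _ in range(5)]

print(plain_draws == traced_draws)   # tracing does not change the sequence
for name, filename, lineno, func in traced.calls[:2]:
    print(name, filename, lineno, func)
```

Swapping such an instance in as random_inst during a failing test would list every place that consumed a random number between setstate() and the assertion.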

Tim Peters
  • When I look through the code of `random.py` it seems to be using the Wichmann-Hill random number generator, but the docs (http://docs.python.org/2.6/library/random.html) say something else... – baibo Dec 10 '13 at 01:14
  • You must be looking at class `WichmannHill`, which you're not using. Class `Random` is what you *are* using, and that inherits from `_random.Random`. In Python 2.6, that in turn is defined by C code in `Modules/_randommodule.c`, which file also contains the Mersenne Twister implementation. – Tim Peters Dec 10 '13 at 01:25
  • I found the error and you were right - it was in something that I wasn't showing you and, in fact, it wasn't a fault of my random seed setting. I'll update my post with a solution and code of the faulty method. – baibo Dec 10 '13 at 18:34
  • @Ivanov, congratulations! I hope you do update the post - I'd like to find out what really happened here. – Tim Peters Dec 11 '13 at 04:29