Python numpy - Reproducibility of random numbers

Question

We have a very simple, single-threaded program in which we do a lot of random sample generation. For this we make several calls to the numpy random functions (like normal or random_sample). Sometimes the result of one random call determines the number of times another random function is called.

Now I want to set a seed at the beginning so that multiple runs of my program yield the same result. For this I'm using an instance of the numpy class RandomState. The results do match at first, but at some point they start to differ, and I am wondering why.

If I am doing everything correctly, with no concurrency (and therefore a linear sequence of function calls) and no other random number generator involved, why does it not work?

Show us *code*! Without a minimal example that demonstrates your problem it's highly unlikely that we can be useful! — Bakuriu, Apr 25 '13 at 17:07
You are not doing it correctly. The PRNGs in numpy are known to be good. If you want us to believe you, provide a program that seeds the PRNG and then emits different output on different runs. Otherwise, it didn't happen. — David Heffernan, Apr 25 '13 at 20:20

score 5 · Accepted Answer · answered Apr 30 '13 at 09:24

Okay, David was right. The PRNGs in numpy work correctly. Throughout every minimal example I created, they worked as they are supposed to.
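A minimal check of this kind might look like the sketch below. The function `simulate` and the seed value are hypothetical stand-ins, not the original program; the function only mimics the pattern from the question, where one random draw decides how many further draws are made.

```python
import numpy as np

def simulate(seed):
    # Hypothetical stand-in for the real simulation.
    rng = np.random.RandomState(seed)
    # One draw determines how many further draws follow (between 1 and 10).
    n = int(rng.random_sample() * 10) + 1
    return rng.normal(size=n)

first = simulate(42)
second = simulate(42)

# Same seed, same call sequence: the two runs should match exactly.
print(np.array_equal(first, second))
```

If a check like this passes while the full program still diverges, the source of non-determinism is probably outside the generator itself.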

My problem was a different one, and I finally solved it: never loop over a dictionary within a deterministic algorithm. It seems that Python orders the items arbitrarily when you call the .items() method to iterate over them.
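One way to guard against this is to iterate over sorted keys, so that the order in which random numbers are consumed does not depend on how the dictionary happens to be ordered. The sketch below is only an illustration: `assign_noise` and the parameter names are hypothetical, and two dictionaries with different insertion order are used to imitate the arbitrary ordering described above.

```python
import numpy as np

def assign_noise(params, seed, sort_keys):
    # Hypothetical helper: adds one random draw to each parameter.
    rng = np.random.RandomState(seed)
    # Sorting fixes the order in which draws are consumed.
    keys = sorted(params) if sort_keys else list(params)
    return {k: params[k] + rng.normal() for k in keys}

# Same content, different iteration order.
a = {"alpha": 1.0, "beta": 2.0, "gamma": 3.0}
b = {"gamma": 3.0, "alpha": 1.0, "beta": 2.0}

sorted_match = assign_noise(a, 0, True) == assign_noise(b, 0, True)
unsorted_match = assign_noise(a, 0, False) == assign_noise(b, 0, False)

print(sorted_match)    # True: each key always receives the same draw
print(unsorted_match)  # False: the draws land on different keys
```

Sorting requires the keys to be mutually comparable; where they are not, an explicit list of keys kept alongside the dictionary would serve the same purpose.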

I am not too disappointed that it turned out to be this kind of error, because it is a useful reminder of what to watch for when trying to do reproducible simulations.

Nowadays you can deterministically loop over dictionaries. "Changed in version 3.7: Dictionary order is guaranteed to be insertion order." Quoted from https://docs.python.org/3/library/stdtypes.html?highlight=guaranteed#mapping-types-dict — wjakobw, Oct 14 '21 at 08:06

score -1 · Answer 2 · answered Apr 25 '13 at 19:48

If reproducibility is very important to you, I'm not sure I'd fully trust any PRNG to always produce the same output given the same seed. You might consider capturing the random numbers in one phase, saving them for reuse; then in a second phase, replay the random numbers you've captured. That's the only way to eliminate the possibility of non-reproducibility -- and it solves your current problem too.
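The capture-and-replay idea could be sketched roughly as follows. The classes `RecordingRNG` and `ReplayRNG` are hypothetical wrappers invented for this illustration, not part of numpy, and only `normal` is wrapped; a real program would need to wrap every random function it uses and would likely persist the log to disk.

```python
import numpy as np

class RecordingRNG:
    """Hypothetical wrapper: draws from RandomState and logs each value."""
    def __init__(self, seed):
        self._rng = np.random.RandomState(seed)
        self.log = []

    def normal(self):
        value = float(self._rng.normal())
        self.log.append(value)
        return value

class ReplayRNG:
    """Hypothetical wrapper: returns previously recorded values in order."""
    def __init__(self, log):
        self._values = iter(log)

    def normal(self):
        # Raises StopIteration if more values are requested than were recorded.
        return next(self._values)

def run(rng, steps=5):
    # Stand-in for the simulation; it only needs an object with .normal().
    return [rng.normal() for _ in range(steps)]

recorder = RecordingRNG(seed=123)
original = run(recorder)                  # phase 1: capture
replayed = run(ReplayRNG(recorder.log))   # phase 2: replay

print(original == replayed)
```

Note that replay only reproduces a run if the program requests the numbers in the same order both times, so it does not by itself remove ordering problems elsewhere in the code.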

Chris Johnson

If a PRNG does not produce the same output for the same seed it is broken. – David Heffernan Apr 25 '13 at 20:19
Right but it could in fact be broken. Or it could not port consistently between OSs. Or it might have different results in Python 3. Or or or. My point is if repeatability is important, you can guarantee it via a record & playback approach. – Chris Johnson Apr 26 '13 at 02:53
The PRNG is known to be good and there's no suggestion that this is anything other than single Python version, single machine. – David Heffernan Apr 26 '13 at 06:04
