How can I use reproducible randomization in Perl?

Question

I have a Perl script that uses rand to generate pseudorandom integers in some range. I want it to be random (i.e. not set the seed by myself to some constant), but also want to be able to reproduce the results of a specific run if needed.

What would you do?

Related question: http://stackoverflow.com/questions/2360123/how-can-i-store-the-state-of-the-pseudo-random-generator-in-perl As of 5.13.4, `srand` returns the seed: http://blog.nu42.com/2010/10/perls-srand-now-returns-seed.html — Sinan Ünür, Nov 04 '10 at 20:25
@brian Reproducibility of pseudorandom sequences used in simulations is essential to verifying results. — Sinan Ünür, Nov 04 '10 at 20:51
You tack on that "pseudo" there. I didn't. I figure he's trying to verify results, but I'm still hoping one day that this poster will get past his persistent XY Problem. — brian d foy, Nov 04 '10 at 21:03
IIRC the default is `srand(time())`. Why do you not want to do that? You can't have your cake and eat it too. — Fozi, Nov 04 '10 at 21:12
No, the default is not srand(time()). It was that many, many years ago in Perl 5.004, over 12 years ago. Maybe someday people will read the documentation that we write. :( — brian d foy, Nov 04 '10 at 21:58
I cover the new srand in The Effective Perler ([Use the return value from srand](http://www.effectiveperlprogramming.com/2010/10/use-the-return-value-from-srand/)). This comment updates an older comment for the new URL. — brian d foy, Sep 28 '15 at 15:34

score 2 · Answer 1 · answered Nov 04 '10 at 19:15

McWafflestix says:

Possibly you want to have a default randomly determined seed, that will give you complete randomness when desired, but which can be set prior to a run manually to give reproducibility.

The obvious way to implement this is to follow your normal seeding process (either manually from a strong random source, or letting perl do it automatically on the first call to rand), then use the first generated random value as the seed, and record it. If you want to reproduce later, just use a recorded value for the seed.

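# something like this?

if ( defined $input_rand_seed ) {
    srand($input_rand_seed);
} else {
    # srand works on the integer part of its argument, so a bare rand()
    # in [0, 1) would always seed with 0; scale it to an integer first
    my $seed = int( rand( 2**31 ) );   # or something fancier
    log_random_seed($seed);            # your own logging routine
    srand($seed);
}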

@McWafflestix: Sorry, maybe I should've done this as a comment? If you feel like I'm stealing your thunder, you're welcome to add a little into your answer and I'll delete mine. — Cascabel, Nov 04 '10 at 19:31
No, no worries whatsoever; I don't mind this being an answer as well. I was just pointing out that I agreed with your answer, and in fact, I'm sure the clarification is appreciated. — Paul Sonier, Nov 04 '10 at 19:33

score 1 · Answer 2 · answered Nov 04 '10 at 19:10

Log the seed for each run and provide a method to call the script and set the seed?

answered Nov 04 '10 at 19:10 by Oesor

I'm not aware of a way to get at the seed, unless it's one you chose yourself - and the OP said he doesn't want to do that. – Cascabel Nov 04 '10 at 19:13
Heh, I'd never actually done it and assumed it was possible. A quick google search turns up http://www.perlmonks.org/?node_id=716343 though, which is the same discussion. It recommends using $seed = int(rand(2**31));, then using srand to seed rand, allowing randomness while logging the seed. – Oesor Nov 04 '10 at 19:16
perl 5.14 is supposed to add the ability to call `srand` with an empty list and it will return the value it chose to set for the seed. – Ven'Tatsu Nov 04 '10 at 19:24
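
A minimal sketch of what these comments describe, assuming Perl 5.14 or later, where `srand` returns the seed it used. The `--seed` option name and the `seed_rng` helper are made up for illustration and are not from the posts above.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptions);

# Hypothetical helper: seed the built-in generator and return the seed used.
# With an explicit seed we repeat an earlier run; without one we let perl
# choose and capture its choice from srand's return value (Perl 5.14+).
sub seed_rng {
    my ($requested) = @_;
    return defined $requested ? srand($requested) : srand();
}

# --seed is an assumed option name for this sketch.
my $requested;
GetOptions( 'seed=i' => \$requested )
    or die "usage: $0 [--seed N]\n";

my $seed = seed_rng($requested);
print STDERR "seed for this run: $seed\n";    # log it so the run can be repeated

my @draws = map { int( rand(100) ) } 1 .. 5;
print "@draws\n";
```

Running the script with no arguments gives a fresh sequence and logs the seed; running it again with `--seed` set to the logged value should give the same sequence on the same perl build.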

brian d foy · Answer 3 · 2010-11-04T21:05:06.987

Why don't you want to set the seed, but at the same time set the seed? As I've said to you before, you need to explain why you don't want to do something so we know what you are actually asking.

You might just set it yourself only in certain conditions:

srand( $ENV{SOME_SEED} ) if defined $ENV{SOME_SEED};

If you don't call srand, rand calls it for you automatically but it doesn't report the seed that it used (at least not until Perl 5.14).

It's really just a simple programming problem. Just turn what you outlined into the code that does what you said.
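
To check that a recorded seed really does reproduce a run, something like the following may help. `draws` is a hypothetical helper, and the comparison should only be expected to hold on the same perl build, since the built-in generator can differ between versions and platforms.

```perl
use strict;
use warnings;

# Hypothetical helper: reseed, then return $n integers in [0, $max).
sub draws {
    my ( $seed, $n, $max ) = @_;
    srand($seed);
    return map { int( rand($max) ) } 1 .. $n;
}

# 42 stands in for a seed read back from a log or from $ENV{SOME_SEED}.
my $seed   = 42;
my @first  = draws( $seed, 10, 1000 );
my @second = draws( $seed, 10, 1000 );

print "same seed, same sequence\n" if "@first" eq "@second";
```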

Sinan Ünür · Accepted Answer · 2010-11-04T21:08:34.297

If the purpose is to be able to reproduce simulation paths which incorporate random shocks (say, when you are running an economic model to produce projections), I would give up on the idea of storing the seed and instead store each sequence alongside the model data.

Note that the built-in rand is subject to the vagaries of the rand implementation provided by the C runtime. On all Windows machines and across all perl versions I have used, this usually means that rand will only ever produce 32768 unique values.

That is severely limited for any serious purpose. In simulations, a crucial criterion is that random sequences used be independent of each other so that each run can be considered an independent realization.

In fact, if you are going to run a simulation 1,000 times, I would pre-produce 1,000 corresponding random sequences using known-good generators that are consistent across platforms and store them with the model inputs.

You can update the simulations using the same sequences or a new set if parameter estimates change when you get new data.
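
One way to keep the sequences with the model inputs is sketched below. It uses the core `Storable` module and the built-in `rand` only to stay self-contained; the file name and the `make_sequences` helper are made up for illustration, and in practice the numbers would come from a better generator, as recommended above.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

# Hypothetical helper: build $runs sequences of $len draws each.
# rand() is a placeholder for whatever generator you trust.
sub make_sequences {
    my ( $runs, $len ) = @_;
    my @sequences;
    for my $run ( 1 .. $runs ) {
        push @sequences, [ map { rand() } 1 .. $len ];
    }
    return \@sequences;
}

my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/shocks.sto";    # assumed file name

# Generate once and save next to the model data.
my $sequences = make_sequences( 1000, 50 );
store( $sequences, $file );

# A later run loads the same numbers instead of calling rand again.
my $replayed = retrieve($file);
printf "run 1, first shock: %.6f\n", $replayed->[0][0];
```

Note that `store` writes in the machine's native format; if the file has to move between architectures, a plain text dump of the numbers may be the safer choice.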

I'm finally using [Math::Random::MT::Auto ](http://search.cpan.org/perldoc?Math%3a%3aRandom%3a%3aMT%3a%3aAuto) following this [answer](http://stackoverflow.com/questions/2360123/how-can-i-store-the-state-of-the-pseudo-random-generator-in-perl/2360181#2360181). Thanks. — David B, Nov 05 '10 at 16:24
The Windows rand entropy problem has been fixed in later versions of Perl. (See https://rt.perl.org/Public/Bug/Display.html?id=115928.) That being said, if you really care about good randomization, it's best to use a module. — SineSwiper, Feb 04 '14 at 16:01

score 0 · Answer 5 · edited Nov 04 '10 at 22:19

Your goals are at odds with each other. On one hand, you want a self-seeding, completely random sequence of integers; on the other hand, you want reproducibility. Complete randomness and reproducibility are at odds with each other.

You can set the seed to something you want. Possibly you want to have a default randomly determined seed, that will give you complete randomness when desired, but which can be set prior to a run manually to give reproducibility.

edited Nov 04 '10 at 22:19 by brian d foy · answered Nov 04 '10 at 19:12 by Paul Sonier

That's not the only way. You can just use the same lookup table of pre-computed numbers. Those pre-computed numbers don't even have to come from a pseudorandom number generator. – brian d foy Nov 04 '10 at 20:43
@briandfoy: a lookup table of pre-computed numbers is not, by definition, random and unseeded; the numbers may be random, but the population of the list with them is a seeding. – Paul Sonier Nov 04 '10 at 20:46
I don't know what definition you think you have, but if I keep a list of random numbers as they come in, that's a list of random numbers. There is no seed because there is no function that created them based on previous values. You might be talking about fake random numbers. I'm talking about actual random numbers. – brian d foy Nov 04 '10 at 20:51
@Briandfoy: You're correct that if you keep a list of random numbers as they come in, they are random numbers. Those numbers will continue to be random numbers for the end of time; however, if you use them repeatedly, that doesn't really fit the qualification of "random" that most people consider. You're really talking about the definition of a "one-time pad", which is indeed random; note the "one-time" in the name, though! You can repeat the usage, but for most needs of "randomness", repeated usage of a random series isn't really what's desired. – Paul Sonier Nov 04 '10 at 20:57
That is why I said in other comments that reproducibility and randomness don't go together. I'm only commenting to you because you said "only way", which is almost always wrong. I know about one-time pads just fine, and that's how you should use them. In this case, for some unexplained reason, the poster wants to reuse them. You're preaching to the choir. – brian d foy Nov 04 '10 at 20:59
As for what most people consider as "random", I think you merely mean computer programmers who haven't been taught correctly. While that proportion might be most computer programmers, it's a vanishingly small part of "most people". – brian d foy Nov 04 '10 at 21:01
@briandfoy: I really think you're missing the point. When a computer programmer says that they want a random number for some reason, they aren't saying that they want a number which was generated randomly, since that number can be, like you described, persisted and reused; what they (usually) want is a number which is randomly dissociated from numbers which may have been used in prior runs of the software in question. You're arguing semantics over what defines "random", and completely missing the context that this is a coding question. – Paul Sonier Nov 04 '10 at 21:11
I think that you're missing my point. We do a disservice to people when we pretend and mislabel one thing for another. That's when people get confused. In those cases, we have to be very careful with how we present things. And, there is no context in this question. It's devoid of context. As a programmer, when I've needed to deal with random numbers, they actually had to be random. When you say "computer programmer", you're assuming that everyone's needs are the same. They aren't. That's why you shouldn't say "only way". It's a lie that perpetuates everyone's confusion. – brian d foy Nov 04 '10 at 21:15
@briandfoy: I'm getting your point; it's pedanticism, and it's just plain annoying. It's not mislabeling; it's generalization. We're not devoid of context here; stackoverflow is a site for programming questions, not general questions. In the context of coding software, most coders understand that "random" numbers aren't truly random, but that they're pseudorandom. A solid state disk isn't actually a DISK at all, etc. Arguing the literal meaning of words without comprehending the implied context is just being pedantic and argumentative. – Paul Sonier Nov 04 '10 at 21:22
This question has no context. We don't know what or why he's asking it. I don't really think you can make the assertion that coders understand random numbers. That hasn't been my experience. Worse yet, people who think they understand them have a very tough time explaining them to others. You persist in asserting that all programmers only use pseudorandom numbers. It just ain't so. – brian d foy Nov 04 '10 at 21:32
@briandfoy: of course the question has context. Just like if I ask about water quality on a cooking site, the context will be in the context of cooking with water, and if I ask about water quality on a farming site, the context will be about using the water to grow things, when people ask a question on here, they're asking about CODING ISSUES. And finally, I don't think you understand random numbers yourself; by definition, all programmers ONLY use pseudorandom numbers (with the possible exception of folks with some VERY esoteric hardware). – Paul Sonier Nov 04 '10 at 21:38
Again with the categorical statements. Not all programmers use pseudorandom numbers. I was one with the not-so-esoteric hardware that took numbers from the timings of radioactive decay. A lot of people I've worked with also use such hardware. As for context with this question, he hasn't told us what he's trying to do or why this matters. Still you have the mistaken assumptions. And, SSD is solid state *drive*, not *disk*. It would have been so much easier for you to edit "the only way" to something not categorical and avoid all the hassle you find so annoying. – brian d foy Nov 04 '10 at 21:55
You keep saying "by definition", but that doesn't mean what you think it means. – brian d foy Nov 04 '10 at 21:56
@briandfoy: pedanticism, through and through. Sure, you're an exception; I'd still contend, however, that hardware that takes numbers from the timings of radioactive decay is, in fact, esoteric. (Do you know what that word means?) As for context, I don't necessarily need to know what the OP is using the random numbers for; it's something on computers, I'm pretty sure of that, and it's something using perl, and from that, I'm pretty sure I've got some context. And by the way, no, I won't edit my posts to satisfy the "categorical statement police". You've got control issues; get over them. – Paul Sonier Nov 04 '10 at 22:08
@briandfoy: oh yes, about the definition of random vs. pseudorandom. A random number is objectively unpredictable; by accumulating numbers in a one-time pad (or any other means, for that matter!) you're not making them objectively unpredictable; rather, it's making them predictable. This is fine for most applications, since the numbers APPEAR random, but they're not random, since their relationship with each other is defined (and thereby predictable). However, for use multiple times, those numbers are very precisely NOT random, in any sense of the word, no matter how they were derived. – Paul Sonier Nov 04 '10 at 22:16
As always, it just comes down to personal attacks when they have no real argument. – brian d foy Nov 04 '10 at 22:18
@briandfoy: what personal attacks? and for that matter, who's "they"? :-) – Paul Sonier Nov 04 '10 at 22:52
