Looking for decent-quality PRNG with only 32 bits of state

Question

I'm trying to implement a tolerable-quality version of the rand_r interface, which has the unfortunate interface requirement that its entire state is stored in a single object of type unsigned, which for my purposes means exactly 32 bits. In addition, I need its output range to be [0,2³¹-1]. The standard solution is using a LCG and dropping the low bit (which has the shortest period), but this still leaves very poor periods for the next few bits.

My initial thought was to use two or three iterations of the LCG to generate the high/low or high/mid/low bits of the output. However, such an approach does not preserve the non-biased distribution; rather than each output value having equal frequency, many occur multiple times, and some never occur at all.

Since there are only 32 bits of state, the period of the PRNG is bounded by 2³², and in order to be non-biased, the PRNG must output each value exactly twice if it has full period or exactly once if it has period 2³¹. Shorter periods cannot be non-biased.

Is there any good known PRNG algorithm that meets these criteria?

Pseudo-random is not random. And the opposite of "guaranteed output" is, unfortunately, "biased". For insanely large periods like 2^100 having a perfect output distribution is not really necessary as long as it's empirically indistinguishable from uniform, but for a small period like 2^32, non-uniformity will be a clear observable bias. — R.. GitHub STOP HELPING ICE, Jun 11 '13 at 02:53
Yes you can predict the next value of a PRNG with period 2^32 if you have roughly 0.5 GB of data on the previous values it returned. That's to be expected, and it would be the case even if the distribution weren't uniform. If there are exactly 2^32 states and 2^32 outputs and you've seen 2^32-1 of them, the last one is always determined. — R.. GitHub STOP HELPING ICE, Jun 11 '13 at 02:59
I deleted my comment. I didn't quite understand what you were saying, but I think I got it now. — xaxxon, Jun 11 '13 at 02:59
How about using LCG, and then applying a safe 1:1 transform (I've suggested some [here](http://stackoverflow.com/a/16745271/2417578)) to improve the distribution? This is similar, in principle, to running a cryptographic algorithm in counter mode. — sh1, Jun 11 '13 at 18:42
I know you say 32-bit pretty plainly, here, but if you're totally nailed down on an appropriate platform, could you simply store a pointer to a `malloc()` of any practical size? Or an index into a static table of pointers? — sh1, Jun 11 '13 at 18:49
@sh1: Nice try, but no, the interface requirement is that the state is fully stored in the `unsigned int` object. Transforming the LCG or LFSR output with a 1:1 map sounds more promising though. — R.. GitHub STOP HELPING ICE, Jun 12 '13 at 00:43

Ilmari Karonen · Answer 1 · 2013-06-11T19:20:24.627

4

One good (but probably not the fastest) possibility, offering very high quality, would be to use a 32-bit block cipher in CTR mode. Basically, your RNG state would simply be a 32-bit counter that gets incremented by one for each RNG call, and the output would be the encryption of that counter value using the block cipher with some arbitrarily chosen fixed key. For extra randomness, you could even provide a (non-standard) function to let the user set a custom key.

There aren't a lot of 32-bit block ciphers in common use, since such a short block size introduces problems for cryptographic use. (Basically, the birthday paradox lets you distinguish the output of such a cipher from a random function with a non-negligible probability after only about 2¹⁶ = 65536 outputs, and after 2³² outputs the non-randomness obviously becomes certain.) However, some ciphers with an adjustable block size, such as ~~XXTEA~~ or HPC, will let you go down to 32 bits, and should be suitable for your purposes.

(Edit: My bad, XXTEA only goes down to 64 bits. However, as suggested by CodesInChaos in the comments, Skip32 might be another option. Or you could build your own 32-bit Feistel cipher.)

The CTR mode construction guarantees that the RNG will have a full period of 2³² outputs, while the standard security claim of (non-broken) block ciphers is essentially that it is not computationally feasible to distinguish their output from a random permutation of the set of 32-bit integers. (Of course, as noted above, such a permutation is still easily distinguished from a random function taking 32-bit values.)

Using CTR mode also provides some extra features you may find convenient (even if they're not part of the official API you're developing against), such as the ability to quickly seek into any point in the RNG output stream just by adding or subtracting from the state.

On the other hand, you probably don't want to follow the common practice of seeding the RNG by just setting the internal state to the seed value, since that would cause the output streams generated from nearby seeds to be highly similar (basically just the same stream shifted by the difference of the seeds). One way to avoid this issue would be to add an extra encryption step to the seeding process, i.e. to encrypt the seed with the cipher and set the internal counter value equal to the result.

edited Jun 11 '13 at 19:20

answered Jun 11 '13 at 03:32

Ilmari Karonen

49,047
9
93
153

The problem with this approach is ensuring full period. Obviously it could be tested empirically given about a day of computing time, but I see no obvious reason to believe that applying a cryptographic cipher fewer than 2³² times won't give back the original value. – R.. GitHub STOP HELPING ICE Jun 11 '13 at 03:36
The CTR mode construction explicitly guarantees full period: the output stream will only repeat when the counter wraps around. – Ilmari Karonen Jun 11 '13 at 03:38
Oh, I see. Unfortunately the API doesn't have a seeding process; the seed and the state are the same. It's a really, *really* bad API, and that's why I'm so constrained. With that said, I think it might work to encrypt the input seed/state, use the obtained value as the result, increment that value by 1, then 'decrypt' it and store the result back to the seed/state. – R.. GitHub STOP HELPING ICE Jun 11 '13 at 03:50
You could also replace the counter in CTR mode with a "generalized counter", i.e. any full-period sequence of 32-bit values (such as a full period LCRNG). That way, you'd essentially be using the block cipher to strengthen a simpler full period RNG by encrypting its output. The underlying simple LCRNG should still be random enough to avoid problems with sequential seed values. – Ilmari Karonen Jun 11 '13 at 13:41
1

according to wikipedia XXTEA has a minimum block size of 64 bits. You might want to consider Skip32 instead. – CodesInChaos Jun 11 '13 at 17:42
The minimum block for XXTEA as written seems to be 64-bit (2 32-bit words). Would it make sense to just drop the algorithm to using a pair of 16-bit words, and adjust the key size and key schedule size accordingly? For cryptographic use obviously that would require a lot of analysis, but for PRNG use I think it would be "safe" and the number of rounds could be greatly reduced too, no? – R.. GitHub STOP HELPING ICE Jun 11 '13 at 18:09
@R: Yes, that could work. Obviously, you should run the resulting RNG through the usual battery of tests (DIEHARD, TestU01, etc.). Or try CodesInChaos's sugeestion of Skip32. – Ilmari Karonen Jun 11 '13 at 19:15
So far your answer seems the most theoretically-sound, but possibly impractical in terms of performance. I'm keeping all options open for now though. It's nice to have three essentially different answers that all look promising. – R.. GitHub STOP HELPING ICE Jun 12 '13 at 00:48
Hasty Pudding cypher has a very wide range of block sizes, including 32 bits. – rossum Jun 12 '13 at 12:27

score 3 · Answer 2 · answered Jun 11 '13 at 03:05

3

A 32-bit maximal-period Galois LFSR might work for you. Try:

r = (r >> 1) ^ (-(r & 1) & 0x80200003);

The one problem with LFSRs is that you can't produce the value 0. So this one has a range of 1 to 2^32-1. You may want to tweak the output or else stick with a good LCG.

answered Jun 11 '13 at 03:05

Lee Daniel Crocker

12,927
3
29
55

Perhaps `if (r==0xdeadbeef) r=0; else if (r==0) r=0xdeadbeef; r = (r >> 1) ^ (-(r & 1) & 0x80200003);` – R.. GitHub STOP HELPING ICE Jun 11 '13 at 03:14
Better not to futz with the state variable itself. Just use r-1 as your output value. That makes your range and period 0 to 2^32-2. – Lee Daniel Crocker Jun 11 '13 at 03:17
Yeah but I can't change the range. It's fixed. Thus I need to insert the 0 at some arbitrary point in the sequence. – R.. GitHub STOP HELPING ICE Jun 11 '13 at 03:24
I had a bit of a think about this approach with respect to Multiply With Carry (not an appropriate solution here) recently, as I wanted to join the two independent large orbits by switching tracks in between. I decided an ideal idiom was `if ((x ^ 0xdeadbeef) == 0 || x == 0) x = x ^ 0xdeadbeef;`, as this results in very compact code (compiled code, I mean) with only one conditional block. – sh1 Jun 12 '13 at 00:56
I'm not sure I understand what you're trying here. This is exactly the same as R's first comment ( `((x ^ z) == 0)` is the same as `(x == z)` ), and the LFSR is *never* 0. – Lee Daniel Crocker Jun 12 '13 at 01:02
3

It's zero if we make it zero. The trick is to hop in and out of the zero state once every 2^31-1 iterations. That way zero is added to the set of outputs, extending it to 2^32 exactly, and making it perfectly unbiased. 0xdeadbeef is just an arbitrary value, after which 0 is inserted into the stream (it doesn't matter if we run the generator, as zero goes to zero). Next time we see that the state was zero so we hop back into the stream the same place as we hopped out last time and carry on as if nothing happened. My suggested change is just meant to be computationally more efficient. – sh1 Jun 12 '13 at 01:32
1

Ah, I get it now. You're inserting a 0 into the sequence at an arbitrary point. Cool. Took me a while. – Lee Daniel Crocker Jun 12 '13 at 02:05
It gives you a 1 to 2^32-1 range, OP needs 0 to 2^31-1, so just output r>>1. Replacing an arbitrary value with 0 still gives you a hole in your distribution, that doesn't sound very clever. – Michel Rouzic Sep 29 '13 at 09:24
1

@MichelRouzic It doesn’t replace a value with zero, it inserts a zero in the sequence after that value, giving a sequence of length 2^32. It’s actually clever. – sam hocevar Sep 13 '16 at 00:30
Are you sure the maximum is 2^32? I seem to be getting about half that. – Ash Sep 03 '18 at 12:30
1

Ah never mind, I found my issue. It was using unsigned integers and it was messing up the negative portion of the algebra. – Ash Sep 03 '18 at 12:35

bryc · Answer 3 · 2023-04-07T08:54:19.593

In order of best to worst:

"SplitMix32", adopted from xxHash/MurmurHash3 (Weyl sequence):

uint32_t splitmix32(void) {
    uint32_t z = state += 0x9e3779b9;
    z ^= z >> 15; // 16 for murmur3
    z *= 0x85ebca6b;
    z ^= z >> 13;
    z *= 0xc2b2ae35;
    return z ^= z >> 16;
}

I'm unsure about the quality. But its 64-bit big brother has a lot of fans (passes BigCrush). So the general structure is worth looking at.

Update: Better constants have been found that potentially make this a very good PRNG:

uint32_t splitmix32(void) {
    uint32_t z = (state += 0x9e3779b9);
    z ^= z >> 16; z *= 0x21f0aaad;
    z ^= z >> 15; z *= 0x735a2d97;
    z ^= z >> 15;
    return z;
}

Mulberry32 which has a period of exactly 2³²:

uint32_t mulberry32(void) {
    uint32_t z = state += 0x6D2B79F5;
    z = (z ^ z >> 15) * (1 | z);
    z ^= z + (z ^ z >> 7) * (61 | z);
    return z ^ z >> 14;
}

It's quite good. Author states "It passes gjrand's 13 tests with no failures and a total P-value of 0.984 (where 1 is perfect and 0.1 or less is a failure) on 4GB of generated data. That's a quarter of the full period".

32-bit variants of Xorshift have a guaranteed period of 2³²−1 using a 32-bit state:

uint32_t state;

uint32_t xorshift32(void) {
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}

That's the original 32-bit recommendation from 2003 (see paper). Depending on your definition of "decent quality", that should be fine. However it fails the binary rank tests of Diehard, and 5/10 tests of SmallCrush. So there are definitely detectable quality issues. Still, it's a classic, and serves as inspiration for a lot of improved versions.

Alternate version with better mixing and constants (passes SmallCrush and Crush):

uint32_t xorshift32amx(void) {
    int s = __builtin_bswap32(state * 1597334677);
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state + s;
}

Based on research here and here. I haven't tested it, but it looks reasonable. Still you'd be better off with splitmix32, as they're practically identical in performance, but this one's likely weaker.

Any thoughts on the PCG family? (E.g., the generator called "PCG RXS M XS 32 (LCG)" in the PCG paper.) — Mark Dickinson, Aug 28 '18 at 11:57
@MarkDickinson From what I've seen PCG relies on 64-bit math, and there are no fully 32-bit variants that I can find. I can't find any code specific to the generator you mentioned either. I'm sure PCG is a good generator, but I feel like its a poor choice for JavaScript or embedded systems. [xoshiro128**](http://xoshiro.di.unimi.it/xoshiro128starstar.c) for example is ideal in that regard, since it uses 32-bit integers _only_. — bryc, Aug 28 '18 at 13:06
Makes sense; thanks. Of course, you can emulate the required 64-bit arithmetic, but that's going to have implications for code complexity and efficiency. — Mark Dickinson, Aug 28 '18 at 14:24
The code for splitmix32 does not have a period of 2^32. It has a fixed point at 0xe1aff528! Also, mulberry32 does not have a period of 2^32 either (although at least it has no fixed points). — Ben Karel, Apr 25 '20 at 15:04
you sure about Mulberry32? Author claims *"It should have a period of 2 to the 32, exactly, since the state changes to be each 32-bit unsigned integer before cycling."*. what's your test code? — bryc, Apr 26 '20 at 02:13

score 2 · Answer 4 · edited May 23 '17 at 11:50

2

Elaborating on my comment...

A block cipher in counter mode gives a generator in approximately the following form (except using much bigger data types):

uint32_t state = 0;
uint32_t rand()
{
    state = next(state);
    return temper(state);
}

Since cryptographic security hasn't been specified (and in 32 bits it would be more or less futile), a simpler, ad-hoc tempering function should do the trick.

One approach is where the next() function is simple (eg., return state + 1;) and temper() compensates by being complex (as in the block cipher).

A more balanced approach is to implement LCG in next(), since we know that it also visits all possible states but in a random(ish) order, and to find an implementation of temper() which does just enough work to cover the remaining problems with LCG.

Mersenne Twister includes such a tempering function on its output. That might be suitable. Also, this question asks for operations which fulfill the requirement.

I have a favourite, which is to bit-reverse the word, and then multiply it by some constant (odd) number. That may be overly complex if bit-reverse isn't a native operation on your architecture.

edited May 23 '17 at 11:50

Community

1
1

answered Jun 11 '13 at 23:33

sh1

4,324
17
30

I haven't tried bit-reversal, but byte-swap seems promising so far. LFSR looks more promising as the 'next' function than LCG, since LCG has really bad period in its low bits, whereas LFSR has full period in all bits. – R.. GitHub STOP HELPING ICE Jun 12 '13 at 00:46
@R.., The idea is that `temper()` would spread things around so the bad-period bits are dispersed. LCG just has the simple advantage of being full-period without any extra complexity. But, in fact, if you did use multiplication in the tempering, it probably is for the best if you use a fundamentally different algorithm for `next()`. So maybe I agree with you. – sh1 Jun 12 '13 at 00:51
I just tried a naive LCG with the temper function copied from mersenne twister, and the diehard results are statistically indistinguishable from /dev/urandom. :-) – R.. GitHub STOP HELPING ICE Jun 12 '13 at 01:20
@R.., Then [test](http://www.iro.umontreal.ca/~simardr/testu01/tu01.html) [harder](http://www.phy.duke.edu/~rgb/General/dieharder.php)! – sh1 Jun 12 '13 at 01:35
@R.., did you try the harder tests? I don't expect they'll show up anything new, but it almost seems too easy; as if diehard has overlooked something. – sh1 Jun 27 '13 at 12:25
I tried dieharder, yes. No failures. I didn't try testu01 (too much work to setup and I didn't get around to it yet), but I did have someone on our team with PRNG experience do some other statistical tests with monte carlo methods, and the only tests it failed were the ones that provably must fail for periods this short. – R.. GitHub STOP HELPING ICE Jun 27 '13 at 15:20

Looking for decent-quality PRNG with only 32 bits of state

4 Answers4

Linked