Generating evenly distributed bits, using approximation

Question

I'm trying to generate 0 or 1 with 50/50 chance of any using random.uniform instead of random.getrandbits.

Here's what I have

0 if random.uniform(0, 1e-323) == 0.0 else 1

But if I run this long enough, the average is ~70% to generate 1. As seem here:

sum(0 if random.uniform(0, 1e-323) == 0.0 
    else 1 
    for _ in xrange(1000)) / 1000.0  # --> 0.737

If I change it to 1e-324 it will always be 0. And if I change it to 1e-322, the average will be ~%90.

I made a dirty program that will try to find the sweet spot between 1e-322 and 1e-324, by dividing and multiplying it several times:

v = 1e-323
n_runs = 100000
target = n_runs/2

result = 0
while True:
    result = sum(0 if random.uniform(0, v) == 0.0 else 1 for _ in xrange(n_runs))

    if result > target:
        v /= 1.5
    elif result < target:
        v *= 1.5 / 1.4
    else:
        break

print v

This end ups with 4.94065645841e-324

But it still will be wrong if I ran it enough times.

Is there I way to find this number without the dirty script I wrote? I know that Python has a intern min float value, show in sys.float_info.min, which in my PC is 2.22507385851e-308. But I don't see how to use it to solve this problem.

Sorry if this feels more like a puzzle than a proper question, but I'm not able to answer it myself.

What's wrong with `getrandbits`? Or with any other function more appropriate than `uniform`? — Stefan Pochmann, Jun 06 '15 at 15:11
There's nothing 'wrong', It's a puzzle. The goal was to find the lowest value possible that when passed to random.uniform(0, x) it would generate either 0 or some value with 50% chance and no in between numbers. — f.rodrigues, Jun 06 '15 at 15:34
Ah, ok, didn't sound like a puzzle. Btw, you can also use `int(random.uniform(0, v) > 0)`. — Stefan Pochmann, Jun 06 '15 at 15:44

Pascal Cuoq · Accepted Answer · 2015-06-06T20:10:13.470

3

I know that Python has a intern min float value, show in sys.float_info.min, which in my PC is 2.22507385851e-308. But I don't see how to use it to solve this problem.

2.22507385851e-308 is not the smallest positive float value, it is the smallest positive normalized float value. The smallest positive float value is 2^-52 times that, that is, near 5e-324.

2^-52 is called the “machine epsilon” and it is usual to call the “min” of a floating-point type a value that is nether that which is least of all comparable values (that is -inf), nor the least of finite values (that is -max), nor the least of positive values.

Then, the next problem you face is that random.uniform is not uniform to that level. It probably works ok when you pass it a normalized number, but if you pass it the smallest positive representable float number, the computation it does with it internally may be very approximative and lead it to behave differently than the documentation says. Although it appears to work surprisingly ok according to the results of your “dirty script”.

edited Jun 06 '15 at 20:10

answered Jun 06 '15 at 14:16

Pascal Cuoq

79,187
7
161
281

Upvoted for the "normalized" stuff. But why would `uniform` not be uniform to that level? All it does is `a + (b-a) * self.random()` (at least the cpython one) which boilds down to `b * self.random()` here. Shouldn't that still produce the closest representable number and thus be fine? – Stefan Pochmann Jun 06 '15 at 15:03
@StefanPochmann I didn't know how `uniform` is implemented, but it's complicated to make one that has all the properties that one might expect. A `a + (b-a) * self.random()` implementation only produces values that are multiple of min(ulp(a), ulp(b-a)). If 0 is in the interval [a, b], that means that a lot of possible floating-point values are never produced. But as you say, `a + (b-a) * self.random()` is pretty robust for most values of `a` and `b` (as long as `a-b` doesn't overflow because of a very negative `a` and a very positive `b`) – Pascal Cuoq Jun 06 '15 at 16:28
Nitpick: the smallest +ve float value is 2^(-52) times the smallest normalized positive float, not 2^(-53) times. – Mark Dickinson Jun 06 '15 at 17:23
@MarkDickinson Fixed. Fortunately according to wikipedia 2^-52 and 2^-53 are both called “machine epsilon”. My mistake was using Google's calculator to check which was right. – Pascal Cuoq Jun 06 '15 at 20:11

Marcus Vinícius Monteiro · Answer 2 · 2015-06-06T17:24:05.590

2

Here's the random.uniform implementation, according to the source:

from os import urandom as _urandom

BPF = 53        # Number of bits in a float
RECIP_BPF = 2**-BPF

def uniform(self, a, b):
    "Get a random number in the range [a, b) or [a, b] depending on rounding."
     return a + (b-a) * self.random()

def random(self):
     """Get the next random number in the range [0.0, 1.0)."""
     return (int.from_bytes(_urandom(7), 'big') >> 3) * RECIP_BPF

So, your problem boils down to finding a number b that will give 0 when multiplied by a number less than 0.5 and another result when multiplied by a number larger than 0.5. I've found out that, on my machine, that number is 5e-324.

To test it, I've made the following script:

from random import uniform

def test():
    runs = 1000000
    results = [0, 0]
    for i in range(runs):
        if uniform(0, 5e-324) == 0:
            results[0] += 1
        else:
            results[1] += 1
    print(results)

Which returned results consistent with a 50% probability:

>>> test()
[499982, 500018]
>>> test()
[499528, 500472]
>>> test()
[500307, 499693]

edited Jun 06 '15 at 17:24

answered Jun 06 '15 at 14:45

Marcus Vinícius Monteiro

7,669
5
30
39

1

+1. `5e-324` is stored as 2^(-1074), and the OP's method will give 0 if and only if the result of `random() * 2^(-1074)` (before rounding) is less than or equal to `2^(-1075)`, which happens if and only if the result of `random()` is less than or equal to `0.5`. Technically that means that (assuming that the underlying Mersenne Twister generator is perfect) a result of `0` will occur with probability `0.5 + 2^(-53)`, and a result of `1` with probability `0.5 - 2^(-53)`, so it's not exactly a 50/50 chance (but you'd have a *very* hard time detecting the difference using any statistical test). – Mark Dickinson Jun 06 '15 at 17:31
@MarkDickinson Nice! Just as a conversation: [`urandom`](https://docs.python.org/3/library/os.html) queries `/dev/random` on linux for the string of random bytes. The program underlying `/dev/random` is this: [`random.c`](https://github.com/torvalds/linux/blob/4f671fe2f9523a1ea206f63fe60a7c7b3a56d5c7/drivers/char/random.c). Does `random.c` implement a Mersenne Twister? Does it impact the probabilities? – Marcus Vinícius Monteiro Jun 06 '15 at 18:01
1

No idea about the Linux `random.c`, but Python's `random` module does use Mersenne Twister by default. (Though it also provides classes based on `/dev/urandom`.) – Mark Dickinson Jun 06 '15 at 19:26

Generating evenly distributed bits, using approximation

2 Answers2