14

I have a combinatorics problem for which I want to be able to pick an integer at random between 0 and a big integer.


Inadequacies of my current approach

Now for regular integers I would usually write something like int rand 500; and be done with it.

But for big integers, it looks like rand isn't meant for this.

Using the following code, I ran a simulation of 2 million calls to rand $bigint:

$ perl -Mbigint -E 'say int rand 1230138339199329632554990773929330319360000000 for 1 .. 2e6' > rand.txt

The distribution of the resultant set is far from desirable:

  • 0 (56 counts)
  • magnitude 1e+040 (112 counts)
  • magnitude 1e+041 (1411 counts)
  • magnitude 1e+042 (14496 counts)
  • magnitude 1e+043 (146324 counts)
  • magnitude 1e+044 (1463824 counts)
  • magnitude 1e+045 (373777 counts)

So the process was never able to choose a number like 999, or 5e+020, which makes this approach unsuitable for what I want to do.

It looks like this has something to do with the arbitrary precision of rand, which never goes beyond 15 digits in the course of my testing:

$ perl -E 'printf "%.66g", rand'
0.307037353515625

How can I overcome this limitation?

My initial thought is that maybe there is a way to influence the precision of rand, but it feels like a band-aid to a much bigger problem (i.e. the inability of rand to handle big integers).

In any case, I'm hoping someone has walked down this path before and knows how to remedy the situation.

Community
  • 1
  • 1
Zaid
  • 36,680
  • 16
  • 86
  • 155
  • 1
    Is your environment unix-ish where you have `/dev/random` and `/dev/urandom` as resources? – infixed Oct 10 '16 at 12:57
  • @infixed it's Windows, so no (I have a habit of replacing the `-E ""` with `-E ''` before posting up on SO) – Zaid Oct 10 '16 at 12:58
  • [Math::Random](http://p3rl.org/Math::Random)? Also [rand() precision low, looking for a fast way to get high precision rand float?](http://www.perlmonks.org/?node_id=1001701) – choroba Oct 10 '16 at 12:58
  • What happend to the other 270 rolls? – simbabque Oct 10 '16 at 13:00
  • 1
    @simbabque typo on my part... should've been 1411 not 1141 – Zaid Oct 10 '16 at 13:02
  • @choroba For `&random_uniform_integer`, ["the resulting range ($high - $low) must not be greater than 2147483561"](http://search.cpan.org/~grommel/Math-Random-0.71/Random.pm), so this doesn't seem suitable – Zaid Oct 10 '16 at 13:05
  • You can concentrate strings made from multiple pseudo-random numbers into one long string to pass to `Math::BigInt->new($str)`. But I doubt its cryptographically strong, and guessing if it'd be good enough for monte-carlo work is above my pay grade – infixed Oct 10 '16 at 13:15
  • 1
    A more theoretical-driven, but *maybe too slow* method: calc the number of bits you need for your number (easy). Then check how many bits your PRNG returns; probably 32/64. Sample ceil(needed_bits / rng_bits) time and concatenate the bits. If the resulting value is within your range: accept; else: resample again (all components!; acceptance-sampling). – sascha Oct 10 '16 at 13:43
  • @sascha since I only need integers that sounds like a start to a promising approach. Do you want to put that down as an answer? – Zaid Oct 10 '16 at 14:41
  • What version of `perl` are you using on Windows? Are you aware of [Perl 5.20.0 brings a "better" PRNG to Windows](https://www.nu42.com/2014/05/perl-5200-brings-better-prng-to-windows.html). Regardless, you might want to use [SFMT](http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/VERSIONS/C-LANG/c-lang.html) to generate a bunch of 32 or 64 bit random integers and construct larger ones from those. – Sinan Ünür Oct 10 '16 at 23:18

3 Answers3

5

(Converted from my comment)

A more theoretical-driven approach would be using multiple calls to the PRNG to create enough random-bits for your number to sample. Care has to be taken, if the number of bits produced by some PRNG is not equal to the number of bits needed as outlined below!

Pseudocode

  • Calculate the bits needed to represent your number: n_needed_bits
  • Check the size of bits returned by your PRNG: n_bits_prng
  • Calculate the number of samples needed: needed_prng_samples = ceil(n_needed_bits / n_bits_prng)
  • While true:
    • Sample needed_prng_samples (calls to PRNG) times & concatenate all the bits obtained
    • Check if the resulting number is within your range
    • Yes?: return number (finished)
    • No?: do nothing (loop continues; will resample all components again!)

Remarks

  • This is a form of acceptance-sampling / rejection-sampling
  • The approach is a Las-vegas type of algorithm: the runtime is not bounded in theory
    • The number of loops needed is in average: n_possible-sample-numbers-of-full-concatenation / n_possible-sample-numbers-within-range
  • The complete resampling (if result not within range) according to the rejection-method is giving access to more formal-analysis of non-bias / uniformity and is a very important aspect for this approach
  • Of course the classic assumptions in regards to PRNG-output are needed to make this work.
    • If the PRNG for example has some non-uniformity in regards to low-bits / high-bits (as often mentioned), this will have an effect of the output above
sascha
  • 32,238
  • 6
  • 68
  • 110
3

I was looking at this problem from the wrong angle

The bins are not the same size. Each bin is 10 times the size of the previous one. To put this in perspective, there are 10,000 possible integers at magnitude 1e+44 for every integer with magnitude 1e+40.

The probability of finding any number with magnitude 1e+20 for the bigint at 1e+45 is less than 0.00000 00000 00000 00000 001 %.

Forget needles in haystacks, this is more like finding a needle in a quasar!

Zaid
  • 36,680
  • 16
  • 86
  • 155
  • Maybe you can rephrase it then ... In any case, just a heads up, I believe you are going down the wrong path. Does `Math::BigInt` replace `rand`? Also, which `perl` version on Windows? – Sinan Ünür Oct 12 '16 at 13:44
1

An approach can be to cut string representation of the number into chunks, a boolean ($low) initialized is false while first random draws are equal to upper bound.

EDIT: added some explanations following comment

# first argument (in) upper bound
# second argument (in/out) is lower (false while random returns upper bound, after it remains true)
sub randhlp {
    my($upp)=@_;
    my $l=length $upp;
    # random number less than
    # - upper bound if islower is false
    # - 9..99 otherwise
    my $x=int rand ($_[1] ? 10**$l : $upp+1);
    if ($x<$upp) {
        $_[1]=1;
    }
    # left padding with 0
    return sprintf("%0*d",$l,$x);
}

# returns a random number less than argument (numeric string)
sub randistr {
    my($n)=@_;
    $n=~/^\d+$/ or die "invalid input not numeric";
    $n ne "0" or die "invalid input 0";
    my($low,$x);
    do {
        undef $x;
        # split string by chunks of 6 characters
        # except last chunk which has 1 to 6 characters
        while ($n=~/.{1,6}/g) {
            # concatenate random results
            $x.=randhlp($&,$low)
        }
    } while ($x eq $n);
    $x=~s/^0+//;
    return $x;
}

The test

for ($i=0;$i<2e6;++$i) {
    $H{length(randistr("1230138339199329632554990773929330319360000000"))}+=1;
}

print "$_ $H{$_}\n" for sort keys %H;

Returns

39 4
40 61
41 153
42 1376
43 14592
44 146109
45 1463301
46 374404
Nahuel Fouilleul
  • 18,726
  • 2
  • 31
  • 36
  • 2
    The algorithm in sensible, but could you un-golf the code? And remove `$low` which is never true, and the associated logic? – hobbs Oct 10 '16 at 19:28
  • $low becomes true as it is passed by reference in @_ array; the logic on an example if input is "12301383", it is split into "123013" and "83" the first random number must be between 000000 and 123013 included, if it is less than 123013, the next number can be any number betwen 00 and 99, else if is equal to 123013 the next number must be between 00 and 83 (except that this is the last chunk so it cannot be equal to 83) – Nahuel Fouilleul Oct 11 '16 at 07:56