0

Since this is about remapping a uniform distribution to another with a different range, this is not a PHP question specifically although I am using PHP.

I have a cryptographicaly secure random number generator that gives me evenly distributed integers (uniform discrete distribution) between 0 and PHP_INT_MAX.

How do I remap these results to fit into a different range in an efficient manner?

Currently I am using $mappedRandomNumber = $randomNumber % ($range + 1) + $min where $range = $max - $min, but that obvioulsy doesn't work since the first PHP_INT_MAX%$range integers from the range have a higher chance to be picked, breaking the uniformity of the distribution.

NotGaeL
  • 8,344
  • 5
  • 40
  • 70

3 Answers3

0

Well, having zero knowledge of PHP definitely qualifies me as an expert, so

mentally converting to float U[0,1)

f = r / PHP_MAX_INT

then doing

mapped = min + f*(max - min)

going back to integers

mapped = min + (r * max - r * min)/PHP_MAX_INT

if computation is done via 64bit math, and PHP_MAX_INT being 2^31 it should work

Severin Pappadeux
  • 18,636
  • 3
  • 38
  • 64
  • Thanks for the answer but I am afraid anything involving floats will result in trouble due to floating point representation issues and rounding errors. Also, using 64 bit math implies PHP_INT_MAX = 2^63 - 1, so there's also the overflow issue. – NotGaeL Mar 28 '15 at 15:33
  • No, wrt float i specifically said `mentally`, final expression is all integer. Wrt `Also, using 64 bit math implies PHP_INT_MAX = 2^63 - 1, so there's also the overflow issue`, could you use 64bit math only for intermediates? Again, I have no clue about PHP, but doing only computation in braces with 64bit math and then division should work. – Severin Pappadeux Mar 29 '15 at 02:28
  • yeah but you can see (max-min)/PHP_INT_MAX will always be a float and there is no other way to do the calculation without overflow (doing `r * (max - min)` first will overflow in all but trivial cases) – NotGaeL Mar 30 '15 at 17:59
0

This is what I ended up doing. PRNG 101 (if it does not fit, ignore and generate again). Not very sophisticated, but simple:

public function rand($min = 0, $max = null){

  // pow(2,$numBits-1) calculated as (pow(2,$numBits-2)-1) + pow(2,$numBits-2) 
  // to avoid overflow when $numBits is the number of bits of PHP_INT_MAX
  $maxSafe = (int) floor(
    ((pow(2,8*$this->intByteCount-2)-1) + pow(2,8*$this->intByteCount-2))   
    / 
    ($max - $min)
  ) * ($max - $min);

  // discards anything above the last interval N * {0 .. max - min -1} 
  // that fits in {0 ..  2^(intBitCount-1)-1}
  do {
    $chars = $this->getRandomBytesString($this->intByteCount);
    $n = 0;
    for ($i=0;$i<$this->intByteCount;$i++) {$n|=(ord($chars[$i])<<(8*($this->intByteCount-$i-1)));}
  } while (abs($n)>$maxSafe);

  return (abs($n)%($max-$min+1))+$min;

}

Any improvements are welcomed.

(Full code on https://github.com/elcodedocle/cryptosecureprng/blob/master/CryptoSecurePRNG.php)

NotGaeL
  • 8,344
  • 5
  • 40
  • 70
0

Here is the sketch how I would do it:

Consider you have uniform random integer distribution in range [A, B) that's what your random number generator provide. Let L = B - A. Let P be the highest power of 2 such that P <= L. Let X be a sample from this range. First calculate Y = X - A. If Y >= P, discard it and start with new X until you get an Y that fits.

Now Y contains log2(P) uniformly random bits - zero extend it up to log2(P) bits.

Now we have uniform random bit generator that can be used to provide arbitrary number of random bits as needed.

To generate a number in the target range, let [A_t, B_t) be the target range. Let L_t = B_t - A_t. Let P_t be the smallest power of 2 such that P_t >= L_t. Read log2(P_t) random bits and make an integer from it, let's call it X_t. If X_t >= L_t, discard it and try again until you get a number that fits. Your random number in the desired range will be L_t + A_t.

Implementation considerations: if your L_t and L are powers of 2, you never have to discard anything. If not, then even in the worst case you should get the right number in less than 2 trials on average.

Calmarius
  • 18,570
  • 18
  • 110
  • 157
  • That's smart, intuitive and effective. Also the first solution that came to my mind 4.5 years ago. If you check my code you'll see that I went one step further and removed the requirement for a log2 / random bit generator. Still not very cost effective: both solutions end up discarding up to half of the generations, but that's what remained on my production code and so far I got no performance complaints... – NotGaeL Oct 22 '19 at 16:26