
I'm looking for a highly efficient way to generate a random std::bitset of a given length. I'd also like to be able to control the probability of 1s appearing in the result: if the probability is set low enough, only a small percentage of the results will contain a 1 at all, yet it must remain possible (though very unlikely) to get all 1s. It's going to be used in a very computation-heavy application, so every possible optimization is welcome.

4pie0
Kuba Orlik
  • You might want to look into the new [pseudo-random capabilities in C++11](http://en.cppreference.com/w/cpp/numeric/random). Perhaps create your own distribution if none of the standard ones fits your requirements. – Some programmer dude Aug 07 '14 at 07:23
  • Use the CPU "tick tock" register counter: just query it and take whether the number is odd or even. – Tuğrul Aug 07 '14 at 07:37
  • You can use some of the techniques from this java question http://stackoverflow.com/questions/2075912/generate-a-random-binary-number-with-a-variable-proportion-of-1-bits/ – Michael Anderson Aug 07 '14 at 08:36
  • You could first use the [Box-Muller transform](http://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform), which calls `rand()` on average once per random number and generates Gaussian-distributed random numbers, to determine how many bits will be set in your result. You then iteratively track the total bits remaining and the bits still to be set (to 1). On each iteration you call `rand()` (or your PRNG of choice) and `mod` it by the total bits that remain. If the value is less than the number of bits still to be set, you set the bit; otherwise you leave it clear (0). – Apriori Aug 07 '14 at 15:18
  • To optimize this you could try to skim multiple random numbers from the result of your PRNG, and/or subdivide the range into independent pieces, maybe have separate pools for `toset` and `totalremaining`, one for each bit in a byte or so. Then compute 8 weighted random bits at once and distribute them into the byte, or something similar. Or that is one idea that occurs to me anyway. – Apriori Aug 07 '14 at 15:24
  • The optimal method will likely depend on the density of zeros. If the density of zeros is very high or very low, it will be easier to optimize than distributions that are closer to 50% zeros. – Z boson Aug 08 '14 at 07:47
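
The count-then-place idea from Apriori's comments can be sketched roughly as follows. This is only an illustration under a few assumptions: the name `count_then_place` is made up, the number of 1s is drawn with `std::binomial_distribution` rather than the Box-Muller approximation the comment mentions, and the generator is passed in by the caller.

```cpp
#include <bitset>
#include <cstddef>
#include <random>

// Hypothetical helper (not from the original post): first draw how many bits
// will be 1, then walk the positions and decide each one so that exactly that
// many bits end up set.
template <std::size_t N, class Gen>
std::bitset<N> count_then_place(double p, Gen& gen) {
    std::bitset<N> bits;
    if (p <= 0.0) return bits;        // no bit can be set
    if (p >= 1.0) return bits.set();  // every bit is set

    // Number of 1s among N independent Bernoulli(p) trials.
    std::binomial_distribution<long long> count(static_cast<long long>(N), p);
    std::size_t to_set = static_cast<std::size_t>(count(gen));

    for (std::size_t i = 0; i < N && to_set > 0; ++i) {
        const std::size_t remaining = N - i;  // positions not yet decided
        std::uniform_int_distribution<std::size_t> pick(0, remaining - 1);
        // Set this bit with probability to_set / remaining.
        if (pick(gen) < to_set) {
            bits.set(i);
            --to_set;
        }
    }
    return bits;
}
```

A call would look like `std::mt19937 gen(std::random_device{}()); auto b = count_then_place<64>(0.25, gen);`. As written it still draws one random number per position until all 1s are placed, so it is mainly a base for the batching ideas in the comments, not a speed-up by itself.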

1 Answer


The Bernoulli distribution is the probability distribution of a single experiment whose outcome is 1 with probability p and 0 otherwise. A sum of n such independent variables, X = X_1 + X_2 + ... + X_n, follows the binomial distribution with mean n*p. So by taking n Bernoulli-distributed bits, each equal to 1 with probability p, we get a bitset of size n with n*p bits set to 1 on average. Of course, this is just a starting point to optimize further if its efficiency is not enough.

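#include <bitset>
#include <cstddef>
#include <iostream>
#include <random>

template< std::size_t size>
std::bitset<size> random_bitset( double p = 0.5) {

    // Seed the generator once per thread; constructing std::random_device
    // and std::mt19937 on every call is needlessly expensive.
    static thread_local std::mt19937 gen{ std::random_device{}()};
    std::bernoulli_distribution d( p);  // yields true with probability p

    std::bitset<size> bits;
    for( std::size_t n = 0; n < size; ++n) {
        bits[ n] = d( gen);
    }

    return bits;
}

int main()
{
    // Print ten random 10-bit sets, each bit being 1 with probability 0.25.
    for( int n = 0; n < 10; ++n) {
        std::cout << random_bitset<10>( 0.25) << '\n';
    }
}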

Result (one sample run):

1010101001
0001000000
1000000000
0110010000
1000000000
0000110100
0001000000
0000000000
1000010000
0101010000

http://ideone.com/p29Pbz
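
If p is small, as in the question, one possible optimization is to skip straight from one 1 to the next instead of drawing every bit. In a sequence of independent Bernoulli(p) trials, the number of 0s before the next 1 is geometrically distributed, so `std::geometric_distribution` can produce the gaps. The sketch below is an assumption-laden illustration, not part of the original answer; the name `sparse_random_bitset` is made up and the generator is supplied by the caller.

```cpp
#include <bitset>
#include <cstddef>
#include <random>

// Hypothetical helper: generates the same distribution as drawing each bit
// independently with probability p, but needs roughly one random draw per
// set bit instead of one per position.
template <std::size_t N, class Gen>
std::bitset<N> sparse_random_bitset(double p, Gen& gen) {
    std::bitset<N> bits;
    if (p <= 0.0) return bits;        // no bit can be set
    if (p >= 1.0) return bits.set();  // every bit is set

    // Number of 0s before the next 1.
    std::geometric_distribution<unsigned long long> gap(p);

    std::size_t pos = 0;  // first position that has not been decided yet
    while (pos < N) {
        const unsigned long long skip = gap(gen);
        if (skip >= N - pos) break;  // the next 1 would fall past the end
        pos += static_cast<std::size_t>(skip);
        bits.set(pos);
        ++pos;
    }
    return bits;
}
```

For example, `sparse_random_bitset<1000>(0.01, gen)` needs about ten draws on average instead of a thousand. For p near 0.5 this has no advantage over the per-bit loop.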

4pie0
  • The default C++ Mersenne Twister (`std::mt19937`) uses an internal state of 624 32-bit words. The way you seed it provides only 4 bytes of entropy, so you can only reach 2^32 of its possible starting states. This makes it easier to predict the sequence. More problematically, some numbers (1,580,024,992 of them) will never appear at all as the first number. ([Source](http://www.pcg-random.org/posts/cpp-seeding-surprises.html)). The bottom line is that you need as many bytes of entropy as you have state to seed correctly. – Richard Aug 12 '16 at 15:43
  • @Richard This doesn't mean that the solution presented doesn't work, as your comment suggests. It still works, giving enough accuracy in almost all practical use cases. Your comment should be "To improve entropy, seed the mt this way..." – 4pie0 Aug 12 '16 at 15:55
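
For reference, a commonly suggested way to give the generator more seed material than a single `rd()` call is to fill a buffer with `std::mt19937::state_size` values and pass it through `std::seed_seq`. This is a minimal sketch; the helper name `make_seeded_mt19937` is made up, and it assumes `std::random_device` is non-deterministic on the target platform, which the standard does not guarantee.

```cpp
#include <algorithm>
#include <array>
#include <functional>
#include <random>

// Hypothetical helper: seeds std::mt19937 with one random_device value per
// word of generator state instead of a single 32-bit value.
inline std::mt19937 make_seeded_mt19937() {
    std::random_device rd;
    std::array<std::random_device::result_type, std::mt19937::state_size> seed_data;
    std::generate(seed_data.begin(), seed_data.end(), std::ref(rd));
    std::seed_seq seq(seed_data.begin(), seed_data.end());
    return std::mt19937(seq);
}
```

The resulting generator can be created once and reused for every bitset, for example as `static thread_local std::mt19937 gen = make_seeded_mt19937();`. Whether the extra seed material matters depends on the application; for simulation work the single-value seed is often accepted.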