Specific binary permutation generating function

Question

So I'm writing a program where I need to produce strings of binary numbers that are not only a specific length, but also have a specific number of 1's and 0's. In addition, these strings are compared to an upper and a lower bound to see if they fall within that range. The issue I'm having is that I'm dealing with 64-bit unsigned integers. For very large numbers that require all 64 bits, the program produces a huge number of permutations of binary strings whose values are not in the range at all, and it takes a ton of time.

I'm curious whether it is possible for an algorithm to take in two bound values and a number of ones, and produce only the binary strings between the bound values that have that specific number of ones.

This is what I have so far, but it's producing way too many numbers:

void generatePermutations(int no_ones, int length, uint64_t smaller, uint64_t larger, uint64_t& accum){

    char charArray[length+1];

    for(int i = length - 1; i > -1; i--){
        if(no_ones > 0){
            charArray[i] = '1';
            no_ones--;
        }else{
            charArray[i] = '0';
        }
    }
    charArray[length] = '\0';

    do {
        std::string val(charArray);
        uint64_t num = convertToNum(val);
        if(num >= smaller && num <= larger){
            accum ++;
        }
    } while ( std::next_permutation(charArray, (charArray + length)));

}

Technically that's not valid C++ as it doesn't have [variable-length arrays](https://en.wikipedia.org/wiki/Variable-length_array). If you want to be portable you should use `std::vector` instead. Or why not use `std::string` if you want to create a string? — Some programmer dude, Jan 31 '16 at 15:23
This question is relevant: http://stackoverflow.com/questions/30947343/calculating-the-next-higher-number-which-has-same-number-of-set-bits/30951360#30951360. You also need to find the first number >= lower limit with the correct number of bits, which can be done by flipping bits so it is O(wordsize) — rici, Jan 31 '16 at 18:19
Thanks @rici, do you have any recommendations for how I tackle the bit flipping part? I haven't been able to find any posts quite related to what I'm trying to do. — Phil_B, Jan 31 '16 at 19:00

rici · Accepted Answer · 2016-02-01T01:42:29.443

(Note: The number of 1-bits in a binary value is generally called the population count -- popcount, for short -- or Hamming weight.)

There is a well-known bit-hack to cycle through all binary words with the same population count, which basically does the following:

Find the longest suffix of the word consisting of a 0, a non-empty sequence of 1s, and finally a possibly empty sequence of 0s.
Change the first 0 to a 1 and the following 1 to a 0, and then shift all the other 1s (if any) to the end of the word.

Example:

00010010111100
       ^-------- beginning of the suffix
00010011         0 becomes 1
        0        1 becomes 0
         00111   remaining 1s right-shifted to the end

That can be done quite rapidly by using the fact that the lowest-order set bit in x is x & -x (where - represents the two's-complement negative of x). To find the beginning of the suffix, it suffices to add the lowest-order set bit to the number and then find the new lowest-order set bit: the addition carries through the lowest run of 1s and stops at the 0 just above it. (Try this with a few numbers and you should see how it works.)

The biggest problem is performing the right shift, since we don't actually know the bit count. The traditional solution is to do the right shift with a division (by the original low-order 1 bit), but division on modern hardware is really slow relative to other operations. Looping a one-bit shift is generally faster than dividing, but in the code below I use gcc's __builtin_ffsll, which normally compiles into an appropriate opcode if one exists on the target hardware. (See man ffs for details; I use the builtin to avoid feature-test macros, but it's a bit ugly and limits the range of compilers you can use. OTOH, ffsll is also an extension.)

I've included the division-based solution as well for portability; however, it takes almost three times as long on my i5 laptop.

template<typename UInt>
static inline UInt last_one(UInt ui) { return ui & -ui; }

// next_with_same_popcount(ui) finds the next larger integer with the same
// number of 1-bits as ui. If there isn't one (within the range
// of the unsigned type), it returns 0.
template<typename UInt>
UInt next_with_same_popcount(UInt ui) {
  UInt lo = last_one(ui);
  UInt next = ui + lo;
  UInt hi = last_one(next);
  if (next) next += (hi >> __builtin_ffsll(lo)) - 1;
  return next;
}

// Division-based version, for compilers without __builtin_ffsll:
/*
template<typename UInt>
UInt next_with_same_popcount(UInt ui) {
  UInt lo = last_one(ui);
  UInt next = ui + lo;
  UInt hi = last_one(next) >> 1;
  if (next) next += hi/lo - 1;
  return next;
}
*/

The only remaining problem is to find the first number with the correct popcount inside of the given range. To help with this, the following simple algorithm can be used:

Start with the first value in the range.
As long as the popcount of the value is too high, eliminate the last run of 1s by adding the low-order 1 bit to the number (using exactly the same x&-x trick as above). Since this works right-to-left, it cannot loop more than 64 times, once per bit.
While the popcount is too small, add the smallest possible bit by changing the low-order 0 bit to a 1. Since this adds a single 1-bit on each loop, it also cannot loop more than k times (where k is the target popcount), and it is not necessary to recompute the population count on each loop, unlike the first step.

In the following implementation, I again use a GCC builtin, __builtin_popcountll. This one doesn't have a corresponding POSIX function. See the Wikipedia page for alternative implementations and a list of hardware which does support the operation. Note that it is possible that the value found will exceed the end of the range; also, the function might return a value less than the supplied argument, indicating that there is no appropriate value. So you need to check that the result is inside the desired range before using it.

// next_with_popcount_k returns the smallest integer >= ui whose popcnt
// is exactly k. If ui has exactly k bits set, it is returned. If there
// is no such value, returns the smallest integer with exactly k bits.
template<typename UInt>
UInt next_with_popcount_k(UInt ui, int k) {
  int count; 
  while ((count = __builtin_popcountll(ui)) > k)
    ui += last_one(ui);
  for (int i = count; i < k; ++i)
    ui += last_one(~ui);
  return ui;
}

It's possible to make this slightly more efficient by changing the first loop to:

while ((count = __builtin_popcountll(ui)) > k) {
  UInt lo = last_one(ui);
  ui += last_one(ui - lo) - lo;
}

That shaved about 10% off of the execution time, but I doubt whether the function will be called often enough to make that worthwhile. Depending on how efficiently your CPU implements the POPCOUNT opcode, it might be faster to do the first loop with a single bit sweep in order to be able to track the popcount instead of recomputing it. That will almost certainly be the case on hardware without a POPCOUNT opcode.

Once you have those two functions, iterating over all k-bit values in a range becomes trivial:

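void all_k_bits(uint64_t lo, uint64_t hi, int k) {
  // Visits every value in the half-open range [lo, hi) with exactly k bits
  // set; use i <= hi instead for an inclusive upper bound. The i > 0 test
  // stops the loop when next_with_same_popcount runs out of values (returns 0).
  uint64_t i = next_with_popcount_k(lo, k);
  if (i >= lo) {
    for (; i > 0 && i < hi; i = next_with_same_popcount(i)) {
      // Do what needs to be done
    }
  }
}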

Just wanna make sure I'm understanding you correctly. So if there aren't enough ones in the binary representation then add the number of ones that I need to the right most positions with zeros in them. And if there are too many ones just add 1 to the number itself and see how many ones I have in that binary representation? — Phil_B, Jan 31 '16 at 19:23
