I'd like to generate all possible combinations (without repetition) in bit representation. I can't use any library like boost or stl::next_combination - it has to be my own code (computation time is very important).

Here's my code (adapted from another Stack Overflow user's code):

    int combination  = (1 << k) - 1;
    int new_combination = 0;
    int change = 0;

    while (true)
    {
        // return next combination
        cout << combination << endl;

        // find first index to update
        int indexToUpdate = k;
        while (indexToUpdate > 0 && GetBitPositionByNr(combination, indexToUpdate)>= n - k + indexToUpdate)
            indexToUpdate--;

        if (indexToUpdate == 1) change = 1; // move all bits to the left by one position
        if (indexToUpdate <= 0) break; // done

         // update combination indices
        new_combination = 0;
        for (int combIndex = GetBitPositionByNr(combination, indexToUpdate) - 1; indexToUpdate <= k; indexToUpdate++, combIndex++)
        {
            if(change)
            {
                new_combination |= (1 << (combIndex + 1));
            }
            else
            {
                combination = combination & (~(1 << combIndex));
                combination |= (1 << (combIndex + 1));
            }
        }
        if(change) combination = new_combination;
        change = 0;
    }

where n is the total number of elements and k is the number of elements in a combination. GetBitPositionByNr returns the position of the i-th set bit: GetBitPositionByNr(13, 2) = 3, because 13 is 1101 in binary and its second set bit is at the third position.
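GetBitPositionByNr itself is not shown here. A minimal sketch that matches the description could look like the following, assuming positions are 1-based and counted from the least significant bit (which is what the 13 example implies). The body and the 0 return value for "no such bit" are assumptions, not the original helper:

```cpp
// Hypothetical reconstruction of the helper described above.
// Returns the 1-based position (counted from the least significant bit)
// of the nr-th set bit of value, or 0 if value has fewer than nr set bits.
int GetBitPositionByNr(unsigned value, int nr)
{
    int position = 1;
    while (value != 0) {
        // Count down on every set bit; stop when the nr-th one is reached.
        if ((value & 1u) != 0 && --nr == 0) {
            return position;
        }
        value >>= 1;
        ++position;
    }
    return 0; // fewer than nr bits are set
}
```

With this sketch, GetBitPositionByNr(13, 2) evaluates to 3, as in the example.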

It gives me correct output for n=4, k=2 which is:

0011 (3 - decimal representation - printed value)
0101 (5)
1001 (9)
0110 (6)
1010 (10)
1100 (12)

It also gives me correct output for k=1 and k=4, but wrong output for k=3:

0111 (7)
1011 (11)
1001 (9) - wrong, should be 1101 (13)
1110 (14)

I guess the problem is in the inner while condition (the second while), but I don't know how to fix it.

Maybe some of you know a better (faster) algorithm for what I want to achieve? It can't use additional memory (arrays).

Here is code to run on ideone: IDEONE

db_k
  • See the classic algorithm: https://graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation; one must start with `(1 << n) - 1`, where `n` is the number of bits; Note that the recurrence ends with all ones. – Aki Suihkonen Jul 29 '15 at 08:22
  • Thanks a lot, that was one I was looking for. – db_k Jul 29 '15 at 09:21
  • @AkiSuihkonen Why a comment instead of answer?! – stgatilov Jul 29 '15 at 17:54
  • The Q wasn't that clear -- but it's been answered so many times here, that I don't think it's ok to get any more rep from new answers: http://stackoverflow.com/search?q=next+bit+permutation – Aki Suihkonen Jul 30 '15 at 05:39

1 Answer

When in doubt, use brute force: generate all N-bit patterns, then filter out the ones that don't have exactly K bits set:

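#include <cstdio>
#include <vector>

// Count the set bits of n by testing one bit per iteration.
unsigned bit_count(unsigned n)
{
    unsigned i = 0;

    while (n) {
        i += n & 1;
        n >>= 1;
    }

    return i;
}

int main()
{
    std::vector<unsigned> combs;
    const unsigned N = 4;
    const unsigned K = 3;

    // Visit every N-bit pattern and keep those with exactly K bits set.
    for (unsigned i = 0; i < (1u << N); i++) {
        if (bit_count(i) == K) {
            combs.push_back(i);
        }
    }

    // Print the surviving patterns in decimal.
    for (unsigned c : combs) {
        std::printf("%u\n", c);
    }
}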

Edit: Someone else already pointed out a solution without filtering and brute force, but I'm still going to give you a few hints about this algorithm:

  • Most compilers offer some sort of intrinsic population count function. GCC and Clang, for example, have __builtin_popcount(). Using this intrinsic, I was able to double the speed of the code.

  • Since you seem to be working on GPUs, you can parallelize the code. I have done it using C++11's standard threading facilities, and I've managed to count all 32-bit patterns with the arbitrarily chosen popcounts 1, 16 and 19 in 7.1 seconds on my 8-core Intel machine.

Here's the final code I've written:

#include <vector>
#include <cstdio>
#include <thread>
#include <utility>
#include <future>


unsigned popcount_range(unsigned popcount, unsigned long min, unsigned long max)
{
    unsigned n = 0;

    for (unsigned long i = min; i < max; i++) {
        // i is an unsigned long, so use the 'l' variant of the builtin
        n += static_cast<unsigned>(__builtin_popcountl(i)) == popcount;
    }

    return n;
}

int main()
{
    const unsigned N = 32;
    const unsigned K = 16;

    const unsigned N_cores = 8;
    // Assumes unsigned long is wider than 32 bits (true on LP64 systems, not on 64-bit Windows).
    const unsigned long Max = 1ul << N;
    const unsigned long N_per_core = Max / N_cores;

    std::vector<std::future<unsigned>> v;

    for (unsigned core = 0; core < N_cores; core++) {
        unsigned long core_min = N_per_core * core;
        unsigned long core_max = core_min + N_per_core;

        auto fut = std::async(
            std::launch::async,
            popcount_range,
            K,
            core_min,
            core_max
        );

        v.push_back(std::move(fut));
    }

    unsigned final_count = 0;
    for (auto &fut : v) {
        final_count += fut.get();
    }

    printf("%u\n", final_count);

    return 0;
}
  • The goal was performance, so brute force + filter is probably not the way to go. – nwp Jul 29 '15 at 08:51
  • @nwp sure, generating 16 numbers is a waste of time when you only need 4 of them, right? (my point is, the problem's complexity is combinatorial by nature. Surely you can save some complexity by coming up with an oh-so-clever algorithm, but I'll have to look at those pesky *real-world* constant factors.) – The Paramagnetic Croissant Jul 29 '15 at 08:56
  • In fact, it is a clear way to compute this, but unfortunately computation time is VERY important to me: I have a first for loop `i=0 to 2^n`, then a nested second for loop `j=0 to 2^i`, and then the nested combination checks, so it's not just a few unneeded computations, it's a huge waste of time. – db_k Jul 29 '15 at 09:09
  • By the way, here is bit counting with O(1) time complexity: `int uCount = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111); uCount = ((uCount + (uCount >> 3)) & 030707070707) % 63;` – db_k Jul 29 '15 at 09:10
  • @KamilZ my `bit_count()` function is also `O(1)`, since `unsigned` is a fixed-width data type. But one can use `__builtin_popcount(n)` if one is using GCC or Clang, that's faster (about twice, just benchmarked). – The Paramagnetic Croissant Jul 29 '15 at 09:12
  • @KamilZ so, are you generating these combinations in the middle of a doubly-nested tight loop? As in `for (first loop) { for (second loop) { generate_combinations(); } }`? – The Paramagnetic Croissant Jul 29 '15 at 09:13
  • Yes, exactly. I know that complexity is bad (O(3^n)) but it has to be in my code. I'm doing computation on GPU using CUDA to count chromatic number. – db_k Jul 29 '15 at 09:19
  • @KamilZ well I'm sorry to say, but then you're screwed anyway with regards to running time… – The Paramagnetic Croissant Jul 29 '15 at 09:21
  • Finding the chromatic number is in general a hard problem. I have a very fast algorithm with O(2^n) time complexity, but unfortunately it also has O(2^n) memory complexity. Now I'm going to write code with worse time complexity, O(3^n), but without using any extra memory! Also, parallelization of the O(3^n) version can give very good results. – db_k Jul 29 '15 at 09:26
  • Check this paper if you want to see what I am doing: [link](http://www.cs.helsinki.fi/u/mkhkoivi/publications/sicomp-200Y.pdf) – db_k Jul 29 '15 at 09:29
  • @KamilZ Thanks. In the meantime, I've updated my code with two ideas you may find useful anyway. – The Paramagnetic Croissant Jul 29 '15 at 09:37
  • Thanks, I'll check it out. – db_k Jul 29 '15 at 09:46
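The octal-mask bit count quoted in the comments above can be wrapped in a function and checked against the simple loop. This is only a sketch: the function names are made up for the example, and the formula as written assumes a 32-bit input, which is why it takes a std::uint32_t here rather than an int:

```cpp
#include <cstdint>

// Bit count for a 32-bit value using the octal-mask formula from the comments.
// Step 1 leaves, in every 3-bit group, the number of set bits of that group.
// Step 2 adds neighbouring groups into 6-bit fields and sums those fields via % 63.
unsigned bit_count_octal(std::uint32_t n)
{
    std::uint32_t count = n - ((n >> 1) & 033333333333u) - ((n >> 2) & 011111111111u);
    return ((count + (count >> 3)) & 030707070707u) % 63u;
}

// Reference implementation: test one bit per iteration.
unsigned bit_count_loop(std::uint32_t n)
{
    unsigned count = 0;
    while (n != 0) {
        count += n & 1u;
        n >>= 1;
    }
    return count;
}
```

Both functions should agree for every 32-bit input. Whether the formula actually beats __builtin_popcount() on a given target is something to benchmark rather than assume.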