1

Given two binary arrays, how can one quickly find the number of 1's occurring between intersecting bits for both arrays?

For example, consider a pair of bitsets (i1, i2) and their intersection (i12):

i1  = [0001001000100100];
i2  = [0101000100010110];
i12 = [0001000000000100]; // (i1 & i2)

The intersecting bits are in position 3 and 13, so count the number of 1's in positions 0-2 and 4-12:

x1 = [0, 2] // results i1
x2 = [1, 2] // results for i2

Now generalize this idea to 32-bit integers. Toy example:

int i1[2] = {2719269390, 302235938};
int i2[2] = {2315436042, 570885266};

Toy example in binary:

i1      = [10100010 00010100 11000010 00001110, 00010010 00000011 11000001 00100010]
i2      = [10001010 00000010 11000000 00001010, 00100010 00000111 00000100 10010010]
i1 & i2 = [10000010 00000000 11000000 00001010, 00000010 00000011 00000000 00000010]

Expected results:

x1 = [2, 2, 0, 1, 1, 1, 0, 0, 4];
x2 = [2, 1, 0, 0, 0, 1, 1, 0, 3];

I can see a "brute-force" approach using __builtin_clz() to determine leading zeros in i1 & i2, shifting i1 and i2 right by that number, doing __builtin_popcount() for the shifted i1 and i2 results, and repeating the procedure until no intersections remain. My instinct suggests there may be a more elegant approach, as this involves a few temporaries, many instructions, and at least two logical branches.

C/C++ implementations are welcome, but I would be satisfied enough with conceptual perspectives. Suffice it to say this approach intends to remove a critical bottleneck in a popular program.

zdebruine
  • 3,687
  • 6
  • 31
  • 50
  • There is `std::bitset::count`: https://en.cppreference.com/w/cpp/utility/bitset/count – Rulle Dec 01 '22 at 13:45
  • @Rulle indeed. https://stackoverflow.com/questions/60165922/stdbitsetncount-vs-builtin-popcount. It's a tiny piece to the puzzle. – zdebruine Dec 01 '22 at 13:49
  • Your generalisation to 32 bit integers is ambiguous. You have two values, so are they high an low of a 64bit integer? If so, you need to specify if it's big or little endian because I think it matters in this problem. Maybe you could re-ask the question with your generalisation to 64bit integers to avoid the superfluous complexity. Are you hoping someone will write the code for you? Try LinkedIn. – Simon Goater Dec 01 '22 at 14:04
  • @SimonGoater good point about endianness, assume little endian. – zdebruine Dec 01 '22 at 14:20

4 Answers4

1

std::popcount returns the number of active bits in a value. It is able to call intrinsic functions, if available.

constexpr auto get_popcount(auto list) {
    decltype(list) result;
    std::transform(list.begin(), list.end(), result.begin(), [](auto x) {
        return std::popcount(x);
    });
    return result;
}

int main() {
    constexpr auto i12 = std::array{
        0b10000010u, 0b00000000u, 0b11000000u, 0b00001010u, 
        0b00000010u, 0b00000011u, 0b00000000u, 0b00000010u};

    static_assert(get_popcount(i12) == std::array{ 2u, 0u, 2u, 2u, 1u, 2u, 0u, 1u });
}
Stack Danny
  • 7,754
  • 2
  • 26
  • 55
1

Can't test this right now but (I1 XOR I2) AND I1 should leave only the bits that you want to count, so:

constexpr auto get_popcount(auto i1, auto i2) {
    std::vector<std::pair(int,int)> res;
    std::transform (i1.begin(), i1.end(), i2.begin(), res.begin(), []
        (auto x, auto y){
            return std::tie(std::popcount((x^y)&x),std::popcount((x^y)&y));
    });
    return res;
}

Where the first item of the touple is the number of bits on I1 and the second on I2.

  • This counts the number of non-intersecting bits in each `int`, not the number of non-intersecting bits between each intersection. – zdebruine Dec 01 '22 at 17:53
1

I'm dealing with a similar problem, where I have to count the number of '1' bits from a run-time-determined range. Don't know how long your bit array is, but mine is millions to billions of bits.

For a very long bit arrays (say, millions of bits), you could build an auxiliary tree structure for each bit array. The root node counts all bits 0~N in the array, the left child node of root stores the partial sum of bits 0~N/2, the right child node store the partial sum of bits (N/2+1)~N, etc. etc. That way, when you want to count the number of '1' bits from an arbitrary range, you traverse the tree and add the partial sum values stored in appropriate nodes of the tree. The time complexity is O(log(N)). When your array is huge, this is going to be much more efficient than iterating over the entire array (complexity O(N)).

If your array is smaller, I would guess SSE/AVX instructions can result in higher throughput? Like Stack Danny has implied, intrinsic functions may already be the underlying implementation of popcount.

If you just use a for loop, GCC is able to vectorize your loop into SSE/AVX (e.g. if you got AVX2 on your CPU, add '-mavx2 -O3' to your compiler command). That works for summing elements of an integer array, but I never examined if looping on a bitarray can also result in vectorized code. https://gcc.gnu.org/projects/tree-ssa/vectorization.html

fjs
  • 330
  • 2
  • 9
1

Use the AND operator for i1 and i2 to get only the intersecting bits. Then in a loop, SHIFT right checking each bit using the AND operator until the variable is zero.

Assuming the arrays are in packs of 4 bytes (unsigned long), this should work:

size_t getIntersecting(unsigned long* i1, unsigned long* i2, size_t nPacks) {
    size_t result = 0;
    unsigned long bitsec;
    for (size_t i = 0; i < nPacks; i++)
    {
        bitsec = i1[i] & i2[i];
        while (bitsec != 0) {
            if (bitsec & 1) {
                result++;
            }
            bitsec >>= 1;
        }
    }
    return result;
}

I'm not at my workstation to test it. Generally it should count intersecting bits.

EDIT:

After reexamination of your question, try something like this.

// Define the global vectors
vector<unsigned long> vector1;
vector<unsigned long> vector2;

// Define the function
void getIntersecting(unsigned long* pi1, unsigned long* pi2, size_t nPacks) {
    // Initialize the counters to 0
    unsigned long i1cnt = 0;
    unsigned long i2cnt = 0;
    // Local storage variables
    unsigned long i1, i2;
    // Loop through the pointers using nPacks as the counter
    for (size_t i = 0; i < nPacks; i++) {
        // Get values from the pointers
        i1 = pi1[i];
        i2 = pi2[i];
        // Perform the bitwise operations
        unsigned long i4 = i1 & i2;
        // Loop for each bit in the 4-byte pack
        for (int ibit = 0; ibit < 32; ibit++) {
            // Check the highest bit of i4
            if (!(i4 & 0x80000000)) {
                // Check the highest bit of i1
                if (i1 & 0x80000000) {
                    i1cnt++;
                }
                // Check the highest bit of i2
                if (i2 & 0x80000000) {
                    i2cnt++;
                }
            }
            else {
                // Push the counters to the global vectors
                vector1.push_back(i1cnt);
                vector2.push_back(i2cnt);
                // Reset counters
                i1cnt = 0;
                i2cnt = 0;
            }
            // Shift i4, i1, and i2 left by 1 bit
            i4 = i4 << 1;
            i1 = i1 << 1;
            i2 = i2 << 1;
        }
    }
}
b.sullender
  • 163
  • 1
  • 16
  • Thanks much! This is a much more complete implementation than other answers. However, it seems inefficient to increment `i1` and `i2` left by 1 bit at a time. I wonder if a popcount of `i1` and `i2` followed by a shift equal to the index of the next intersection would be more efficient (fewer instructions when intersections are generally sparse). Anyhow, this is a great take on the problem. – zdebruine Dec 15 '22 at 16:17
  • 1
    I think adding the optimization techniques you thought of would lead to race issues since we are dealing with 2 separate arrays. Could potentially create 2 separate loops for each array if I'm correct about the race issue, but I can't imagine that would be much faster. If you know for sure that there will be large spaces between intersections, you could test if both integers are zero and skip checking each bit for that pack of bits, effectively skipping alot of potential shift and conditional operations. – b.sullender Dec 15 '22 at 17:45
  • 1
    I tested the code last night. Found it didn't output the expected results so I fixed the function. Works now. 1 possible solution to increase speed i would recommend trying is using SSE2 intrinsic functions for parallel processing of the arrays and pointers. If implemented correctly, could probably see significant speed increase. – b.sullender Dec 16 '22 at 13:16