A, B, and C are variables of some unsigned integral type. Conceptually, A is a test vector, B is a bitmask of 'required' bits (at least one corresponding bit in A must be set) and C is a bitmask of 'prohibited' bits (no corresponding bit in A may be set). Since we're mixing bitwise and logical operators, the otherwise natural-seeming solution of
A & B & ~C
is incorrect. Rather, the title expression, (A & B) && !(A & C), is equivalent to the pseudocode
((a0 & b0) | ... | (an & bn)) & (~(a0 & c0) & ... & ~(an & cn))
where a0, etc. represent individual bits (and n is the index of the highest bit). I don't see how to rearrange this effectively and pull out the corresponding code. Nonetheless, is there a clever way, maybe with ^, to simplify the expression in the title?
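To make the difference concrete, here is a minimal sketch of a counterexample, assuming the title expression is (A & B) && !(A & C), the condition used in the first benchmark loop below. The names title_expr and naive_expr are made up for illustration. Note that when B and C share no bits, A & B & ~C reduces to A & B, so the all-bitwise form never looks at the prohibited bits at all.

```cpp
#include <cstdint>

// The condition from the title: at least one required bit is set in A,
// and no prohibited bit is set in A.
constexpr bool title_expr(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    return (a & b) && !(a & c);
}

// The tempting all-bitwise rewrite. It only asks whether some required bit
// of A survives once the prohibited positions are masked off.
constexpr bool naive_expr(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    return (a & b & ~c) != 0;
}

// Counterexample: bit 0 is required, bit 1 is prohibited, A has both set.
static_assert(!title_expr(0x3, 0x1, 0x2), "a prohibited bit is set, so A must be rejected");
static_assert(naive_expr(0x3, 0x1, 0x2), "the all-bitwise form accepts A anyway");
```

The two forms agree whenever A contains no prohibited bit, which is why the rewrite looks plausible at first glance.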
Edit: Prompted by @huseyintugrulbuyukisik's question, I note that we can assume (B & C) == 0, but I don't know if that helps.
Edit 2: Results: It depends on how good branch prediction is!
#include <chrono>
#include <cstdlib> // rand
#include <iostream>
#include <vector>

using UINT = unsigned int;

int main(void)
{
    const auto one = UINT(1);
    const UINT B = (one << 9); // Version 1
    // const UINT B = (one << 31) - 1; // Version 2
    const UINT C = (one << 5) | (one << 15) | (one << 25);
    const size_t N = 1024 * 1024;

    // Random test vectors
    std::vector<UINT> vecA(N);
    for (size_t i = 0; i < N; ++i)
        vecA[i] = (UINT)rand();

    int ct = 0; // To avoid compiler optimizations

    // Test 1: logical && (short-circuits, so it may branch twice)
    auto tstart = std::chrono::steady_clock::now();
    for (size_t i = 0; i < N; ++i)
    {
        const UINT A = vecA[i];
        if ((A & B) && !(A & C))
            ++ct;
    }
    auto tend = std::chrono::steady_clock::now();
    auto tdur = std::chrono::duration_cast<std::chrono::milliseconds>(tend - tstart).count();
    std::cout << ct << ", " << tdur << "ms" << std::endl;

    // Test 2: bitwise | followed by a single logical negation
    ct = 0;
    tstart = std::chrono::steady_clock::now();
    for (size_t i = 0; i < N; ++i)
    {
        const UINT A = vecA[i];
        if (!((!(A & B)) | (A & C)))
            ++ct;
    }
    tend = std::chrono::steady_clock::now();
    tdur = std::chrono::duration_cast<std::chrono::milliseconds>(tend - tstart).count();
    std::cout << ct << ", " << tdur << "ms" << std::endl;

    return 0;
}
Version 1:
$ ./ops_test
65578, 8ms
65578, 3ms
Version 2:
$ ./ops_test
130967, 4ms
130967, 4ms
These are representative values (in reality I ran each test multiple times). g++ 4.8.4, default optimization. I got version-2-like results with only 4 bits set in B. However, my use case is still closer to version 1, so I think @DougCurrie's answer is an improvement.
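The matching counts above suggest that the two timed conditions compute the same predicate, but only for the sampled inputs. As a sanity check, the following sketch compares them exhaustively on all 4-bit values. The function names are illustrative, and it assumes a C++14 compiler (for loops inside constexpr functions), which is newer than the g++ 4.8.4 used for the timings.

```cpp
#include <cstdint>

using UINT = std::uint32_t;

// First condition timed in the benchmark: short-circuiting logical AND.
constexpr bool with_logical_and(UINT a, UINT b, UINT c) {
    return (a & b) && !(a & c);
}

// Second condition timed in the benchmark: bitwise OR, then one negation.
constexpr bool with_bitwise_or(UINT a, UINT b, UINT c) {
    return !((!(a & b)) | (a & c));
}

// Counts the (a, b, c) triples below 2^bits on which the two forms disagree.
// Needs C++14 because of the loops inside a constexpr function.
constexpr unsigned count_mismatches(unsigned bits) {
    const UINT limit = UINT(1) << bits;
    unsigned mismatches = 0;
    for (UINT a = 0; a < limit; ++a)
        for (UINT b = 0; b < limit; ++b)
            for (UINT c = 0; c < limit; ++c)
                if (with_logical_and(a, b, c) != with_bitwise_or(a, b, c))
                    ++mismatches;
    return mismatches;
}

// All 16 * 16 * 16 four-bit triples are checked at compile time.
static_assert(count_mismatches(4) == 0, "the two benchmarked conditions should agree");
```

Calling count_mismatches(8) at run time widens the check to every 8-bit triple, at a cost of 2^24 iterations.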