
A, B, and C are variables of some unsigned integral type. Conceptually, A is a test vector, B is a bitmask of 'required' bits (at least one corresponding bit in A must be set) and C is a bitmask of 'prohibited' bits (no corresponding bit in A may be set). Since we're mixing bitwise and logical operators, the otherwise natural-seeming solution of

A & B & ~C

is incorrect. Rather, the title expression, (A & B) && !(A & C), is equivalent to the pseudocode

((a0 & b0) | ... | (an & bn)) & (~(a0 & c0) & ... & ~(an & cn))

where a0, etc. represent individual bits (and n is the index of the highest bit). I don't see how to rearrange this effectively and turn it back into code, but nonetheless: is there a clever way, maybe with ^, to simplify the expression in the title?
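To see concretely why A & B & ~C fails, here is a minimal C++ sketch. It assumes 32-bit `std::uint32_t` (so n = 31), and the function names `reference`, `original` and `naive` are hypothetical. `reference` evaluates the pseudocode one bit position at a time. With A = 0x6, B = 0x2, C = 0x4, A contains both a required bit and a prohibited bit, so the expression is false, yet A & B & ~C is 0x2, i.e. true.

```cpp
#include <cassert>
#include <cstdint>

// Bit-by-bit reading of the pseudocode: some i with (a_i & b_i),
// and no i with (a_i & c_i). One loop iteration per bit position.
bool reference(std::uint32_t A, std::uint32_t B, std::uint32_t C) {
    bool any_required = false;  // (a0 & b0) | ... | (an & bn)
    bool no_prohibited = true;  // ~(a0 & c0) & ... & ~(an & cn)
    for (int i = 0; i < 32; ++i) {
        const std::uint32_t a = (A >> i) & 1u;
        const std::uint32_t b = (B >> i) & 1u;
        const std::uint32_t c = (C >> i) & 1u;
        any_required = any_required || (a & b);
        no_prohibited = no_prohibited && !(a & c);
    }
    return any_required && no_prohibited;
}

// The expression under discussion.
bool original(std::uint32_t A, std::uint32_t B, std::uint32_t C) {
    return (A & B) && !(A & C);
}

// The tempting all-bitwise form: true whenever a required bit is set that is
// not itself prohibited, even if some *other* prohibited bit is set in A.
bool naive(std::uint32_t A, std::uint32_t B, std::uint32_t C) {
    return (A & B & ~C) != 0;
}
```

Note that under the assumption (B & C) == 0, A & B & ~C is simply A & B, so the naive form ignores C entirely.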

Edit: Prompted by @huseyintugrulbuyukisik's question, I note that we can assume (B & C) == 0, but I don't know if that helps.

Edit 2: Results: It depends on how good branch prediction is!

#include <chrono>
#include <cstdlib>   // rand()
#include <iostream>
#include <vector>

using UINT = unsigned int;
int main(void)
{
    const auto one = UINT(1);
    const UINT B = (one << 9); // Version 1
//  const UINT B = (one << 31) - 1;  // Version 2
    const UINT C = (one << 5) | (one << 15) | (one << 25);

    const size_t N = 1024 * 1024;
    std::vector<UINT> vecA(N);
    for (size_t i = 0; i < N; ++i)
        vecA[i] = (UINT)rand();

    int ct = 0; // printed below, so the compiler can't optimize the loop away
    auto tstart = std::chrono::steady_clock::now();
    for (size_t i = 0; i < N; ++i)
    {
        const UINT A = vecA[i];
        if ((A & B) && !(A & C))
            ++ct;
    }
    auto tend = std::chrono::steady_clock::now();
    auto tdur = std::chrono::duration_cast<std::chrono::milliseconds>(tend - tstart).count();
    std::cout << ct << ", " << tdur << "ms" << std::endl;

    ct = 0;
    tstart = std::chrono::steady_clock::now();
    for (size_t i = 0; i < N; ++i)
    {
        const UINT A = vecA[i];
        if (!((!(A & B)) | (A & C)))
            ++ct;
    }
    tend = std::chrono::steady_clock::now();
    tdur = std::chrono::duration_cast<std::chrono::milliseconds>(tend - tstart).count();
    std::cout << ct << ", " << tdur << "ms" << std::endl;

    return 0;
}

Version 1:

$ ./ops_test 
    65578, 8ms
    65578, 3ms

Version 2:

$ ./ops_test
    130967, 4ms
    130967, 4ms

These are representative values (in reality I ran each test multiple times). Compiler: g++ 4.8.4 at the default optimization level (-O0). I got version-2-like results with only 4 bits set in B. However, my use case is still closer to version 1, so I think @DougCurrie's answer is an improvement.

Edited by 2501; asked by Matt Phillips.
  • It already is fairly simple it seems. – Baum mit Augen Mar 05 '17 at 01:28
  • Do B and C overlap on their set bits? – huseyin tugrul buyukisik Mar 05 '17 at 01:29
  • @BaummitAugen Yes, but this is in a section where every cycle counts, and the appearance of A twice makes me think there might be room for improvement. – Matt Phillips Mar 05 '17 at 01:30
  • @huseyintugrulbuyukisik Hmm, we can assume that `(B & C) == 0`, if that fact can be exploited that would be cool. – Matt Phillips Mar 05 '17 at 01:31
  • What would integral `A && B` mean? Same as `A!=0 && B!=0`? – Scovetta Mar 05 '17 at 01:32
  • @Scovetta Yes. – Matt Phillips Mar 05 '17 at 01:34
  • @MattPhilips: bitwise operations themselves are the cheapest thing that you can ask of the CPU, I wouldn't be too worried about them. What may be slightly more costly there is the `&&`, which usually implies a branch. But, as always, measure before optimizing; worrying about this kind of stuff may be completely pointless if your "section where every cycle counts" is actually bound by, say, data not in cache being fetched from RAM. – Matteo Italia Mar 05 '17 at 01:41
  • Branches are extremely cheap too these days, branch predictors are crazy good. – Baum mit Augen Mar 05 '17 at 01:44
  • @MatteoItalia Point taken. – Matt Phillips Mar 05 '17 at 01:45
  • @BaummitAugen Misprediction still happens, and it still costs much more than a bitwise operation and a comparison with zero when it does. – Jon Hanna Mar 05 '17 at 01:47
  • @BaummitAugen: Only if the branch is predictable. If it's based on data that will cause it to be unpredictably different at each iteration, all bets are off. – R.. GitHub STOP HELPING ICE Mar 05 '17 at 02:06
  • If performance is the goal, you could replace part of the bit hack with addition and multiplication, so that different pipelines might work at the same time. – huseyin tugrul buyukisik Mar 05 '17 at 02:08
  • Can B and/or C vary? – ScegfOd Mar 05 '17 at 02:35
  • @JonHanna: The bitwise operation itself may be very cheap, but depending on both operands isn't necessarily, especially in case they both come from memory. The nice thing with a branch is that you can often avoid certain data dependencies instead, which can be worth much more than avoiding a misprediction. – Dolda2000 Mar 05 '17 at 03:32
  • There are silly options like this to avoid having any `!` or `==`: `uint32_t t = a & c; t |= t >> 16; t |= t >> 8; t |= t >> 4; t |= t >> 2; t |= t >> 1; uint32_t const test = a & b & ~t;` As you might expect: not faster. – Ry- Mar 05 '17 at 04:04
  • Why do you want to change this? Have you measured and determined whether this is a bottleneck? Readability is *enormously* valuable whenever possible. "Simplifying" this is likely to make it harder to read. – jpmc26 Mar 05 '17 at 06:13
  • @R.. I'm not saying that it cannot be a problem, only that it usually isn't. – Baum mit Augen Mar 05 '17 at 14:52
  • @MattPhillips C++ and C are two distinct languages. In the future pick only one tag. Thank you. – 2501 Mar 05 '17 at 15:49
  • @2501 Since C++ contains C this issue must come up all the time. Is there any written guideline that says you should only choose one tag? In fact, a search reveals 24,000 SO questions with both tags. – Matt Phillips Mar 05 '17 at 15:54
  • @MattPhillips *Since C++ contains C this issue must come up all the time.* C++ and C are two distinct languages with separate standards, neither is a subset of the other. *Is there any written guideline that says you should only choose one tag?* The tags should best describe the topic of the question. The contents of this question don't support the inclusion of the C tag. – 2501 Mar 05 '17 at 16:26
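Ry-'s smearing trick only spreads the bits of a & c toward the low end, so a required bit above the highest set bit of a & c survives the mask: with a = 0x220, b = 0x200, c = 0x20 (bits 9 and 5), a & b & ~t is 0x200 even though a prohibited bit is set. A sketch that also smears upward, assuming 32-bit `std::uint32_t` (the helper names `smear_right`, `smear_all` and `smear_test` are hypothetical); it adds five more shift/OR steps, so it is no shortcut either.

```cpp
#include <cassert>
#include <cstdint>

// Rightward-only smear, as in the comment: sets every bit BELOW the highest
// set bit of t, but leaves the bits above it clear.
std::uint32_t smear_right(std::uint32_t t) {
    t |= t >> 16; t |= t >> 8; t |= t >> 4; t |= t >> 2; t |= t >> 1;
    return t;
}

// Smear in both directions: 0 if t == 0, all 32 bits set otherwise.
std::uint32_t smear_all(std::uint32_t t) {
    t = smear_right(t);  // bits 0..h set, h = highest set bit
    t |= t << 1; t |= t << 2; t |= t << 4; t |= t << 8; t |= t << 16;  // bits above h too
    return t;
}

// Nonzero exactly when (a & b) && !(a & c) holds.
std::uint32_t smear_test(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    return a & b & ~smear_all(a & c);
}
```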

2 Answers


!(A & B) must be zero

A & C must be zero

so

(!(A & B)) | (A & C) must be zero

This saves the branch associated with &&; some compilers can optimize ! to be branchless as well.
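A sketch checking that this OR form agrees with the short-circuit original, assuming 32-bit `std::uint32_t` and testing every combination of 4-bit A, B and C (the function names are hypothetical). It compares the OR with zero instead of wrapping it in `!`; the two are the same test.

```cpp
#include <cassert>
#include <cstdint>

// OR form: passes when !(A & B) and (A & C) are both zero, i.e. when their
// bitwise OR is zero. No && in the source; whether ! itself compiles
// branch-free depends on the compiler.
bool passes_or_form(std::uint32_t A, std::uint32_t B, std::uint32_t C) {
    return ((!(A & B)) | (A & C)) == 0;
}

// Short-circuit original, for comparison.
bool passes_short_circuit(std::uint32_t A, std::uint32_t B, std::uint32_t C) {
    return (A & B) && !(A & C);
}

// Exhaustive check over all 4-bit values of A, B and C (4096 cases).
bool forms_agree_4bit() {
    for (std::uint32_t A = 0; A < 16; ++A)
        for (std::uint32_t B = 0; B < 16; ++B)
            for (std::uint32_t C = 0; C < 16; ++C)
                if (passes_or_form(A, B, C) != passes_short_circuit(A, B, C))
                    return false;
    return true;
}
```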

Doug Currie
  • "~A & B must be zero" - not true because only 1 bit in B needs to be set in A to pass, not all of them. – samgak Mar 05 '17 at 02:14
  • I fixed this using `!` instead. – Doug Currie Mar 05 '17 at 02:20
  • So then the test is `(!((!(A & B)) | (A & C)))`, I guess? I'll have to profile this but +1 for this approach, it may well be the best alternative. – Matt Phillips Mar 05 '17 at 02:33
  • It seems to me that the only real difference here is using bitwise logic instead of short-circuiting logic, though. It could just as well be `!!(A & B) & !(A & C)`, in which case [some might argue](http://yarchive.net/comp/linux/cmov.html) that the latter is better. ;) – Dolda2000 Mar 05 '17 at 02:39
  • I wouldn't call it simpler, but I would call it better! – ScegfOd Mar 05 '17 at 02:45
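The bitwise-AND variant mentioned in the comments, !!(A & B) & !(A & C), yields 0 or 1, so in a counting loop like the benchmark's it can be added directly instead of branched on. A minimal sketch, assuming 32-bit `std::uint32_t` elements (`count_matches` is a hypothetical name); whether the compiler emits a setcc/cmov sequence or a branch for it is up to the compiler and optimization level.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts elements that have at least one bit of B and no bit of C.
// Each side of the & is 0 or 1, and the result is added rather than tested,
// so the loop body contains no data-dependent if.
std::size_t count_matches(const std::vector<std::uint32_t>& vec,
                          std::uint32_t B, std::uint32_t C) {
    std::size_t ct = 0;
    for (std::uint32_t A : vec)
        ct += !!(A & B) & !(A & C);
    return ct;
}
```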

Admittedly, I cannot find a mathematical proof of it, but I'm inclined to think that your expression cannot be simplified further, at least not into purely bitwise logic.

The reason is that the two tests (A & B and !(A & C)) are of two different kinds: the first tests whether any of the masked bits is set (1), while the second tests whether all of them are clear (0).

In all cases, to convert a final bit array to a single Boolean value, you need some operation that coalesces all bits into one bit (such as ! or the implicit != 0 of the if clause). For the reason outlined above, you need two different such coalescing operators. I interpret "simplifying" in your question as turning the expression into all-bitwise operations, that is, using only the one coalescing operator implicit in the if clause, which, if I'm correct, is not enough.
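To make the "two coalescing operators" point concrete, a small sketch with the any-test and the all-test written as separate functions (hypothetical names, 32-bit `std::uint32_t` assumed). One coalesces with != 0, the other with == 0, and the full test needs both.

```cpp
#include <cassert>
#include <cstdint>

// "Any" test: coalesces the masked bits with != 0 (true if any bit is 1).
bool any_bit_set(std::uint32_t A, std::uint32_t mask) { return (A & mask) != 0; }

// "All" test: coalesces the masked bits with == 0 (true if every bit is 0).
bool all_bits_clear(std::uint32_t A, std::uint32_t mask) { return (A & mask) == 0; }

// "These, but not those": one coalescing step of each kind.
bool these_but_not_those(std::uint32_t A, std::uint32_t B, std::uint32_t C) {
    return any_bit_set(A, B) && all_bits_clear(A, C);
}
```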

Finally, I might add that even if the expression can be simplified by some standard, I'm not sure it should be. The current form does, after all, express the actual intention very well: "These, but not those".

Dolda2000
  • That's misleading, since computers can *also* do arithmetic. Suppose, for a simple example, that we knew that the sign-bit is not in either of the bit masks. Then `(A&B)-1` is negative precisely if `A&B` is 0. Also, `-(A&C)` is negative precisely if `A&C` is not 0. IOW, the two expressions must both be positive for the original test to be true, so it is sufficient to test `(((A&B)-1)|-(A&C)) >= 0` (or replace the >=0 with a bitwise check on the signbit). For portable C you need to play more games with signed/unsigned casts which won't fit in this comment, but it's certainly doable. – rici Mar 05 '17 at 03:12
  • @rici: True enough, that does simplify the test to *all-ALU operations*, if that's the goal. – Dolda2000 Mar 05 '17 at 03:14
  • Usually the goals of bit-hackery are (1) find the absolutely fastest solution and (2) obfuscate the solution to the point of non-intelligibility. I think the above at least satisfies (2). Benchmarking would be necessary to demonstrate (1) (or not). – rici Mar 05 '17 at 03:19
  • @rici: Another reasonable interpretation of "simplifying" could be to remove the "redundant" mention of `A` in the expression, though, which I'm still mulling over. Of course, I happily award the extra points for optimizing the illegibility of the expression. – Dolda2000 Mar 05 '17 at 03:21
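One way to sidestep the signed/unsigned games rici mentions is to stay in unsigned arithmetic, where wraparound is well defined, and inspect bit 31 directly instead of comparing a signed value with 0. A sketch under rici's stated assumption that neither B nor C contains the sign bit (bit 31 of a 32-bit `std::uint32_t`); `sign_bit_test` is a hypothetical name, and whether it beats the branchy form would need benchmarking.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: bit 31 is set in neither B nor C, so A & B and A & C are < 2^31.
//   (A & B) - 1u  has bit 31 set exactly when A & B == 0 (it wraps to all ones)
//   0u - (A & C)  has bit 31 set exactly when A & C != 0 (0 < A & C < 2^31)
// The original test holds exactly when bit 31 of their OR is clear.
bool sign_bit_test(std::uint32_t A, std::uint32_t B, std::uint32_t C) {
    const std::uint32_t t = ((A & B) - 1u) | (0u - (A & C));
    return (t >> 31) == 0;
}
```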