69

Here are two functions which I claim do exactly the same thing:

bool fast(int x)
{
  return x & 4242;
}

bool slow(int x)
{
  return x && (x & 4242);
}

Logically they do the same thing, and just to be 100% sure I wrote a test that ran all four billion possible inputs through both of them, and they matched. (x & 4242 is only non-zero if it has set bits in specific positions, which means x has a non-zero value, so testing x!=0 separately as the other side of a logical && is redundant.) But the assembly code is a different story:

fast:
    andl    $4242, %edi
    setne   %al
    ret

slow:
    xorl    %eax, %eax
    testl   %edi, %edi
    je      .L3
    andl    $4242, %edi
    setne   %al
.L3:
    rep
    ret

I was surprised that GCC could not make the leap of logic to eliminate the redundant test. I tried g++ 4.4.3 and 4.7.2 with -O2, -O3, and -Os, all of which generated the same code. The platform is Linux x86_64.

Can someone explain why GCC isn't smart enough to generate the same code in both cases?
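A minimal version of the exhaustive equivalence check described above might look like this (a sketch, not the original test; it spot-checks edge cases and a small range rather than all four billion inputs, and `check_equivalence` is an illustrative name):

```cpp
#include <cassert>

bool fast(int x) { return x & 4242; }
bool slow(int x) { return x && (x & 4242); }

// Spot-check the equivalence claim on edge cases and a small range.
// (The original test iterated over all 2^32 possible int values.)
void check_equivalence() {
    const int samples[] = {0, 1, -1, 2, 4242, -4242, 2147483647, -2147483647 - 1};
    for (int x : samples)
        assert(fast(x) == slow(x));
    for (int x = -100000; x <= 100000; ++x)
        assert(fast(x) == slow(x));
}
```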

Edit to add test harness:

#include <cstdlib>
#include <vector>
using namespace std;

// assumes fast() and slow() from above are defined in this file
int main(int argc, char* argv[])
{
    // make vector filled with numbers starting from argv[1]
    int seed = atoi(argv[1]);
    vector<int> v(100000);
    for (int j = 0; j < 100000; ++j)
        v[j] = j + seed;

    // count how many times the function returns true
    int result = 0;
    for (int j = 0; j < 100000; ++j)
        for (int i : v)
            result += slow(i); // or fast(i), try both

    return result;
}

I tested the above with clang 5.1 on Mac OS with -O3. It took 2.9 seconds using fast() and 3.8 seconds using slow(). If I instead use a vector of all zeros, there is no significant difference in performance between the two functions.
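A timing wrapper in the spirit of that measurement could be sketched as follows (`time_it` is a hypothetical helper, not part of the harness above; the accumulator is widened to `long long` so billions of increments cannot overflow):

```cpp
#include <chrono>
#include <vector>

bool fast(int x) { return x & 4242; }
bool slow(int x) { return x && (x & 4242); }

// Hypothetical helper: apply fn to every element of v, reps times,
// and return the elapsed wall-clock time in seconds.
template <class F>
double time_it(F fn, const std::vector<int>& v, int reps, long long& result) {
    auto t0 = std::chrono::steady_clock::now();
    result = 0;  // long long: billions of increments cannot overflow
    for (int j = 0; j < reps; ++j)
        for (int i : v)
            result += fn(i);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```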


Other compilers:

  • mainline clang 3.7 and later do the optimization even for &&, clang 3.6 and earlier don't. https://godbolt.org/z/v5bjrvrP1
  • latest GCC trunk (March 2022) and 11.2 still don't.
  • Current MSVC does both parts with branches, not using setcc.
  • ICC makes asm like GCC's, LLVM-based ICX is like clang. https://godbolt.org/z/cjKfr8r5b
Peter Cordes
John Zwinck
  • 19
    how are those two functions doing the same thing? The first one returns an `int` (`x & 4242`) while the second one returns either `1` or `0`. – 0xF1 Apr 14 '14 at 08:53
  • 3
    No, those functions definitely don't do the same thing. All you've proven with your test is that they implement the same mapping. – Michael Foukarakis Apr 14 '14 at 08:56
  • 20
    @MadHatter: How can `bool fast(int x)` return any `int` at all? Both versions return `true` if and only if `x` contains at least one of the bits in `4242`. – MSalters Apr 14 '14 at 08:58
  • 1
    @MadHatter Both return `true` or `false`. The compiler only has to follow the as-if rule to ensure that the right result is returned for a given input. Are you saying you can get different results with the same input? – juanchopanza Apr 14 '14 at 08:58
  • 5
    @MSalters : In C, `bool` data type is a type of `int` only. – 0xF1 Apr 14 '14 at 08:59
  • 2
    I suspect there's actually quite a lot of special-case reasoning to formalize for making that leap, and it's such an odd special case that optimizing it in the compiler won't be worth the effort. – molbdnilo Apr 14 '14 at 08:59
  • 4
    Why should the GCC maintainers bother to implement an optimization for code that's very much sub-optimal in the first place? It's not the optimizer's job to relieve you of thinking for yourself... – DevSolar Apr 14 '14 at 09:00
  • 14
    @DevSolar: you could say the same thing of dead code elimination, but compilers still do it. There are various means by which people write or auto-generate sub-optimal code, and it's *useful* when the compiler improves it. – Steve Jessop Apr 14 '14 at 09:01
  • 3
    @MadHatter In C, it's `#define bool _Bool` – Angew is no longer proud of SO Apr 14 '14 at 09:01
  • 3
    @DevSolar If someone from GCC said that, then it would constitute a valid answer. – juanchopanza Apr 14 '14 at 09:01
  • 3
    @SteveJessop: ...but I don't complain if they don't either, because I could have optimized that just as well myself. I'd worry about other, more real-life optimizations where I *don't* have the option of optimizing it myself (without going to great lengths). – DevSolar Apr 14 '14 at 09:02
  • 5
    @juanchopanza: And you know I'm not "from GCC"? I actually am not, but you are aware that judging the validity of an answer by the merits of the person giving the answer is a fallacy? Either my comment is a valid one, or it is not. It should not matter who I am. – DevSolar Apr 14 '14 at 09:05
  • 11
    At least in C++, such "sub-optimal" code may be the result of a particular template instantiation. C++ relies much more on compilers optimizing. – MSalters Apr 14 '14 at 09:06
  • 3
    @all : I am sorry, I got confused, in C, `bool` (a.k.a `_Bool`) variable can be assigned any integer value, but when it is read back it returns (or read as) only either `1` or `0`. – 0xF1 Apr 14 '14 at 09:06
  • 15
    @DevSolar: it's not a fallacy in this case. The question is about the motivations of the authors of GCC and the decisions they made. If you are an author of GCC responsible for this aspect of optimizations, then your statements about the role of the optimizer are more relevant than those of an author of MSVC saying the same thing. Similarly if you could cite GCC authors agreeing with your opinion on compilers, that would be more of an answer than just stating your opinion on compilers would be. Ofc you aren't claiming it's an answer, it's a comment :-) – Steve Jessop Apr 14 '14 at 09:07
  • 13
    @DevSolar Ah, the "all points of view have the same weight" fallacy, I like that one :-) – juanchopanza Apr 14 '14 at 09:11
  • 2
    Wouldn't slow work faster when input is 0? – auselen Apr 14 '14 at 09:25
  • 3
    John, I do not see any performance testing. Can you run it a few million times with `x=0` and `x=4242` and `x=1`? – Yakk - Adam Nevraumont Apr 14 '14 at 09:34
  • 2
    FWIW: MSVC 2013 also doesn't perform the optimization (at least with `/Ox` optimization). – Michael Burr Apr 14 '14 at 09:39
  • 2
    Clang and ICC didn't perform this optimization either in my test. – harold Apr 14 '14 at 10:15
  • 9
    How do we know this is actually an optimization? The latter is more instructions, but that does not necessarily make it slower. I would be interested to see A: which is faster with random inputs, and B: which is faster when nearly all the inputs are 0, as I am not convinced that the latter will be slower in all cases. – Vality Apr 14 '14 at 10:18
  • 3
    @Vality: I have edited my question to add a test harness and the results I see. Short answer is if inputs are all zero it's the same either way, otherwise fast() is faster. – John Zwinck Apr 14 '14 at 11:14
  • 2
    @Yakk: I added a performance test. The results are fairly unsurprising to me, but thank you for demanding empirical results. :) – John Zwinck Apr 14 '14 at 11:16
  • @JohnZwinck Thank you, that definitely deserves a +1. Really interesting to see the results. – Vality Apr 14 '14 at 13:03
  • `int fun(int x){return x?x&4242:0;}` will most likely be optimized by gcc-4.10 (patch under review). However, the conversion to bool makes things much harder. – Marc Glisse Apr 14 '14 at 17:36
  • Does gcc figure out what's faster if you do profiling-guided optimization now? –  Apr 14 '14 at 17:40
  • 3
    Clang 3.6 and earlier do not perform this optimization (unless you change the logical `&&` to a bitwise `&`), which would have been the current version back when this question was asked. However, **Clang 3.7 and later *do* perform this optimization**, no matter how you write the code (including for all the variants that Nemo suggests in his answer). Meanwhile, there has been no progress in GCC (**GCC 6.2 still behaves exactly as Nemo documents**), and **no version of MSVC (including the latest VS 2015) optimizes this, either**, producing even worse code than GCC. – Cody Gray - on strike Aug 31 '16 at 12:23
  • @CodyGray: Thanks very much for that update. – John Zwinck Sep 01 '16 at 02:20

8 Answers

51

Exactly why should it be able to optimize the code? You're assuming that any transformation that works will be done. That's not at all how optimizers work. They're not Artificial Intelligences. They simply work by parametrically replacing known patterns. E.g. the "Common Subexpression Elimination" scans an expression for common subexpressions, and moves them forwards, if that does not change side effects.

(BTW, CSE shows that optimizers are already quite aware of what code movement is allowed in the possible presence of side effects. They know that you have to be careful with &&. Whether expr && expr can be CSE-optimized or not depends on the side effects of expr.)
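That point can be made concrete. In the sketch below (`noisy` is a hypothetical function), folding the two calls into one would change observable behavior, so CSE across `&&` is only legal for side-effect-free operands:

```cpp
#include <cassert>

int calls = 0;
bool noisy() { ++calls; return true; }  // observable side effect

// Both calls must execute: the expression is not a candidate for CSE.
bool twice() { return noisy() && noisy(); }
```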

So, in summary: which pattern do you think applies here?

Bridge
MSalters
  • 9
    We know that GCC has many ways of establishing equivalent arithmetic expressions and relations between expressions, which it uses at the point of emitting code if not before. One might naively assume the pattern: "given side-effect-free `A && B`, if `(bool)B` is false whenever `(bool)A` is false, transform to `B`". But of course that has performance implications when `A` is faster to evaluate than `B`. Those implications might even be the answer to the question, I just don't know. – Steve Jessop Apr 14 '14 at 09:14
  • 7
    @SteveJessop: The particular form `A&&B` where `B` implies `A` is not exactly rare; it's a common (human) optimization to first calculate a fast `A` expression before calculating the expensive `B`. E.g. check `!string::empty()` before creating a `regex` even if that regex would do the right thing on an empty input. So as an optimizer writer I'd leave those `A && B` alone. That might very well be the answer indeed. – MSalters Apr 14 '14 at 09:47
  • 4
    Yep. It may not be high priority but I think there's still a question whether, for arithmetic expressions, the compiler should make its own assessment of the performance of `A` and `B`, ignoring what some dumb-ass sack of giblets thinks on the subject. Which is kind of what I want from a compiler ;-) As you pointed out, templates produce code where the case for a specific type is "obviously" written wrongly, but I don't want to have to specialize for performance. – Steve Jessop Apr 14 '14 at 10:03
  • Are you familiar with Karnaugh Maps? It's sort of like that...or maybe a more widely-understood system is like a proof in mathematics with a dichotomy. Anyway, we have two expressions with a logical AND: `x` and `x & 4242`. Call them A and B. If A is true, the result is B. If A is false, the result is false. If A is false, B must also be false, because it is a more specific test ("these bits" vs. "any bits" set). Since A being false means B is also false, and when A is false we must return false, we can return B when A is false, and also when it is true, hence A is "don't care." – John Zwinck Apr 14 '14 at 11:01
  • 1
    @JohnZwinck: That's why I wrote ["implies"](http://stackoverflow.com/questions/1823168/boolean-implication), in particular "B implies A". – MSalters Apr 14 '14 at 12:20
  • Indeed. But I agree with @SteveJessop that the human may have got it wrong (as they did in this case, for reasons lost to time). It strikes me the same as Steve that the compiler could look at what's being produced and make a smarter decision. For example, GCC already decides whether or not to inline something partly based on whether it would increase the code size or not. In this case the Sufficiently Smart Compiler could compare the naive code vs. the simplified version and easily understand that the simplified one is better (roughly half the registers, instructions, and branches). – John Zwinck Apr 14 '14 at 12:27
  • @JohnZwinck: You can create two truth tables for all 4 billion inputs, but that's brute force and not realistic. But there are no algorithms to establish whether any two random expressions have identical results. Current optimizers just have a bunch of special cases. – MSalters Apr 14 '14 at 12:32
  • 1
    Such tools do exist - they're called superoptimisers. In practice they're pretty much unusable for large chunks of code, where "large" means more than about ten instructions. (Maybe with modern GPUs it's worth investigating them again? IDK.) – Alex Celeste Apr 14 '14 at 12:44
  • 1
    @Leushenko: I read the GGC paper, it mentions 4 instructions. Since the search is exhaustive, the time needed is exponential. A GPU won't work (different ISA) but a Xeon Phi might work. Still, that buys you just O(1) instructions - exponential growth is annoyingly fast. – MSalters Apr 14 '14 at 12:51
  • 2
    Or don't make a truth table. An SMT solver could trivially solve this problem. Not all problems, obviously, but it could solve this one. – harold Apr 14 '14 at 12:56
33

You are correct that this appears to be a deficiency, and possibly an outright bug, in the optimizer.

Consider:

bool slow(int x)
{
  return x && (x & 4242);
}

bool slow2(int x)
{
  return (x & 4242) && x;
}

Assembly emitted by GCC 4.8.1 (-O3):

slow:
    xorl    %eax, %eax
    testl   %edi, %edi
    je      .L2
    andl    $4242, %edi
    setne   %al
.L2:
    rep ret

slow2:
    andl    $4242, %edi
    setne   %al
    ret

In other words, slow2 is misnamed.

I have only contributed the occasional patch to GCC, so whether my point of view carries any weight is debatable :-). But it is certainly strange, in my view, for GCC to optimize one of these and not the other. I suggest filing a bug report.

[Update]

Surprisingly small changes appear to make a big difference. For example:

bool slow3(int x)
{
  int y = x & 4242;
  return y && x;
}

...generates "slow" code again. I have no hypothesis for this behavior.

You can experiment with all of these on multiple compilers here.

Michael Kohne
Nemo
  • 5
    Logical AND is short-circuited, right? That may explain why putting it on the left-hand side does that. – 2rs2ts Apr 14 '14 at 14:33
  • 1
    Not entirely strange, but it helps understand why things fail. `(bool)(x & 4242)` implies `(bool)x` but not vice versa. – MSalters Apr 14 '14 at 14:33
  • @2rs2ts: There's a deleted answer which stated the same. Point is, the optimizer knows that there's no point in short-circuiting because there are no observable side effects on either side. – MSalters Apr 14 '14 at 14:36
  • @MSalters I was going to say that the optimizer ought to know that there are no side-effects, but I didn't know if it'd infer anything from that. Thanks for pointing that out, very cool. – 2rs2ts Apr 14 '14 at 14:39
  • 2
    @2rs2ts: The optimizer absolutely has to know, for instance to make CSE possible. That's not allowed if that CSE has side effects (which should happen each time). – MSalters Apr 14 '14 at 14:44
  • @MSalters: But `(bool)x` false implies `(bool)(x & 4242)` also false. This is no more logically complex than your statement, in my view. My guess is that GCC is reluctant to perform this sort of optimization when the right-hand argument to `&&` is "complex". Short-circuiting can be relevant to performance even when the expressions are side-effect free. I would still file the bug report. – Nemo Apr 14 '14 at 15:10
  • @Nemo: Well-known identity. (A ⇒ B) ⇔ (¬B ⇒ ¬A) – MSalters Apr 14 '14 at 15:35
  • @MSalters: Yes, also known as _modus tollens_ in propositional logic. My point is that I doubt this has anything to do with whether GCC optimizes this code; GCC is generally capable of far more sophisticated inferences than this. I do not actually know what is going on here (see e.g. my last update)... It is certainly an interesting question worth asking the maintainers. – Nemo Apr 14 '14 at 16:11
  • 2
    BTW, clang optimizes all of these, but even current GCC 8 years later doesn't: https://gcc.godbolt.org/z/7nbxfaE1x . Agreed that `slow3` is surprising. – Peter Cordes Mar 16 '22 at 15:16
13

This is how your code looks on ARM, where slow should run faster when the input is 0.

fast(int):
    movw    r3, #4242
    and r3, r0, r3
    adds    r0, r3, #0
    movne   r0, #1
    bx  lr
slow(int):
    cmp r0, #0
    bxeq    lr
    movw    r3, #4242
    and r3, r0, r3
    adds    r0, r3, #0
    movne   r0, #1
    bx  lr

However, GCC optimizes very nicely once you actually start using such trivial functions.

bool foo() {
    return fast(4242) && slow(42);
}

becomes

foo():
    mov r0, #1
    bx  lr

My point is that such code sometimes requires more context to be optimized further, so why should the implementers of optimizers (improvers!) bother?
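The constant folding behind foo() can also be demonstrated at compile time. A sketch, with the caveat that declaring the functions constexpr is an assumption added for illustration (the originals are not constexpr):

```cpp
// constexpr is an assumption added for illustration; the original
// functions in the question are not declared constexpr.
constexpr bool fast(int x) { return x & 4242; }
constexpr bool slow(int x) { return x && (x & 4242); }

// With constant arguments the whole of foo() folds to a constant,
// which is exactly what the ARM output above shows.
static_assert(fast(4242) && slow(42), "foo() folds to true");
```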

Another example:

bool bar(int c) {
  if (fast(c))
    return slow(c);
  // note: no return on the false path (undefined behavior),
  // which the optimizer is free to exploit
}

becomes

bar(int):
    movw    r3, #4242
    and r3, r0, r3
    cmp r3, #0
    movne   r0, #1
    bxne    lr
    bx  lr
Michael Kohne
auselen
  • 10
    Well, duh - if you pass in constants, GCC can calculate the result directly. It _has_ to have this capability, for `constexpr`. – MSalters Apr 14 '14 at 09:39
  • @MSalters that was actually my point, in that case constants provides a context. added one more example, dead code elimination? – auselen Apr 14 '14 at 09:51
  • 3
    The problem was that the two snippets are identical for 4 billion possible inputs, not just one. It's reasonable for the compiler to test the one set of arguments you explicitly provided, but not to test all 4 billion possible arguments. – MSalters Apr 14 '14 at 12:23
  • @James_pic: One problem with the notion of leaving optimization up to compilers is that compilers have no way of knowing whether `x==0` is going to be true 99% of the time, 0.00001% of the time, or somewhere in between. If it happens to be true 90% of the time, an optimization that saves one cycle on that 90% case and wastes four on the 10% case would save half a cycle on the average case. – supercat Apr 14 '14 at 18:06
  • Indeed, I'd say it's a reasonable assumption that if you write `x &&` in front, you have a reason to do so, and the reason is most likely that `x==0` is the most common case. – celtschk Apr 14 '14 at 22:59
8

To perform this optimization, one needs to study the expression for two distinct cases: x == 0, simplifying to false, and x != 0, simplifying to x & 4242, and then be smart enough to see that the second expression also yields the correct value even for x == 0.

Let us imagine that the compiler performs a case study and finds simplifications.

If x != 0, the expression simplifies to x & 4242.

If x == 0, the expression simplifies to false.

After simplification, we obtain two completely unrelated expressions. To reconcile them, the compiler should ask unnatural questions:

If x != 0, can false be used instead of x & 4242 anyway ? [No]

If x == 0, can x & 4242 be used instead of false anyway ? [Yes]
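That case study can be written out directly; a sketch, where `case_split` and `collapsed` are illustrative names:

```cpp
#include <cassert>

bool original(int x)   { return x && (x & 4242); }
// Step 1: split on x == 0, giving a select between the two simplified arms.
bool case_split(int x) { return x ? (x & 4242) != 0 : false; }
// Step 2: for x == 0 the first arm also yields false, so the select
// collapses to the first arm alone.
bool collapsed(int x)  { return x & 4242; }
```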

  • The "range" 0 is often checked for specifically, because of its atypical behavior in many operations. Quite a lot of binary operations can be simplified if either of the arguments is zero, both arithmetic and logical/boolean. – MSalters Apr 14 '14 at 12:27
  • @MSalters: yes, simplifying an expression in special/frequent cases is doable. It is not just that. It is simplifying the expression and checking that it matches another expression when the specific value is used. Otherwise, the transformed code could be an inefficient `x ? x & 4242 : false;` –  Apr 14 '14 at 13:07
  • In this case, checking the zero case specifically is sufficient. You can then optimize the "zero" case to `false && (0 & 4242)` and the "non-zero" case to `true && x (x & 4242)`. As `&&` is one of those boolean binary operators for which `0` is a special case, you indeed get the desired optimized forms `false` and `x & 4242`, which can then be merged back to just `x & 4242`. That's three steps, none of which is exceptional in an optimizer. – MSalters Apr 14 '14 at 13:19
  • 1
    @MSalters: I don't agree with that. It is easy to see that for `x == 0` the expression simplifies to `false`, and for `x != 0` it simplifies to `x & 4242`. Hence the rewrite `x ? x & 4242 : false`. Now the unnatural step is to try and get rid of the `?` operator by looking for properties of the subexpressions outside of the domains for which they were established, and discover that by chance `x & 4242` fits everywhere [in fact, establishing that `x ? x & 4242 : false` is equivalent to `x ? x & 4242 : x & 4242`]. –  Apr 14 '14 at 13:35
  • 1
    I'm not proposing a particularly difficult rewrite. Substitute left in right and right in left, that's all. Obviously `0 & 4242` is a valid substitute for `false`. Finding a third expression that's the union of two unrelated expressions would be hard, though. – MSalters Apr 14 '14 at 13:48
  • Indeed, merging `false` with `x & 4242` is anything but obvious. –  Apr 14 '14 at 13:51
  • 2
    I don't think it's much of a leap for the optimizer to investigate `x == 0` as a special case *when `x` is the operand of `&&`*. It's not an unrealistic brute force to look at both legs of a binary choice! The only question for the optimizer to ask is, "does `(bool)(x & 4242)` imply `(bool)x`?". It's easy to see that it does (at any rate, no harder to see than plenty of pinhole optimizations that GCC does make with arithmetic expressions), so the optimizer could see that the branch is logically redundant if it thought the issue worth investigation. – Steve Jessop Apr 14 '14 at 14:01
  • I disagree with the "unnatural questions" idea of resolving `x ? false : (x & 4242)`. It **already** has to try converting the second argument to the type of the third and vice versa to determine the result **type**. Once you're already doing that, it's a relatively small step to look beyond the type and at the value. – MSalters Apr 14 '14 at 14:42
  • It is unnatural because `x & 4242` is established under the assumption that `x != 0`. And you need to look at it under the opposite assumption that `x == 0`, a priori irrelevant. –  Apr 14 '14 at 15:21
  • @YvesDaoust: The whole point of looking under the opposite assumption is exactly because you want to fold the two expressions together. Look, optimization at this level is **full** of unnatural tricks. That is why we usually leave it to the compiler. This particular folding is nowhere special. – MSalters Apr 15 '14 at 00:07
  • @SteveJessop: Not sure if you saw my answer to this question, but simply reversing the order of operands to `&&` allows GCC to optimize the code properly. So not only is this optimization perfectly obvious, but GCC can already perform it. Why it does not in this case is indeed a mystery and a reasonable question/bug for the GCC maintainers. – Nemo Apr 17 '14 at 03:41
  • @Nemo: I did see that, but I'm not sure I consider it "the same optimization" with the operands reversed. In the questioner's code there's still at least the possibility the "slow" code is faster when `x` is almost always `0` (see my comments to MSalter's answer). If the optimizer leaves it alone for that reason then I don't think it's clearly a bug even if it's not my preference. If the optimizer leaves it alone for some stupid reason then it would be a bug :-) I agree it's a reasonable question / feature request. – Steve Jessop Apr 17 '14 at 07:08
  • ... for a more obvious example suppose you wrote `x && sin(x)`. That could be simplified to `(bool)sin(x)`, but I'm not sure you'd want it to be because the `x &&` might be an intentional attempted optimization by the programmer believing that `sin` is quite slow. However `sin(x) && x` *cannot* be an attempted optimization, it wouldn't make sense to expect the short-circuit to improve any case. – Steve Jessop Apr 17 '14 at 07:15
  • @SteveJessop: Well, if `x` is an integer, then sin(x)=0 implies x=0, so your expression can arguably be simplified to `(bool)x` :-). But your point is well taken, and I made the same point myself in a comment on my own answer. However... (a) Most answers, including this one, argue that this optimization is too complicated *logically* for GCC, which is clearly untrue; and (b) this argument does not explain the difference in output between `slow2` and `slow3` in my (updated) answer. This last looks indisputably like a bug. – Nemo Apr 17 '14 at 16:56
7

The last compiler I worked on did not do these sorts of optimizations. Writing an optimizer to take advantage of optimizations that combine bitwise and logical operators will not speed up applications. The main reason is that people do not use bitwise operators like that very often. Many people don't feel comfortable with bitwise operators, and those that do will typically not write useless operations that need to be optimized.

If I go to the trouble of writing

return (x & 4242);

and I understand what that means, why would I bother with the extra step? For the same reason, I would not write this suboptimal code:

if (x==0) return false;
if (x==1) return false;
if (x==2) return true;
if (x==4242) return true;
return (x & 4242);

There is just better use of compiler dev's time than to optimize stuff that makes no difference. There are just so many bigger fish to fry in compiler optimization.

Marc
  • What do you think of the trend of focusing on optimizations that will break code that would have worked on just about any microcomputer compiler in the 1990s (e.g. `unsigned mul(unsigned short x, unsigned short y) { return x*y; }`, or just about anything having to do with aliasing) while neglecting to provide safe forms of optimization? – supercat May 19 '16 at 16:42
6

It is mildly interesting to note that this optimisation is not valid on all machines. Specifically if you run on a machine which uses the one's complement representation of negative numbers then:

-0 & 4242 == true
-0 && ( -0 & 4242 ) == false

GCC has never supported such representations, but they are allowed for by the C standard.
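The divergence can be simulated on two's-complement hardware by treating the all-ones bit pattern as negative zero (a sketch; `NEG_ZERO_BITS` and `NEG_ZERO_VALUE` are illustrative names):

```cpp
#include <cassert>

// Simulation on two's-complement hardware: on a ones'-complement
// machine, -0 has the all-ones bit pattern but numeric value 0.
const unsigned NEG_ZERO_BITS  = ~0u;  // the bit pattern of -0
const int      NEG_ZERO_VALUE = 0;    // the numeric value of -0

bool bit_test()   { return (NEG_ZERO_BITS & 4242u) != 0; }               // "fast": true
bool value_test() { return NEG_ZERO_VALUE && (NEG_ZERO_BITS & 4242u); }  // "slow": false
```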

andypea
  • 7
    An interesting observation, but not an "important" one. This question is about the behavior of a particular compiler, so it is already platform-dependent. And every platform ever supported by GCC -- indeed, every platform whatsoever for the past 40+ years -- has used two's complement. – Nemo Apr 17 '14 at 03:36
  • 1
    You're right. However, it does highlight how seemingly trivial optimisations can have unexpected exceptions. Consideration of all these edge cases makes implementation of simple optimisations very time-consuming. – andypea Apr 22 '14 at 23:05
3

C places fewer restrictions on the behavior of signed integral types than on unsigned integral types. Negative values in particular can legally do strange things with bit operations. If any possible arguments to the bit operation have legally unconstrained behavior, the compiler can't remove them.

For example, "x/y==1 or true" might crash the program if you divide by zero, so the compiler can't ignore the evaluation of the division. Negative signed values and bit operations never actually do things like that on any common system, but I'm not sure the language definition rules it out.

You should try the code with unsigned ints and see if that helps. If it does you'll know it's an issue with the types and not the expression.
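The suggested experiment is a one-line change per function; a sketch (`fast_u` and `slow_u` are illustrative names):

```cpp
#include <cassert>

// The suggested experiment: same functions, unsigned parameter.
// The mapping is identical either way; the question is what asm
// the compiler emits for slow_u compared to the signed version.
bool fast_u(unsigned x) { return x & 4242u; }
bool slow_u(unsigned x) { return x && (x & 4242u); }
```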

Peter Mortensen
eggcrook
  • 4
    You got it exactly backwards. If the input values would lead to unspecified or undefined behavior, the compiler has full freedom of implementation. For instance, in `x/y==1 or true`, the compiler may assume three lines earlier (!!) that `y != 0`. That legal because the compiler may assume there is no Undefined Behavior whatsoever. As a result, UB can appear to travel backwards in time. – MSalters Apr 15 '14 at 00:11
-2

Not an answer, but a note on the topic, which could well be phrased: "should" the compiler optimize it?

Logical AND deals in bool: 0 means false and non-zero means true, and the operator that yields these is && (keyword and).

Bitwise AND operates on the individual bits, and the operator is & (keyword bitand).

&& essentially wraps each term in (x != 0) ? 1 : 0, i.e. "is it non-zero?".

& keeps the bits that are set in both operands. That works as expected for bool values, but for any other values you just get the bits that both have set.

You can play with equivalents here. (The confusion arises because any value != 0 also evaluates to true. Another question arises: shouldn't such values just be "undefined" and generate a warning, to avoid people mistaking these?)

So if you're dealing with just bool values, you can use bitwise AND for both evaluations.

bool fast(bool x)
{
  return x & 4242;
}

bool slow(bool x)
{
  return x & (x & 4242);
}

That gets optimized just fine. See here.

If each operand is a bool, or each & produces 0 or 1, then & is a drop-in replacement. But (y && (x & z)) and (y & (x & z)) are not equivalent once any value is greater than 1. For example: 1 && (2 & 2) is true, while 1 & (2 & 2) is false. They happen to agree again for 1 && (3 & 3), but it should be clear that these don't compare the same things: the former tests whether y is true and x & z is non-zero, while the latter keeps only the bits common to y and x & z. (See here)
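Those examples are easy to verify (a sketch):

```cpp
#include <cassert>

void demo() {
    assert((1 && (2 & 2)) == 1);              // logical: both sides non-zero
    assert((1 & (2 & 2)) == 0);               // bitwise: 1 and 2 share no bits
    assert((1 && (3 & 3)) == (1 & (3 & 3)));  // agree when the low bits line up
}
```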

See also: Is there any difference between && and & with bool(s)? and Boolean values as 8 bit in compilers. Are operations on them inefficient?

dagelf
  • 2
    The circumstance where you couldn't use `x & (x & 4242)` would be if the first condition was something other than `x`. e.g. `y & (x & 4242);` is *not* equivalent to `y && (x & 4242);`. (e.g. consider `y=1`, `x=2`. `1 && 2` is true, `1 & 2` is false). Presumably you'd never write `x && (x & 4242);` on one line in the first place, it might just happen after inlining when you pass the same arg twice to a function. – Peter Cordes Mar 16 '22 at 15:26
  • So your `slow` with only bitwise AND seems pointless / unlikely, or buggy if you intended the same semantic meaning for human readers as `&&`. (And like I said, less likely that it could be the result of inlining something with the same arg twice for code that supports `y && (x&mask)`.) – Peter Cordes Mar 16 '22 at 15:27
  • Goes without saying. But in the case of y && there's nothing to optimize out, so the question doesn't make sense :-) – dagelf Mar 16 '22 at 15:35
  • 1
    Right, of course there's nothing to optimize out, unless it's in `bool bar(int x, int y) { return y && (x & 4242); }` inlining into `bar(a,a);`. I guess the point I was trying to make is: if you're going to think through what your expression is equivalent to and manually optimize it, you'd *never* literally write `x & (x & 4242)`, you'd just write `x & 4242`. If you aren't going to think things through carefully, then **`&` is not a drop-in replacement for `&&` in the general case**, so it doesn't make sense to recommend this as an optimization or assume it should have been done by hand. – Peter Cordes Mar 16 '22 at 15:39
  • Actually, it depends on the return value. For a `bool` return value it is, and it makes more sense. y && ( x & z ) would have an optimization identical to y & ( x & z) because you're essentially just checking for 0, and the evaluation can stop at the first 0. On the other hand, if the return value is `int`, then you're right. – dagelf Mar 16 '22 at 16:06
  • `1 && (2&2)` is `true`, `1 & (2&2)` is `false`. If you're saying that change is true in general for a `bool` result, you're mistaken; checking for a non-zero intersection is not the same as checking that both have *some* non-zero bits somewhere. – Peter Cordes Mar 16 '22 at 16:09
  • Fair point.. yet, if all the inputs are `bool` then those don't occur. (Wait, is that what `&&` does? Casts the parameters into `bool`?) I guess the difference should be made clear between TRUE/FALSE `&&` and bitwise ANDs `&` having the same bits set. – dagelf Mar 17 '22 at 02:58
  • 1
    Yeah, that's the clear difference, and why this answer IMO doesn't make much sense and doesn't seem very relevant to the question. You'd write this for very different reasons than you'd write `&&`. If you're talking about the title like I think you are in the first part of your answer, I think it's intended as "why can't GCC optimize the pair of logical-AND / bitwise-AND operators". Or at least, that's the only reading that's compatible with the grammar and isn't nonsense, at least if we're generous about omitted punctuation like "logical / bitwise AND" or "logical and bitwise AND". – Peter Cordes Mar 17 '22 at 03:10
  • 1
    Re: conversion to `bool`: good question, I checked. Yes, that's literally what happens according to the standard: https://eel.is/c++draft/expr.log.and . And the final result is a `bool`. – Peter Cordes Mar 17 '22 at 03:12
  • Since I just took the time to think of title phrasing that would avoid the problem you pointed out, I went ahead and edited the title to fix that. You can remove that part of your answer, I think. (Or perhaps delete the whole thing, or at least rewrite as an observation about a related thing, without suggesting this is something it makes sense to have written when `&&` semantics is what you wanted.) – Peter Cordes Mar 17 '22 at 03:17
  • Re: using bitwise vs. logical AND operators on things that are already `bool`: see [Boolean values as 8 bit in compilers. Are operations on them inefficient?](https://stackoverflow.com/q/47243955) - there are some missed-optimizations in some cases, and sometimes it does indeed help to write `&` instead of `&&`. – Peter Cordes Mar 17 '22 at 03:20
  • 1
    @PeterCordes: See also https://stackoverflow.com/a/6577545 – Nemo Mar 17 '22 at 03:45
  • Aah! That's better. Back to my answer - perhaps a better original question would've been "should" the compiler optimize it. Initially it seemed to be based on an assumption that bitwise and logical are the same, (which in certain textbooks they are) and thinking that they are the same makes is seemingly obvious that it should, and for anyone new to the discussion it's important to note the difference, which nobody else explained. – dagelf Mar 17 '22 at 03:47
  • 1
    Yes, there's an important difference, but now that the title is fixed I don't think it really needs to be explained. The questions doesn't look to me like it's based on a mixup or misunderstanding of bitwise and logical, just a question about a missed optimization with logical operators. Maybe worth linking (e.g. in comments) one of the existing Q&As which explains the different in case other future readers are confused? But anyway, your answer now wrongly claims `x>0 ? 1 : 0` instead of `x != 0 ? 1 : 0`. C++ has signed types, and any non-zero, including `-1` is true. – Peter Cordes Mar 17 '22 at 04:11
  • Well I sure needed explaining to. Fixed. – dagelf Mar 17 '22 at 04:41
  • 1
    I edited to fix a couple new/remaining mistakes. But your code-block examples don't match your Godbolt link: `x & 4242` is pointless for `bool`; they return a constant 0. https://godbolt.org/z/Kzx7obf3e - because `(int)bool` is either 0 or 1, and bitwise AND with a value that doesn't have its low bit set is always zero. – Peter Cordes Mar 17 '22 at 15:28