
Question Context: [C++] I want to know what is theoretically the fastest, and what the compiler will do. I don't want to hear about premature optimization is the root of all evil, etc.

I was writing some code like this:

bool b0 = ...;
bool b1 = ...;

if (b0 && b1)
{
    ...
}

But then I was thinking: the code, as-is, will compile to two TEST instructions if compiled without optimizations, which means two branches. So I was thinking that it might be better to write:

if (b0 & b1)

which will produce only one TEST instruction if the compiler does no optimization. But I feel that this goes against my code style: I usually write && and ||.

Q: What will the compiler do if I turn on optimization flags (-O1, -O2, -O3, -Os and -Ofast)? Will the compiler automatically compile it as if I had written &, even though I used && in the code? And what is theoretically faster? Does the behavior change if I write this:

if (b0 && b1)
{ ... }
else if (b0) 
{ ... }
else if (b1)
{ ... }
else
{ ... }

Q: As I could have guessed, this is very dependent on the situation, but is it a common trick for a compiler to replace a && with a &?

Martijn Courteaux
  • This is incredibly dependent on the amount of code within the blocks, the architecture you're running on, and the compiler you're compiling with. For example, some architectures have conditional instructions which might be used if you have a small amount of code within the block. – Bill Lynch Sep 11 '14 at 15:16
  • @sharth: I could have guessed that, but is it a common trick for compilers to replace a `&&` with a `&`? – Martijn Courteaux Sep 11 '14 at 15:23
  • 3
    @MartijnCourteaux, if second boolean is heavy to calculate, that would be misoptimization. – Basilevs Sep 11 '14 at 15:25
  • @Basilevs: I know, but let's say both are equally cheap to calculate. `b0` and `b1` are for example method arguments, so there is no cost for that piece of code to calculate them. – Martijn Courteaux Sep 11 '14 at 15:29
  • 2
    "what the compiler will do" is extremely dependent on context. With optimisation enabled the compiler's need (or not) to prepare registers depends on what's gone before and the actual instructions emitted and the cycles each one will cost depends on the architecture. It must have taken you longer to type this question out than it would have taken to just execute `gcc -c -g -Wa,-ahl=test.s test.cpp` (or similar) and just eyeball the assembly listing. – Andy Brown Sep 11 '14 at 15:31
  • 2
    It is usually good practice to write what you want to express. Only deviate from that if your _measurements_ indicate that it is indeed a good idea to do so. – stefan Sep 11 '14 at 15:32
  • 1
    In your first example, `(b0 && b1)` will evaluate to two instructions on most processor: `ANDing b0 with b1`, and `branch if zero`. It is no more efficient than `(b0 & b1)`. Always look at the assembly language list, for the truth will be there. – Thomas Matthews Sep 11 '14 at 15:43
  • @Galik, in the expression `b0 & b1` the `bool` variables undergo integer promotion to `int` and it is required that `true` promotes to exactly `1`, even if the bit pattern of `true` is something else. – Jonathan Wakely Sep 11 '14 at 15:47
  • @Jonathan Okay, that surprises me. But I suppose that means that the compiler can't just substitute & for && without adding some conversion instructions too? (deleting my misleading comment) – Galik Sep 11 '14 at 15:58
  • 1
    @Galik, no conversion is needed if the compiler ensures that `true` is represented by the same bit pattern as `1`. If the compiler chooses to use `0xff` for `true` then yes, it would need to convert. Funnily enough most compilers don't choose to do that. – Jonathan Wakely Sep 11 '14 at 16:07

2 Answers


If you care about what's fastest, why do you care what the compiler will do without optimisation?

Q: As I could have guessed, this is very dependent on the situation, but is it a common trick for a compiler to replace a && with a &?

This question seems to assume that the compiler transforms C++ code into more C++ code. It doesn't. It transforms your code into machine instructions (including the assembler as part of the compiler for argument's sake). You should not assume there is a one-to-one mapping from a C++ operator like && or & to a particular instruction.

With optimisation the compiler will do whatever it thinks will be faster. If a single instruction would be faster, the compiler will generate a single instruction for if (b0 && b1); you don't need to bugger up your code with micro-optimisations to help it make such a simple transformation.

The compiler knows the instruction set it's using, it knows the context the condition is in and whether it can be removed entirely as dead code, or moved elsewhere to help the pipeline, or simplified by constant propagation, etc. etc.

And if you really care about what's fastest, why would you compute b1 before you know it's actually needed? If obtaining the value of b1 has no side effects the compiler could even transform your code to:

bool b0 = ...;
if (b0)
{
  bool b1 = ...;
  if (b1)
  {
    ...
  }
}
Does that mean two if conditions are faster than a &?! Of course not.

In other words, the whole premise of the question is flawed. Do not compromise the readability and simplicity of your code in the misguided pursuit of the "theoretically fastest" micro-optimisation. Spend your time improving the algorithms and data structures used, not trying to second-guess which instructions the compiler will generate.

Jonathan Wakely
  • *"Why would you compute b1 until you know it's actually needed?"* As I said in the comments: because b1 is given, for example, by the input of a file, or as method argument. No way it is possible to skip the evaluation of b1. I'm talking about that case. And btw: I don't have performance issues in my application, I'm just wondering. – Martijn Courteaux Sep 11 '14 at 22:33

Q: What will the compiler do if I turn on optimization flags (-O1, -O2, -O3, -Os and -Ofast)?

Most likely nothing more; there is little left to gain. As stated in my comments, you really can't reduce the evaluation any further than:

  AND B0 WITH B1 (sets condition flags)
  JUMP ZERO TO ...

Although, if you have a lot of simple boolean logic or data operations, some processors may conditionally execute them.

Will the compiler automatically compile it like &, even if I have used a && in the code?
And what is theoretically faster?

On most platforms, there is no difference in the evaluation of A & B versus A && B.
In the final evaluation, either a compare or an AND instruction is executed, followed by a jump based on the status flags. Two instructions.

Most processors don't have Boolean registers. It's all numbers and bits.

Optimize By Boolean Logic

Your best option is to review the design and set up your algorithms to use Boolean algebra. You can then simplify the Boolean expressions.

Another option is to implement the code so that the compiler can generate conditional assembly instructions, if the platform supports them.

Optimize: Reduce jumps

Processors favor arithmetic and data transfers over jumps.

Many processors are always feeding an instruction pipeline. When it comes to a conditional branch instruction, the processor has to wait (suspend the instruction prefetching) until the condition status is determined. Then it can determine where the next instruction will be fetched.

If you can't remove the jumps, such as in a loop, increase the ratio of data processing to jumping. Search for "loop unrolling"; many compilers perform it automatically at higher optimization levels.

Optimize: Data Cache

You may notice increased performance by organizing your data for best data cache usage.

For example, instead of 3 large arrays, use one array of a structure containing 3 elements. This allows the elements in use to be close to each other (and reduce the likelihood of accessing data outside of the cache).

Summary

The difference in evaluation of A && B versus A & B as conditional expressions is a micro-optimization. You will achieve improved performance by using Boolean algebra to reduce the quantity of conditional expressions. Jumps, or changes in execution path, slow down instruction execution. Fetching data outside of the data cache also slows down execution. You will most likely get better performance by redesigning your code to help the compiler reduce branches and use the data cache more effectively.

Thomas Matthews
  • @JonathanWakely: Why would this be a silly question? I am by the time of writing only 19 years old, and haven't gotten any lesson in school about programming. I'm trying to understand better how compilers and processors work. That is not bad, is it? I'm asking questions now and then to get insight in this stuff. The question "is it a well known optimization strategy for compilers to generate code that evaluates the boolean logic long circuit, to avoid the extra TEST statement?" doesn't look like such a bad question to me. I'm sorry for willing to learn about the details... – Martijn Courteaux Sep 11 '14 at 22:09
  • @ThomasMatthews: So you are suggesting that compilers will most likely produce long circuit code, even if coded short circuit? (As long as that is a valid optimization and has no side-effects, of course). – Martijn Courteaux Sep 11 '14 at 22:29
  • Where does this term "long circuit code" come from and what does it mean? As for the silliness of the question, you prefixed your question with "I don't want to hear about premature optimization is the root of all evil, etc." so you already rejected the most useful answers. – Jonathan Wakely Sep 12 '14 at 08:43
  • @JonathanWakely: Long circuit evaluation of the condition in the if statement only does one TEST instruction and one AND instruction. The short circuit version would do two TEST instructions and no AND, because that is how && is defined: short circuit. If the first operand turns out to be false, it shouldn't look at the second operand anymore. In order to achieve this behavior, you should have two TEST instructions. The long circuit version will evaluate all operands and join them together with AND instructions and finish off with only one TEST instruction. – Martijn Courteaux Sep 12 '14 at 13:37
  • Yes, I understand what "short circuiting" means, but "long circuiting" doesn't mean anything. In C++ a non-overloaded `&` _always_ means bitwise 'and', it is not a special boolean operator as in C#. Also, you're still making incorrect assumptions about how C++ operators map to instructions - there is **no** requirement that `&` maps to different instructions to `&&`, C++ is not just a high-level assembly language. – Jonathan Wakely Sep 12 '14 at 13:40
  • Oh, I thought that the name "long circuit" would be appropriate if "short circuit" is the other approach of evaluation one can use in programming. – Martijn Courteaux Sep 12 '14 at 13:45