1

When sending a patch to a widely known open source project (known for its performance and simplicity), I received a review that was a bit surprising to me: 'using "bool" type from C99 is a bad idea'. They reasoned it very well, and I was shown a simple example program that showed that (unoptimized code) clearly had more instructions when using bool than when using an integer type.

So they basically use something like typedef unsigned int bool_t;, and make sure they only assign 1 to that type.

I wanted to get a convincing and definitive answer to this, and also know what kind of performance difference are we talking about (i.e., is it worth it?), and see if compiler could do better with optimizations enabled.

There's a C++ question that is very related to this one, but (apart from being C++) that one restricts itself to the selection statement, while in this one I'm concerned about both aspects of bool: assignment and selection. That related question is Which is faster : if (bool) or if(int)?

So, what is faster, bool or an integer type? And how important is the performance difference?

  • 3
    `bool` (which in C is an alias for `_Bool`) *is* an integer type. But it does have semantics associated with it that other integer types do not have. – John Bollinger Dec 15 '21 at 17:06
  • 2
    "How important is the performance difference?" - it depends on what the code is doing. Does it happen once over the lifetime of the program, or thousands of times in a tight loop? If the former, it's not worth worrying about. If the latter, it can make a difference, but is the difference worth it? Correctness, clarity, and maintainability matter more than raw speed. Having said that, if they already have a convention for dealing with Boolean values with non-`bool` types, then use their convention. – John Bode Dec 15 '21 at 17:52
  • 2
    Also related: [Boolean values as 8 bit in compilers. Are operations on them inefficient?](https://stackoverflow.com/q/47243955) - there are some cases compilers aren't great at, but there's no general rule. – Peter Cordes Dec 16 '21 at 03:03

1 Answers1

4

EDITED 2021-12-16 19:07: Show comparison against both uint and uchar, and also show both GCC and Clang. Add -march=native to compiler flags. Now the results seem to show that bool is as good as other integer types, but some compilers produce sub-optimal code.

EDITED 2022-01-11 18:56: After some tests, slightly changing the code can show important performance problems, more likely to be present with _Bool than uint.

For my tests, I chose unsigned types, since that's what the project was using instead of bool, but I expect signed types to behave similarly.

I'll show here the tests with unsigned char, since bool is 1 byte in my system and that reduces the difference in assembly output, and also unsigned int to compare different widths.

I tested storing an integer into one of these types (bool, unsigned char, and unsigned int), using one of these types to control a selection statement, and using one of this types as a parameter of a function.


Source code:

// repeat.h:

#pragma once

#define repeat2(e)     (e);(e)
#define repeat4(e)     repeat2(e);repeat2(e)
#define repeat8(e)     repeat4(e);repeat4(e)
#define repeat16(e)    repeat8(e);repeat8(e)
#define repeat32(e)    repeat16(e);repeat16(e)
#define repeat64(e)    repeat32(e);repeat32(e)
#define repeat128(e)   repeat64(e);repeat64(e)
#define repeat256(e)   repeat128(e);repeat128(e)
#define repeat512(e)   repeat256(e);repeat256(e)
#define repeat1024(e)  repeat512(e);repeat512(e)

#define repeat(e)  do                           \
{                                   \
    repeat16(e);                            \
} while (0)

// store_bool.h:

#pragma once

_Bool store_bool(long n, int x);

// store_bool.c:

#include "store_bool.h"
#include "repeat.h"

_Bool store_bool(long n, volatile int x)
{
    volatile _Bool  b;

    for (long i = 0; i < n; i++)
        repeat(b = x);
    return b;
}

// store_uchar.h:

#pragma once

unsigned char store_uchar(long n, int x);

// store_uchar.c:

#include "store_uchar.h"
#include "repeat.h"

unsigned char store_uchar(long n, volatile int x)
{
    volatile unsigned char  c;

    for (long i = 0; i < n; i++)
        repeat(c = x);
    return c;
}

// store_uint.h:

#pragma once

unsigned int store_uint(long n, int x);

// store_uint.c:

#include "store_uint.h"
#include "repeat.h"

unsigned int store_uint(long n, volatile int x)
{
    volatile unsigned int  u;

    for (long i = 0; i < n; i++)
        repeat(u = x);
    return u;
}

// consume_bool.h:

#pragma once

int consume_bool(long n, _Bool b);

// consume_bool.c:

#include "consume_bool.h"
#include "repeat.h"

int consume_bool(long n, volatile _Bool b)
{
    volatile int  x = 5;

    for (long i = 0; i < n; i++)
        repeat({if (b) x = 3;});
    return x;
}

// consume_uchar.h:

#pragma once

int consume_uchar(long n, unsigned char u);

// consume_uchar.c:

#include "consume_uchar.h"
#include "repeat.h"

int consume_uchar(long n, volatile unsigned char c)
{
    volatile int  x = 5;

    for (long i = 0; i < n; i++)
        repeat({if (c) x = 3;});
    return x;
}

// consume_uint.h:

#pragma once

int consume_uint(long n, unsigned int u);

// consume_uint.c:

#include "consume_uint.h"
#include "repeat.h"

int consume_uint(long n, volatile unsigned int u)
{
    volatile int  x = 5;

    for (long i = 0; i < n; i++)
        repeat({if (u) x = 3;});
    return x;
}

// param_bool_.h:

#pragma once

int param_bool_(_Bool x);

// param_bool_.c:

#include "param_bool_.h"

int param_bool_(_Bool b)
{
    return b ? 3 : 5;
}

// param_bool.h:

#pragma once

void param_bool(long n, _Bool b);

// param_bool.c:

#include "param_bool.h"
#include "param_bool_.h"
#include "repeat.h"

void param_bool(long n, volatile _Bool b)
{
    for (long i = 0; i < n; i++)
        repeat(param_bool_(b));
}

// param_uchar_.h:

#pragma once

int param_uchar_(unsigned char c);

// param_uchar_.c:

#include "param_uchar_.h"

int param_uchar_(unsigned char c)
{
    return c ? 3 : 5;
}

// param_uchar.h:

#pragma once

void param_uchar(long n, unsigned char c);

// param_uchar.c:

#include "param_uchar.h"
#include "param_uchar_.h"
#include "repeat.h"

void param_uchar(long n, volatile unsigned char c)
{
    for (long i = 0; i < n; i++)
        repeat(param_bool_(c));
}

// param_uint_.h:

#pragma once

int param_uint_(unsigned int u);

// param_uint_.c:

#include "param_uint_.h"

int param_uint_(unsigned int u)
{
    return u ? 3 : 5;
}

// param_uint.h:

#pragma once

void param_uint(long n, unsigned int u);

// param_uint.c:

#include "param_uint.h"
#include "param_uint_.h"
#include "repeat.h"

void param_uint(long n, volatile unsigned int u)
{
    for (long i = 0; i < n; i++)
        repeat(param_bool_(u));
}

// main.c:

#include <stdio.h>
#include <time.h>

#include "store_bool.h"
#include "store_uchar.h"
#include "store_uint.h"
#include "consume_bool.h"
#include "consume_uchar.h"
#include "consume_uint.h"
#include "param_bool.h"
#include "param_uchar.h"
#include "param_uint.h"


#define measure(e)                          \
({                                  \
    clock_t  t0, t1;                        \
    double   t;                         \
                                    \
    t0 = clock();                           \
    e;                              \
    t1 = clock();                           \
                                    \
    t = (double) (t1 - t0) / CLOCKS_PER_SEC;            \
    t;                              \
})


int main(int argc, char *argv[])
{
    double  sb, sc, su;
    double  cb, cc, cu;
    double  pb, pc, pu;
    long    n;

    if (argc != 2)
        exit(2);
    n = atol(argv[1]);

    sb = measure(store_bool(n, 1));
    sc = measure(store_uchar(n, 1));
    su = measure(store_uint(n, 1));

    cb = measure(consume_bool(n, 1));
    cc = measure(consume_uchar(n, 1));
    cu = measure(consume_uint(n, 1));

    pb = measure(param_bool(n, 1));
    pc = measure(param_uchar(n, 1));
    pu = measure(param_uint(n, 1));

    printf("n: %li\n", n);
    putchar('\n');
    printf("store bool:    %lf\n", sb);
    printf("store uchar:   %lf\n", sc);
    printf("store uint:    %lf\n", su);
    putchar('\n');
    printf("consume bool:  %lf\n", cb);
    printf("consume uchar: %lf\n", cc);
    printf("consume uint:  %lf\n", cu);
    putchar('\n');
    printf("param bool:    %lf\n", pb);
    printf("param uchar:   %lf\n", pc);
    printf("param uint:    %lf\n", pu);
}

I used volatile for some variables, to avoid the compiler optimizing out the multiple assignments and tests.

Since the compiler will not unroll the loops, as they are huge, I used many (16) repeated expressions in each loop (see the repeat() macro), to reduce the impact of the loop overhead (jump instructions) in the total benchmark time.


Compiling:

$ cc -Wall -Wextra -O3 -march=native -S *.c
$ cc -O3 -march=native *.s
$

Assembly:

I'll cherry-pick a single one of the 16 repetitions, to simplify. If you want to see full assembly files, you can compile them yourself (I gave here enough instructions).

// store_bool.s (GCC):

    movl    -20(%rsp), %edx
    testl   %edx, %edx
    setne   %dl
    movb    %dl, -1(%rsp)

// store_bool.s (Clang):

    cmpl    $0, -4(%rsp)
    setne   -5(%rsp)

// sotre_uchar.s (GCC):

    movl    -20(%rsp), %edx
    movb    %dl, -1(%rsp)

// store_uchar.s (Clang):

    movl    -4(%rsp), %ecx
    movb    %cl, -5(%rsp)

// store_uint.s (GCC):

    movl    -20(%rsp), %edx
    movl    %edx, -4(%rsp)

// store_uint.s (Clang):

    movl    -4(%rsp), %ecx
    movl    %ecx, -8(%rsp)

From the above, uchar and uint are likely to be the same. bool has two instructions too on Clang, but they are different; that may or may not make a difference. On GCC, it clearly has 2 extra instructions compared to uchar which makes it slower.

// consume_bool.s (GCC):

    movzbl  -20(%rsp), %edx
    testb   %dl, %dl
    je  .L2
    movl    $3, -4(%rsp)
.L2:

// consume_bool.s (Clang):

.LBB0_5:                                #   in Loop: Header=BB0_1 Depth=1
    testb   $1, -5(%rsp)
    jne .LBB0_6

    [...]

.LBB0_6:                                #   in Loop: Header=BB0_1 Depth=1
    movl    $3, -4(%rsp)
    testb   $1, -5(%rsp)
    je  .LBB0_9

(LBB0_9 is similar to LBB0_5)

// consume_uchar.s (GCC):

    movzbl  -20(%rsp), %edx
    testb   %dl, %dl
    je  .L2
    movl    $3, -4(%rsp)
.L2:

// consume_uchar.s (Clang):

    cmpb    $0, -5(%rsp)
    je  .LBB0_3
# %bb.2:                                #   in Loop: Header=BB0_1 Depth=1
    movl    $3, -4(%rsp)
.LBB0_3:                                #   in Loop: Header=BB0_1 Depth=1

// consume_uint.s (GCC):

    movl    -20(%rsp), %edx
    testl   %edx, %edx
    je  .L2
    movl    $3, -4(%rsp)
.L2:

// consume_uint.s (Clang):

    cmpl    $0, -4(%rsp)
    je  .LBB0_3
# %bb.2:                                #   in Loop: Header=BB0_1 Depth=1
    movl    $3, -8(%rsp)
.LBB0_3:                                #   in Loop: Header=BB0_1 Depth=1

In these cases, the assembly produced by GCC is almost identical for the 3 types, so I don't expect any difference. In Clang, bool has different code, but since it's very different, it's hard to predict if it will be faster or slower than the integers.

// param_bool_.s (GCC):

param_bool_:
.LFB0:
    .cfi_startproc
    cmpb    $1, %dil
    sbbl    %eax, %eax
    andl    $2, %eax
    addl    $3, %eax
    ret
    .cfi_endproc
.LFE0:

// param_bool_.s (Clang):

param_bool_:                            # @param_bool_
    .cfi_startproc
# %bb.0:
    xorb    $1, %dil
    movzbl  %dil, %eax
    addl    %eax, %eax
    addl    $3, %eax
    retq
.Lfunc_end0:

// param_bool.s (GCC):

    movzbl  12(%rsp), %edi
    call    param_bool_@PLT

// param_bool.s (Clang):

    movzbl  15(%rsp), %edi
    andl    $1, %edi
    callq   param_bool_

// param_uchar_.s (GCC):

param_uchar_:
.LFB0:
    .cfi_startproc
    cmpb    $1, %dil
    sbbl    %eax, %eax
    andl    $2, %eax
    addl    $3, %eax
    ret
    .cfi_endproc
.LFE0:

// param_uchar_.s (Clang):

param_uchar_:                           # @param_uchar_
    .cfi_startproc
# %bb.0:
    xorl    %eax, %eax
    testl   %edi, %edi
    sete    %al
    addl    %eax, %eax
    addl    $3, %eax
    retq
.Lfunc_end0:

// param_uchar.s (GCC):

    movzbl  12(%rsp), %edi
    call    param_uchar_@PLT

// param_uchar.s (Clang):

    movzbl  15(%rsp), %edi
    callq   param_uchar_

// param_uint_.s (GCC):

param_uint_:
.LFB0:
    .cfi_startproc
    cmpl    $1, %edi
    sbbl    %eax, %eax
    andl    $2, %eax
    addl    $3, %eax
    ret
    .cfi_endproc
.LFE0:

// param_uint_.s (Clang):

param_uint_:                            # @param_uint_
    .cfi_startproc
# %bb.0:
    xorl    %eax, %eax
    testl   %edi, %edi
    sete    %al
    addl    %eax, %eax
    addl    $3, %eax
    retq
.Lfunc_end0:

// param_uint.s (GCC):

    movl    12(%rsp), %edi
    call    param_uint_@PLT

// param_uint.s (Clang):

    movl    12(%rsp), %edi
    callq   param_uint_

In this case, bool should be the same as uchar since the only important thing should be the width, and we might see (or not) a difference with uint. A part from zero extending, there's not much difference. There are slight differences between GCC and Clang, however, Clang producing larger code, so I expect Clang to run slightly slower than GCC.


Timing:

// amd64, gcc-11, i5-5675C:

$ ./a.out 1073741824
store bool:    4.928789
store uchar:   4.795028
store uint:    4.803893

consume bool:  4.795776
consume uchar: 4.794873
consume uint:  4.794079

param bool:    17.713958
param uchar:   17.611229
param uint:    17.688909

// amd64, clang-13, i5-5675C:

$ ./a.out 1073741824
store bool:    4.806418
store uchar:   4.802943
store uint:    4.800172

consume bool:  4.805537
consume uchar: 4.799858
consume uint:  4.799462

param bool:    19.095543
param uchar:   17.708014
param uint:    17.782490

In 'store', as we expected, bool is slower than the other types with GCC (around 1~10%). With Clang, there's no significant difference (I've seen bool consistently be a bit slower than the others, but less than 0.5%).

In 'consume', we see no difference between types or compilers.

In 'param', times vary a lot between runs, and there's no consistency: sometimes bool is slower, and sometimes it's faster. However, GCC is consistently faster than Clang.


Slight changes in the code can lead to compilers missing important optimizations. Using the following code in consume_<type>.c, leads to some important performance loss:

        repeat(x = b ? 3 : x);

Note that just by changing an if to a ternary operator, makes the compiler slow down to the following times:

GCC:

$ ./a.out 1073741824
n: 1073741824

...

consume bool:  8.684662
consume uchar: 8.683915
consume uint:  8.086806

...

Clang:

$ ./a.out 1073741824
n: 1073741824

...

consume bool:  8.161896
consume uchar: 5.422896
consume uint:  5.127165

...

Clang slows down considerably for _Bool, while maintaining a reasonable speed for other types. GCC seems to generate quite bad code for all the types.


Conclusion:

Programmers should consider a few things:

Performance: Even though _Bool may be theoretically as fast as unsigned int, compilers are far from being ideal, and it is likely that your compiler will miss some optimizations, which in some cases may be quite important.

Maintainability/readability/correctness: Some may argue that _Bool is safer due to autonormalization; others may argue that it is less safe due to autonormalization; just know what you're using, and form your own opinion.

Supporting pre-C99 code: If that's the case, you have no choice but using unsigned int.

  • 5
    I wonder how much value there is in the measurements using `volatile`. The real code that is _not_ using `volatile` is likely going to look very different. – Ted Lyngmo Dec 15 '21 at 16:12
  • 1
    I agree with Ted, this seems like something I suspect is more cargo cult than reality as the requirements for `_Bool` are pretty lenient and favor performance. The only real requirement is that from an abstract machine perspective it only holds 1 or 0. The compiler is allowed to do a lot of "AS-IF" with them. – Mgetz Dec 15 '21 at 16:14
  • All of this is a necessary consequence of how `bool` is defined. It can *only* take a value of `1` or `0`, so if you assign a non-`bool` thing to a `bool`, it has to normalize it (by testing it for non-zeroness, which is what the `testl` does, with the `setne` extracting the flag bit `testl` sets). When testing it though, there's no advantage to comparing to `1` or `0` explicitly (`cmp REG,CONSTANT` is slower than `test REG,REG`), so it just uses the same `test` instruction it used to normalize in the first place. – ShadowRanger Dec 15 '21 at 16:14
  • I forgot to post the full assembly files. Will do. In principle, volatile avoids optimizations due to known previous values, but allows other optimizations, doesn't it? – alx - recommends codidact Dec 15 '21 at 16:15
  • 4
    Your question says they use `typedef unsigned int bool_t;` and make sure to only assign `1` or `0` to them, but by definition this means they're manually writing the same code that `bool` was generating for them; using `bool_t b = somenonboolinteger != 0;` will end up producing that same `testl` + `setne` anyway. And using a `typedef` for `unsigned int` as in the question (vs. the `unsigned char` in your answer) means all your `bool`s likely occupy 4x the memory on most systems (32x the memory for `std::vector` vs. `std::vector`, but `std::vector` has perf issues). – ShadowRanger Dec 15 '21 at 16:16
  • 2
    You should not assign a non bool value to a bool anyway if you want clear code. You always end up assigning the result a comparison (like `step == 0` or `pass < 5`) which _do_ return a boolean already. So in practice there is no assignment overhead. – koder Dec 15 '21 at 16:19
  • @TedLyngmo: From the assembly diff posted, it looks like `volatile` isn't messing with stuff too much (just ensuring it actually runs instead of discarding unused assignments). I'd wondered that myself (MSVC will insert memory barriers for some uses of `volatile` IIRC, which would *swamp* any other performance differences), but it looks okay in this case. – ShadowRanger Dec 15 '21 at 16:19
  • @koder: Yeah, the unclear code just performs the `!= 0` comparison implicitly, where the clear code performs it explicitly. – ShadowRanger Dec 15 '21 at 16:20
  • Does `volatile` have any effect in a parameter list? – Gerhardh Dec 15 '21 at 16:25
  • @ShadowRanger re: 4x memory: They actually use different integer types in `struct`s for alignment reasons (think `bool8_t`, `bool16_t`, ...) so they don't have that problem. And I guess that's what causes them to have the problem of assigning an integer which they know to be 1 or 0, but the compiler doesn't. A solution would be to have different bool sizes in C, with would pass the information to the compiler. – alx - recommends codidact Dec 15 '21 at 16:29
  • 1
    Oh, just realized something else: Your `store_uint` code is *wrong*. You assign an `int` of varying value to an `unsigned char`. But the truncation there doesn't normalize it, so 1) It violates your group's rule about only assigning `1` or `0` to the `typedef`, and 2) If the `int` value happens to be a multiple of 256 (possibly 128 + a multiple of 256; might be undefined/implementation defined behavior), it'll be treated as false, even though the `int` is non-zero. Fixing that would change the assignment to `u = x != 0;`, and your assembly would end up the same as the `bool` case. – ShadowRanger Dec 15 '21 at 16:30
  • @ShadowRanger, yeah, just assume that int comes from another `bool32_t` for example, which we know to be 1 or 0, but the compiler doesn't. I omitted (forgot) some of those details here. I guess that `u = !!x` is exactly the same than using bool, as you say, since there's an explicit cast to bool. – alx - recommends codidact Dec 15 '21 at 16:32
  • 1
    @alx: Yeah, I'm not sure a mere 1-5% boost in a scenario *designed* to hammer `bool` auto-normalization is meaningful. Real code would typically not need to normalize much if they used a single `bool` type consistently (`bool` to `bool` assignment doesn't need to normalize after all, and anything that's not a `bool` needs to be normalized the vast majority of the time anyway (you say they know it's `1`/`0`, but I've seen such surety go wrong *many* times in real code; if they *knew* it was `1` or `0` it would be a `bool` already), and compiler or programmer normalization costs the same. – ShadowRanger Dec 15 '21 at 16:39
  • @ShadowRanger Yup, however, in their code base, they use a lot of different `struct`s, where they store boolean variables, and they use different boolean types for those to tune the performance of those structures. Since most of their booleans are in structures, most of their bool code needs differently-sized integer-type booleans. And then, when you make a temporary local boolean variable to store the value of that fake boolean, you need unsigned int to get the best performance. I guess in their case it can be the right thing to do. And yes, most programs shouldn't do this. – alx - recommends codidact Dec 15 '21 at 16:43
  • 4
    Even if some auto-normalizations are "unnecessary", the percentage of them in real world code would be *well* under 1% of all operations (where the benchmark makes them ~50% of all operations), so that 1-5% change in a microbenchmark would translate to *well* under 0.02-0.1% change in any real world code. Is that microoptimization really worth the risk of silently getting things wrong (but for only 1 in 256 values, or even less for `short` and larger based `bool_t`s, so it happens incredibly rarely, creating hard to repro bugs) in the cases where the normalization is omitted? – ShadowRanger Dec 15 '21 at 16:43
  • 1
    @alx: Temporary local booleans will, when not volatile and optimizations turned on, usually get expanded and drop normalization when the compiler can get away with it (and deems it likely to improve performance), so no savings there. In the `struct`s, there's still no need to tune performance; if you use 8-bit `bool`, the `struct` pads to align the next member anyway, so replacing it with `bool32_t` doesn't change memory usage or performance. – ShadowRanger Dec 15 '21 at 16:46
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/240178/discussion-between-shadowranger-and-alx). – ShadowRanger Dec 15 '21 at 16:46
  • @Gerhardh re: *"Does volatile have any effect in a parameter list?"*: Yes, it does. It is just a local variable. A very real scenario would be if the function called [`setjmp(3)`](https://man7.org/linux/man-pages/man3/setjmp.3.html); in that case, a jump might bring you to a point where the compiler expected the local to have a distinct value, and only `volatile` will prevent incorrect optimization. See and the manual page I linked. However, it doesn't have any effect in the prototype; the same as `const`. – alx - recommends codidact Dec 15 '21 at 20:57
  • @alx interesting. Thank you! – Gerhardh Dec 15 '21 at 22:12
  • @ShadowRanger: Agreed, forcing bools to be materialized in integer regs or memory is bad, instead of just as FLAGS conditions. Although there are *some* cases where compilers have missed optimizations with actual `bool`, where you could do better with integer types for operations between multiple `bool` values that *do* exist as integers. [Boolean values as 8 bit in compilers. Are operations on them inefficient?](https://stackoverflow.com/q/47243955) – Peter Cordes Dec 16 '21 at 03:02