
This code gives different results for -O1 and -O2:

/*
    Example of a clang optimization bug.
    Mark Adler, August 8, 2015.

    Using -O0 or -O1 takes a little while and gives the correct result:

        47 bits set (4294967296 loops)

    Using -O2 or -O3 optimizes out the loop, returning immediately with:

        0 bits set (4294967296 loops)

    Of course, there weren't really that many loops.  The number of loops was
    calculated, correctly, by the compiler when optimizing.  But it got the
    number of bits set wrong.

    This is with:

        Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
        Target: x86_64-apple-darwin14.4.0

 */

#include <stdio.h>
#include <inttypes.h>

/* bit vector of 1<<32 bits, initialized to all zeros */
static uint64_t vec[1 << 26] = {0};

int main(void)
{
    /* set 47 of the bits. */
    vec[31415927] = UINT64_C(0xb9fe2f2fedf7ebbd);

    /* count the set bits */
    uint64_t count = 0;
    uint64_t loops = 0;
    uint32_t x = 0;
    do {
        if (vec[x >> 6] & ((uint64_t)1 << (x & 0x3f)))
            count++;
        x++;
        loops++;
    } while (x);
    printf("%" PRIu64 " bits set (%" PRIu64 " loops)\n", count, loops);
    return 0;
}

So is this a bug? Or is there somehow an undefined behavior in there, that the compiler is within its rights to give different results for?

As far as I can tell from the C99 standard, the do loop to go through all uint32_t values is valid since the increment of the largest unsigned integer value is well defined to result in zero.

A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

Mark Adler
  • `0xb9fe2f2fedf7ebbd` is too large to be an `int`, you should add `ULL`. – mch Aug 08 '15 at 19:40
  • Can't reproduce either with gcc-5.1.1 or clang3.6.0 on my 32-bit system. – P.P Aug 08 '15 at 19:45
  • If you look at the assembly (clang 3.4.1, x86_64, -O3), the loop is truly gone, i.e. no code is generated for anything between `do {` and `} while (x)`. Goes away when you set `uint32_t x = 1`. Also: shouldn't OSX be LP64, i.e. shouldn't `%lu` be enough (it shouldn't hurt, though)? – dhke Aug 08 '15 at 19:49
  • @mch: The `ULL` suffix is not necessary. The type of a hexadecimal integer constant is the first of `int`, `unsigned int`, `long int`, `unsigned long int`, `long long int`, `unsigned long long int` in which its value can be represented. `0xb9fe2f2fedf7ebbd` is of type `unsigned long int` on a system with 64-bit `long`, or `unsigned long long int` on a system with 32-bit `long`. – Keith Thompson Aug 08 '15 at 19:50
  • @mch: http://port70.net/~nsz/c/c11/n1570.html#6.4.4.1p5 – too honest for this site Aug 08 '15 at 19:50
  • gcc gives me a warning on the `printf`: "warning: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 2 has type ‘uint64_t’" Apparently `uint64_t` is `unsigned long int` on my system. (The warning goes away if I compile with `-m32`.) This shouldn't cause any actual problems, but casting to `unsigned long long` would avoid the warning. – Keith Thompson Aug 08 '15 at 19:53
  • What difference do you see between the generated code? The only issue I see is the printf formats for `uint64_t`. Use: `printf("%" PRIu64 " %" PRIu64 "\n", count, loops);` (might not make any difference if `uint64_t` is the same as `unsigned long long`). – P.P Aug 08 '15 at 19:53
  • No, @mch is not on to something. @KeithThompson is correct. Just to verify, neither `ULL` nor `UINT64_C()` makes any difference. Note that all 47 set bits in the long hexadecimal constant are all counted correctly in the less/non-optimized case. – Mark Adler Aug 08 '15 at 20:03
  • No, `PRIu64` makes no difference either. Especially seeing as how it is defined as "llu". – Mark Adler Aug 08 '15 at 20:06
  • @KeithThompson: interestingly clang on my system gives a warning the other way around. If I use `%lu` I get a warning about the type, but if I use `%llu` I don't. Go figure. In any case, the formatting has nothing to do with the bug. The `PRIu64` suggestion (from `inttypes.h`) should avoid the warning portably. – Mark Adler Aug 08 '15 at 20:09
  • `uint64_t` is (probably) a typedef for `unsigned long` on 64-bit systems, and for `unsigned long long` on 32-bit systems. Presumably `unsigned long` is 32 bits on your system -- and `PRIu64` should be defined as `"lu"` on my system. That explains why we see differing behavior. Either using `PRIu64` or casting to `unsigned long long` (not both!) addresses the issue; I used the latter approach in my answer. (I tend to avoid the `PRI*` macros for no better reason than that I find them ugly and difficult to remember.) (Or `uint64_t` could be `unsigned long long` on a 64-bit system.) – Keith Thompson Aug 08 '15 at 20:29
  • No, `unsigned long` is 64 bits on my system, as is `unsigned long long`. In that case, it is apparently an arbitrary choice as to which of those you would define `uint64_t` to be. It is the latter on my system, and the former on yours. – Mark Adler Aug 08 '15 at 20:33
  • @MarkAdler: It might still generate a warning due to different ranks of `long` and `long long`. That is independent of the actual size. Anyway, using `PRIu64` for `uint64_t` is the standard way. It should, however, not be the actual problem if both are 64 bits. – too honest for this site Aug 08 '15 at 20:51
  • @Olaf: yes, that's how this all started about a hundred comments up from here. `printf` generates a warning if the format references the wrong 64-bit type, long vs. long long, even though both are 64 bits. – Mark Adler Aug 08 '15 at 20:55
  • @MarkAdler: Interesting. A simple way to confirm this is: `echo '#include <stdint.h>' | clang -E - | grep uint64_t`. Replace `clang` by `gcc` and/or add `-m32` or `-m64` for more information. – Keith Thompson Aug 08 '15 at 23:24
  • Yes, the `uint64_t` definition is `typedef unsigned long long uint64_t;`. This is not changed by `-m32` or `-m64`. I do not have a real gcc. ("gcc" is redirected to clang.) – Mark Adler Aug 08 '15 at 23:40

3 Answers


I'm reasonably sure this is a bug in clang. I see no undefined behavior in the program (assuming that it doesn't exceed the implementation's capacity limits) -- other than a small problem in the printf call that I'll address below (and that has now been addressed in an edit to the question). It's possible that I've missed something, but I don't think so.

If I have missed something, I expect it to be pointed out shortly. If this answer remains uncontradicted after a few days, I'll take it as a strong indication that it is indeed a bug in clang.

UPDATE: Mark Adler, the original poster, has reported this and confirmed that it's a bug in pre-3.6.0 clang, corrected in later versions. I will shamelessly steal this link to the bug report from his answer.

The correct output is:

47 bits set (4294967296 loops)

To address some of the things that have been pointed out (or that I've noticed myself):

static uint64_t vec[1 << 26] = {0};

This is a large object (2^29 bytes, or half a gigabyte, assuming CHAR_BIT==8), but it apparently does not exceed the implementation's capacity. If it did, it would be rejected. I'm not 100% sure the standard requires this, but since the program does work correctly at lower optimization levels, we can assume that the object is not too large.

vec[31415927] = 0xb9fe2f2fedf7ebbd

The constant 0xb9fe2f2fedf7ebbd is not a problem. Its value is between 2^63 and 2^64, so it's within the range of uint64_t. The type of a hexadecimal integer constant is wide enough to hold its value (unless it exceeds ULLONG_MAX, but that's not the case here).

if (vec[x >> 6] & ((uint64_t)1 << (x & 0x3f)))

I briefly thought that the left shift might be a problem, but it isn't. The left operand is of type uint64_t, and the right operand is in the range 0 .. 63. A left shift of 64 bits would have undefined behavior, but that's not the case here.

printf("%llu bits set (%llu loops)\n", count, loops);

The following has been addressed by an update to the question. I've tried the updated version of the code and I got the same results.

%llu requires an argument of type unsigned long long; count and loops are of type uint64_t. Here, depending on the implementation, we might have undefined behavior (on my system uint64_t is a typedef for unsigned long, and I get a warning). But it's not likely to cause any actual problems (unsigned long long and uint64_t typically have the same representation even if they're not the same type), and when I add casts to avoid any UB:

printf("%llu bits set (%llu loops)\n",
       (unsigned long long)count,
       (unsigned long long)loops);

I get the same behavior. The following results are for a program with casts added to the printf call.

Using gcc 5.2.0 on my 64-bit system, I get the correct output with -O0, -O1, -O2, and -O3, with or without -m32. The timing indicates that gcc does not eliminate the loop at any optimization level.

Using clang 3.4 on the same system, I get the correct output with -O0 or -O1, but incorrect output (0 bits set) at -O2 or -O3. Timing indicates that the loop is eliminated at -O2 and -O3. When I compile with clang -m32, the output is correct (and the loop is not eliminated) at all optimization levels.

When I change the declaration of loops to

volatile uint64_t loops = 0;

I get correct output at all optimization levels (and the loop is not eliminated).

A further tweak to the program (not shown here) shows that vec[31415927] really is set to 0xb9fe2f2fedf7ebbd, even when optimization produces the wrong bit count.

Keith Thompson
  • Thanks. By the way, I updated the printf, and added the unnecessary constant sizing in the question. – Mark Adler Aug 08 '15 at 20:31
  • 2
    As a note: I cannot reproduce the problem with a more current clang (3.8.0 snapshot 20150720). Generated code seems fine for -O0, -O1, -O2, and -O3. – dhke Aug 08 '15 at 21:37

It is a bug in pre-3.6.0 clang. (The "3.6.0svn" version precedes 3.6.0.) Since it has already been fixed in the 3.6.0 release as of five months ago, I have reported the bug to Apple -- this is still their most recent distribution of compiler tools.

Mark Adler

It does look like a bug in clang. I can reproduce this on my 64-bit system running clang3.4-1ubuntu3; as the other answer mentions, I always get correct output with gcc (which never optimizes away the loop), but clang seems to optimize away the loop if we use -O2 and -O3.

This answer doesn't add much to Keith's thorough and outstanding answer, but for future reference I want to show a possible workaround (other than volatile).

Indeed, making any of x, count, or loops volatile will fix it, but after some experimenting, I determined that the bug appears to manifest itself only in do { ... } while loops.

If you change the code to use a while or a for loop (and make the appropriate changes to maintain the program's behavior), clang will always produce the correct output and the loop is not optimized away (but it still runs faster with -O3).

Here's an example:

#include <stdio.h>
#include <inttypes.h>

/* bit vector of 1<<32 bits, initialized to all zeros */
static uint64_t vec[1 << 26] = {0};

int main(void)
{
    /* set 47 of the bits. */
    vec[31415927] = UINT64_C(0xb9fe2f2fedf7ebbd);

    /* count the set bits */
    uint64_t count = vec[0] & (uint64_t)1;
    uint64_t loops = 1;
    uint32_t x = 1;

    while (x) {
        if (vec[x >> 6] & ((uint64_t)1 << (x & 0x3f)))
            count++;
        x++;
        loops++;
    }

    printf("%" PRIu64 " bits set (%" PRIu64 " loops)\n", count, loops);
    return 0;
}
Filipe Gonçalves