
I was reading about undefined behavior, and I'm not sure if it's a compile-time-only concept, or if it can occur at execution time.

I understand this example well (this is extracted from the Undefined Behavior page of Wikipedia):

An example for the C language:

int foo(unsigned x)
{
    int value = 5;
    value += x;
    if (value < 5)
        bar();
    return value;
}

The value of x cannot be negative and, given that signed integer overflow is undefined behavior in C, the compiler can assume that at the line of the if check value >= 5. Thus the if and the call to the function bar can be ignored by the compiler since the if has no side effects and its condition will never be satisfied. The code above is therefore semantically equivalent to:

int foo(unsigned x)
{
     int value = 5;
     value += x;
     return value;
}

But this occurs at compile time.

What if I write, for example:

void foo(int x) {
    if (x + 150 < 5)
         bar();
}

int main() {
    int x;
    std::cin >> x;
    foo(x);
}

and then the user types in INT_MAX - 100 ("2147483547" with 32-bit integers).

There will be an integer overflow, but AFAIK it is the arithmetic logic unit of the CPU that actually overflows, so the compiler is not involved here.

Is it still undefined behavior?

If yes, how does the compiler detect the overflow?

The best I could imagine is the overflow flag of the CPU. If this is the case, does it mean that the compiler can do anything it wants if the overflow flag of the CPU is set at any time during execution?

Jules Lamur
  • Yes, it's undefined behavior if you overflow a signed integer. – Captain Obvlious Jan 09 '17 at 23:51
  • But it's the processor that overflows the integer here, right? How is the compiler involved here? – Jules Lamur Jan 09 '17 at 23:52
  • The compiler is involved because the compiler adheres to a standard which states overflowing an integer is undefined behavior. Nothing to do with the CPU. – Captain Obvlious Jan 09 '17 at 23:53
  • Wikipedia is wrong. There is no signed addition in that code, hence no signed overflow, only potential unsigned wraparound followed by an unsigned-to-signed conversion (with implementation-defined behaviour). –  Jan 09 '17 at 23:56
  • Have a [real example of signed overflow UB (and a potential result)](http://coliru.stacked-crooked.com/a/a7f7ba72c4f268e8), because I don't get to post it as often as I'd like. – jaggedSpire Jan 09 '17 at 23:58
  • @jaggedSpire That is soo cool! – Fantastic Mr Fox Jan 10 '17 at 00:06
  • @CaptainObvlious I wrote my example too fast; I understand now why it is UB. Please take a look at my edited question. – Jules Lamur Jan 10 '17 at 00:06
  • For the second - the compiler is allowed to optimize `foo` to `if (x < -145) bar();` since it has the same effect in every case where there is no undefined behavior. (Though `gcc -fwrapv`, for instance, will produce code which behaves differently if fed `INT_MAX - 100` for `x`.) – Daniel Schepler Jan 10 '17 at 00:36
  • @hvd Yes, you're right. I noticed the same problem. It appeared as if OP modified the quote, but then I noticed wiki was edited today. – 2501 Jan 10 '17 at 11:12
  • @2501 Blech. The new version is even more wrong. Of course signed integers can be negative; claiming they cannot be is absolute nonsense. –  Jan 10 '17 at 12:24
  • @hvd If you look at the wiki history, you'll see that `unsigned` was changed to `int`, and nothing else was modified. Thus the explanations around the code don't make sense anymore. A very poor edit. – 2501 Jan 10 '17 at 14:34
  • @2501 I've now edited it to use `unsigned char`, so that `x` is guaranteed to be non-negative again, and increased `value`'s initial value so that overflow remains possible on typical implementations. This should demonstrate the point the article is making. The whole thing is perhaps still not exactly good or useful, I worry, but at least it's no longer outright wrong. –  Jan 10 '17 at 17:57
  • @hvd You could have used INT_MAX. – 2501 Jan 11 '17 at 10:05
  • @2501 True. I'm making assumptions on the type sizes anyway though. If `unsigned char` has greater range than `int`, the addition happens in `unsigned int` again. And including a note of that in the text probably distracts too much. –  Jan 11 '17 at 11:23
  • @hvd Yes. I had to re-check if the Standard allows UCHAR_MAX > INT_MAX, or even > INTMAX_MAX, and it does. – 2501 Jan 11 '17 at 11:37

2 Answers


Yes, but not necessarily in the way I think you might have meant it. That is, if there is an addition in the machine code and at runtime that addition wraps (or otherwise overflows, but on most architectures it would wrap), that is not UB by itself. The UB is solely in the domain of C (or C++). That addition may have been adding unsigned integers, or it may be part of an optimization that the compiler can make because it knows the semantics of the target platform and can safely rely on wrapping (but you cannot, unless of course you do it with unsigned types).
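
To make the distinction concrete, here is a minimal sketch (assuming the usual 32-bit int and unsigned); the wrapping is typically the same hardware instruction either way, only the language rules differ:

#include <climits>
#include <iostream>

int main() {
    unsigned u = UINT_MAX;
    u += 1;                  // well-defined: unsigned arithmetic wraps modulo 2^N
    std::cout << u << '\n';  // prints 0

    int s = INT_MAX;
    // s += 1;               // this would be UB: signed overflow, which the
    //                       // compiler may assume never happens
    (void)s;                 // suppress unused-variable warning
}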

Of course that does not at all mean that it is safe to use constructs that "wrap only at runtime", because those code paths are poisoned at compile time as well. For example, take your code:

extern void bar(void);

void foo(int x) {
    if (x + 150 < 5)
         bar();
}

GCC 6.3 targeting x64 compiles it to

foo:
        cmp     edi, -145
        jl      .L4
        ret
.L4:
        jmp     bar

which is equivalent to

void foo(int x) {
    if (x < -145)
         bar(); // with tail call optimization
}

... which is the same as your original foo if you assume that signed integer overflow is impossible (in the sense that it puts an implicit precondition on the inputs that overflow will not happen).
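
As a quick sanity check of that equivalence (a sketch assuming 32-bit int; it widens to long long so the check itself does not overflow), the two conditions agree for inputs in the range where no overflow happens:

#include <cassert>
#include <climits>

int main() {
    // Sample inputs from the range where x + 150 does not overflow,
    // i.e. [INT_MIN, INT_MAX - 150].
    int samples[] = { INT_MIN, -1000, -146, -145, -144, 0, 5, INT_MAX - 150 };
    for (int x : samples) {
        bool original    = (long long)x + 150 < 5;  // computed without overflow
        bool transformed = x < -145;
        assert(original == transformed);
    }
}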

harold
  • Everything is clear when you understand that it is the fact that the compiler makes optimisations that leads to UB. Some may have tried to explain it to me, but you did it with success. Thanks. – Jules Lamur Jan 10 '17 at 00:52
  • @Kadriles I wouldn't quite put it that way, because the UB is there regardless of what the compiler does. But the behaviour would have been innocent in practice were it not for compiler involvement, which actually makes these bugs even worse: in unoptimized debug builds, code like this will usually not exhibit its problems. – harold Jan 10 '17 at 00:59
  • That comment about the optimizations the compiler can make based on target platform characteristics brought to mind something I saw generated by an old version of MSVC: implementing unsigned division by constant 3 by something like `load eax with arg; xor edx,edx; mul 0xaaaaaaaa; shr edx,1; return edx`. – Daniel Schepler Jan 10 '17 at 01:01

Your analysis of the first example is incorrect. value += x; is equivalent to:

value = value + x;

In this case value is int and x is unsigned, so under the usual arithmetic conversions value is first converted to unsigned, and we have an unsigned addition which by definition cannot overflow (it has well-defined semantics in accordance with modular arithmetic).
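
To spell out that conversion (a small sketch, assuming 32-bit int and unsigned):

#include <iostream>

int main() {
    int value = 5;
    unsigned x = 4294967295u;   // UINT_MAX for a 32-bit unsigned

    // value is converted to unsigned, so the addition is done in unsigned
    // arithmetic and wraps: (5 + 4294967295) mod 2^32 == 4. No UB here.
    unsigned sum = value + x;
    std::cout << sum << '\n';   // prints 4
}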

When the unsigned result is assigned back to value, if it is larger than INT_MAX then this is an out-of-range assignment which has implementation-defined behaviour. This is NOT overflow because it is assignment, not an arithmetic operation.

Which optimizations are possible therefore depends on how the implementation defines the behaviour of out-of-range assignment for integers. Modern systems all take the value which has the same 2's complement representation, but historically other systems have done some different things.
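
For example (a sketch assuming a typical implementation that takes the value with the same two's complement representation):

#include <iostream>

int main() {
    unsigned big = 3000000000u;   // larger than INT_MAX with 32-bit int

    // Out-of-range assignment to int: implementation-defined, not undefined.
    // On common two's complement implementations this yields -1294967296.
    int value = big;
    std::cout << value << '\n';
}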

So the original example does not have undefined behaviour in any circumstance, and the suggested optimization is, for most systems, not possible.


Your second example has nothing to do with your first example since it does not involve any unsigned arithmetic. If x > INT_MAX - 150 then the expression x + 150 causes undefined behaviour due to signed integer overflow. The language definition does not mention ALUs or CPUs so we can be certain that those things are not related to whether or not the behaviour is undefined.

If yes, how does the compiler detect the overflow?

It doesn't have to. Precisely because the behaviour is undefined, the compiler is not constrained by having to worry about what happens when there is overflow. It only has to emit an executable that exhibits the correct behaviour for the cases which are defined.

In this program those are the inputs in the range [INT_MIN, INT_MAX - 150], and so the compiler can transform the comparison to x < -145 because that has the same behaviour for all inputs in the well-defined range; the undefined cases don't matter.
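
If you do want a check that behaves sensibly for every possible input, you have to avoid the overflowing addition yourself, for example (a sketch of one way to do it, not part of the original code):

#include <climits>

extern void bar();

void foo(int x) {
    // Only perform the addition when it cannot overflow. For x > INT_MAX - 150
    // the mathematical value of x + 150 exceeds INT_MAX, so it is certainly
    // not < 5, and skipping the call is the right result anyway.
    if (x <= INT_MAX - 150 && x + 150 < 5)
        bar();
}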

M.M