
Generally, UB is regarded as something that has to be avoided, and the current C standard itself lists quite a few examples in Annex J.

However, there are cases where I can see no harm in exploiting UB other than sacrificing portability.

Consider the following definition:

int a = INT_MAX + 1;

Evaluating this expression leads to UB. However, if my program is intended to run on, say, a 32-bit CPU with modular arithmetic representing values in two's complement, I'm inclined to believe that I can predict the outcome.
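For instance, on such a machine I would expect something like the following (a sketch of my assumption, not something the standard guarantees):

#include <limits.h>
#include <stdio.h>

int main(void) {
    int a = INT_MAX + 1; /* UB according to the standard */
    printf("%d\n", a);   /* my expectation: INT_MIN, i.e. -2147483648 on a 32-bit int */
    return 0;
}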

In my opinion, UB is sometimes just the C standard's way of telling me: "I hope you know what you're doing, because we can't make any guarantees on what will happen."

Hence my question: is it safe to sometimes rely on machine-dependent behavior, even if the C standard considers it to invoke UB, or is "UB" really to be avoided, no matter what the circumstances are?

Dan
Philip
  • *"I'm inclined to believe that I can predict the outcome.*" - No, you can infer the outcome using a specific compiler using specific options using a specific version of the runtime on a specific platform. This would be an *awful* reason to rely on UB. – Ed S. May 23 '11 at 22:55
  • 1
    You're thinking of "Implementation Defined" behavior... it has a defined behavior you can rely on, but the behavior is different for every platform and the documentation should tell you want it is. UB can be unpredictable... even when you think you know, you might have missed a corner case where it doesn't do the thing you thought it would. – SoapBox May 23 '11 at 23:08
  • 4
    Obligatory xkcd link: http://xkcd.com/292/ – Hans Passant May 23 '11 at 23:20
  • 2
    If you want to make this code snippet implementation-defined instead of UB, simply change the `1` to `1U`. The conversion back to a signed value is implementation-defined, and will "always" (on modern twos complement machines) be what you want. But why would you write such nasty code anyway when there are clean, portable ways to do it? – R.. GitHub STOP HELPING ICE May 24 '11 at 01:26
  • @Hans: Nice thinking, completely forgot about that one. :) +1 – user541686 May 24 '11 at 01:31
  • @R, @SoapBox, @Ed, etc.: Is "implementation defined behavior" really different from "undefined behavior"? AFAIK, "undefined behavior" means "behavior undefined by **this** standard", which doesn't mean the behavior *has* to be unpredictable; it just means it's beyond the scope of the standard. So the implementation may or may not define it. Is that really different from "implementation-defined"? – user541686 May 24 '11 at 01:33
  • @Mehrdad: The term "implementation defined" means that a standards-complaint compiler is free to choose among various ways of doing things, but must document what it does. It's also possible for some aspect of a behavior to be unspecified; a compiler evaluating the the expression `a(b(),c())` may arbitrarily select between two behaviors: do `b()` in its entirety and then `c()`, or do `c()` int is entirety and then `b()`. The compiler could if it was so inclined make a new selection each time the code executes. Note that there is no *undefined behavior*, but the sequence is *unspecified*. – supercat Jan 12 '13 at 03:28
  • @supercat: So the compiler "must document what it does", but can't "what it does" be something completely random/undefined/etc.? (Say, the documentation might say, *"The compiler generates a random number according to some arbitrary distribution and acts according to that number."*) So if you don't know what compiler you're writing for, it wouldn't really matter whether the standard called it "implementation defined" versus if it called it "unspecified" or whatever, right? Either way you can't predict anything... the terminology doesn't get you anywhere. – user541686 Jan 12 '13 at 03:32
  • @Mehrdad: In most cases, the standard will provide certain constraints upon what a compiler is allowed to do. For example, the maximum value that can be represented in an unsigned int is implementation-defined. If a compiler vendor wanted an `int` type which could only accept values up to 8,675,309 and crash if fed anything larger, it could do so provided it documented that behavior, and provided that its `char` type couldn't handle anything larger. – supercat Jan 12 '13 at 03:38
  • @supercat: Right, but my point is that no matter what the scenario is, *those are your only guarantees*; in other words, when you're trying to write portable code, it does not matter a *single bit* whether the guarantees which are not given are in fact "implementation defined" or "unspecified" or "undefined" -- the bottom line is that whatever it might be, you only have the standard to rely on, so it really doesn't make any difference what the particular nomenclature/definition is; either way the outcome is the same: you can't rely on it. – user541686 Jan 12 '13 at 03:54
  • @Mehrdad: If one is writing a portable C program, one must refrain from doing anything which assumes an `int` can hold a value larger than 32767 unless one tests the value of `INT_MAX`. If, however, one does test the value of `INT_MAX`, or one simply ensures that an `int` will never be called upon to hold a value greater than 32767, there's no problem. Scenarios involving Undefined Behavior, however, have no guarantees of any sort. – supercat Jan 12 '13 at 04:02
  • @supercat: Sure, I'm not disagreeing with you. *If* there are certain guarantees then of course there are certain guarantees. :-) I'm just saying the fact that it's *called* "undefined behavior" doesn't mean anything -- it's only the *(lack of) guarantees* that matters, and that's it. Worrying (as a lot of people do) about what it's *called* (treating "undefined" and "unspecified" as two separate beasts when you haven't looked at the standard to realize what your constraints really are) is just paying attention to the wrong thing; either way, you only have the guarantees in the standard. – user541686 Jan 12 '13 at 04:17
  • @Mehrdad: The term "undefined behavior" means that the compiler is free to do *anything*. By contrast, unspecified means that a compiler is free to arbitrarily select from among a number of alternatives. Code which uses should be written to work correctly with any of the possible alternatives; by contrast, code cannot be written to work correctly with Undefined Behavior (though it's possible a particular implementation may specify behavior in cases which the standard leaves unspecified). – supercat Jan 12 '13 at 16:15

7 Answers


No, unless you're also keeping your compiler the same and your compiler documentation defines the otherwise undefined behavior.

Undefined behavior means that your compiler is allowed to assume the situation never occurs and to transform your code accordingly, making things true that you don't think should be.
Sometimes this is done for optimization, and sometimes it's a consequence of architecture restrictions like this one.


I suggest you read *What Every C Programmer Should Know About Undefined Behavior* on the LLVM blog, which addresses your exact example. An excerpt:

Signed integer overflow:

If arithmetic on an int type (for example) overflows, the result is undefined. One example is that INT_MAX + 1 is not guaranteed to be INT_MIN. This behavior enables certain classes of optimizations that are important for some code.

For example, knowing that INT_MAX + 1 is undefined allows optimizing X + 1 > X to true. Knowing the multiplication "cannot" overflow (because doing so would be undefined) allows optimizing X * 2 / 2 to X. While these may seem trivial, these sorts of things are commonly exposed by inlining and macro expansion. A more important optimization that this allows is for <= loops like this:

for (i = 0; i <= N; ++i) { ... }

In this loop, the compiler can assume that the loop will iterate exactly N + 1 times if i is undefined on overflow, which allows a broad range of loop optimizations to kick in. On the other hand, if the variable is defined to wrap around on overflow, then the compiler must assume that the loop is possibly infinite (which happens if N is INT_MAX) - which then disables these important loop optimizations. This particularly affects 64-bit platforms since so much code uses int as induction variables.
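To see the `X + 1 > X` optimization in practice, here is a small sketch; the result depends on your compiler and optimization level, which is precisely the point:

/* Optimizing compilers commonly fold this function to "return 1":
   x + 1 > x could only be false if x + 1 overflowed, and signed
   overflow is assumed never to happen. Try e.g. gcc -O2 -S and
   inspect the generated assembly. */
int always_true(int x) {
    return x + 1 > x;
}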

user541686

No.

The compiler takes advantage of undefined behavior when optimizing the code. A well-known example is the strict overflow semantics in the GCC compiler (see the `-fstrict-overflow` option in the GCC manual). For example, this loop

for (int i = 1; i != 0; ++i)
  ...

supposedly relies on the "machine-dependent" overflow behavior of the signed integer type. However, under the rules of strict overflow semantics, the GCC compiler can (and will) assume that incrementing an int variable can only make it larger, never smaller. This assumption lets GCC optimize out the arithmetic and generate an infinite loop instead

for (;;)
  ...

since this is a perfectly valid manifestation of undefined behavior.
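You can observe the difference yourself with GCC's `-fwrapv` flag, which makes signed overflow wrap around as defined behavior. A minimal sketch, assuming a 32-bit int:

#include <stdio.h>

int main(void) {
    unsigned n = 0;
    /* With plain gcc -O2 this may become an infinite loop, since i
       "can only grow". With gcc -O2 -fwrapv, i wraps past INT_MAX to
       INT_MIN, climbs back to 0, and the loop terminates. */
    for (int i = 1; i != 0; ++i)
        ++n;
    printf("%u iterations\n", n); /* 4294967295 with -fwrapv */
    return 0;
}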

Basically, there's no such thing as "machine-dependent behavior" in the C language. All behavior is determined by the implementation, and the implementation is the lowest level you can ever get to. The implementation isolates you from the raw machine, and that isolation is complete. There's no way to break through it and get to the actual raw machine unless the implementation explicitly permits you to do so. Signed integer overflow is normally not one of those contexts where you are allowed access to the raw machine.

AnT stands with Russia

If you know for a fact that your code will only be targeting a specific architecture, compiler, and OS, and you know how the undefined behavior works (and that that won't change), then it's not inherently wrong to use it occasionally. In your example, I think I can tell what's going to happen as well.

However, UB is rarely the preferred solution. If there's a cleaner way, use it. Using undefined behavior should never be strictly necessary, though it might be convenient in a few cases. Avoid relying on it where you can, and if you ever do rely on UB, comment your code accordingly.

And please, don't ever publish code that relies on undefined behavior, because it'll just end up blowing up in someone's face when they compile it on a system with a different implementation than the one that you relied on.

Rafe Kettler
  • 1
    "you know how the undefined behavior works (and that that won't change)" -- that would be knowing the unknowable. And even if it were knowable, you're likely to be wrong. – Jim Balter May 24 '11 at 00:40
  • Program like the guy maintaining your code is a psychopathic mass murderer who knows where you live... Doing this would make me that guy...... – mattnz May 24 '11 at 04:43

In general, it's better to avoid it completely. On the other hand, if your compiler documentation explicitly states that a specific thing that is UB per the standard is instead defined for that compiler, you may exploit it, possibly adding some #ifdef/#error machinery to block compilation in case another compiler is used (a sketch follows).
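For instance, a minimal sketch of such machinery (GCC is just an illustrative choice here):

/* Refuse to build on compilers whose documentation we haven't checked. */
#if !defined(__GNUC__)
#error "This code relies on behavior documented by GCC; port with care."
#endif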

Matteo Italia

If a C (or other language) standard declares that some particular code will have Undefined Behavior in some situation, that means that a C compiler can generate code to do whatever it wants in that situation, while remaining compliant with that standard. Many particular language implementations have documented behaviors which go beyond what is required by the generic language standard. For example, Whizbang Compilers Inc. might explicitly specify that its particular implementation of memcpy will always copy individual bytes in address order. On such a compiler, code like:

  unsigned char z[256];  /* needs <string.h> for memcpy */
  z[0] = 0x53;
  z[1] = 0x4F;
  /* overlapping copy: UB per the C standard, but under Whizbang's
     documented byte-by-byte ascending copy it propagates the pattern */
  memcpy(z+2, z, 254);

would have behavior which was defined by the Whizbang documentation, even though the behavior of such code is not specified by any non-vendor-specific C language specification. Such code would be compatible with compilers that comply with Whizbang's spec, but could be incompatible with other compilers which comply with various C standards but do not comply with Whizbang's specifications.
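For comparison, here is a portable sketch of the same pattern-propagating effect, using an explicit forward loop instead of relying on any vendor's memcpy:

#include <stddef.h>

/* Well defined on any conforming implementation: each iteration reads
   a byte that has already been written, so the two-byte pattern
   propagates through the array. */
static void fill_pattern(unsigned char *z, size_t n) {
    for (size_t i = 2; i < n; ++i)
        z[i] = z[i - 2];
}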

There are many situations, especially with embedded systems, where programs will need to do some things which the C standards do not require compilers to allow. It is not possible to write such programs to be compatible with all standards-compliant compilers, since some standards-compliant compilers may not provide any way to do what needs to be done, and even those that do might require different syntax. Nonetheless, there is often considerable value in writing code that will be run correctly by any standards-compliant compiler.

supercat

If the standard says that doing something is undefined, then it is undefined. You may like to think you can predict what the outcome will be, but you can't. For a specific compiler you may always get the same result, but for the next iteration of the compiler, you may not.

And undefined behaviour is so EASY to avoid - don't write code like that! So why do people like you want to mess with it?


No! Just because it compiles, runs and gives the output you hoped for does not make it correct.

mattnz