Why does the clang sanitizer think this left shift of an unsigned number is undefined?

Question

I know there are many similar questions on SO. Please read carefully before calling this a dup. If it is, I would be happy to get a reference to the relevant question.

It seems to me that the clang sanitizer is complaining about a perfectly valid left shift of an unsigned number.

int main()
{
    unsigned int x = 0x12345678;
    x = x << 12;
    return 15 & x;
}

Compiled thusly:

clang -fsanitize=undefined,integer shift-undefined.cpp -lubsan -lstdc++

Results in this error:

shift-undefined.cpp:4:11: runtime error: left shift of 305419896 by 12 places cannot be represented in type 'unsigned int'

I understand that some bits will be shifted off into oblivion, but I thought that was legal for unsigned numbers. What gives?

Since the expression is only using compile time constants, the compiler is probably trying to simplify it to a single constant and realizes it can't. – Mark Ransom, Sep 19 '22 at 22:06
@MikelF In the case of a left bitwise shift on an unsigned value, it is defined. – Slava, Sep 19 '22 at 22:08
Cannot reproduce with either clang-12 or clang-13. What version of clang are you using? – Slava, Sep 19 '22 at 22:10
@Slava Thanks for the clarification. Learning new things on a Monday. Who would have thought? – Mikel F, Sep 19 '22 at 22:12
`-fsanitize=integer` is useful if you are not expecting your code to do the kind of thing your code is doing. – Eljay, Sep 19 '22 at 22:19
@Slava I think it's intentional. It makes it possible to search for unsigned overflows where none is expected. – Ted Lyngmo, Sep 19 '22 at 22:25
@Slava I agree. It could have had a name different from UndefinedBehaviorSanitizer. – Ted Lyngmo, Sep 19 '22 at 22:34

Ted Lyngmo · Accepted Answer · 2022-09-19T22:25:03.237

10

-fsanitize=address,integer

The integer sanitizer turns on checking for "suspicious" overflows of unsigned integers too, which do not have undefined behavior.

See "-fsanitize=unsigned-integer-overflow: Unsigned integer overflow, where the result of an unsigned integer computation cannot be represented in its type. Unlike signed integer overflow, this is not undefined behavior, but it is often unintentional. This sanitizer does not check for lossy implicit conversions performed before such a computation (see -fsanitize=implicit-conversion)."

I'd remove that option and only concentrate on signed integer overflow:

-fsanitize=address,signed-integer-overflow

edited Sep 19 '22 at 22:25

answered Sep 19 '22 at 22:12


It might be nice if the sanitizer wording was a little clearer. The "SUMMARY" contains "undefined-behavior". – Ben Ylvisaker Sep 19 '22 at 22:28
@BenYlvisaker Yes, that's very unfortunate. – Ted Lyngmo Sep 19 '22 at 22:29
Also I found the following bit of clang documentation, which seems relevant, but didn't seem to work the way I expected to with 30 seconds of testing. https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#silencing-unsigned-integer-overflow – Ben Ylvisaker Sep 20 '22 at 19:10
@BenYlvisaker Yes, `UBSAN_OPTIONS=silence_unsigned_overflow=1` works, but not if the type you are overflowing is smaller than an `int` so it gets promoted. I tried with an `unsigned short` and then it still triggered, but when switching to an `unsigned int` it stopped. – Ted Lyngmo Sep 20 '22 at 19:27
@TedLyngmo: Note that in gcc, integer overflow that occurs with promoted unsigned short operands may cause arbitrary nonsensical behavior, even in cases where code does nothing with the computed result but store it in an `unsigned` object whose value would be ignored. – supercat Sep 21 '22 at 20:05
@supercat That sounds scary but as long as it's not affecting the observable behavior of the program I guess it's fine. Is it possible to view this behavior in some clever way? – Ted Lyngmo Sep 21 '22 at 20:13
@TedLyngmo: See https://godbolt.org/z/fe3xd4chr for an example. Looking at the source code for `test`, it would seem there shouldn't be any way it could ever write to `arr[32770]`, but looking at the three lines of assembly language code reveals that all it does is unconditionally store 0 to `arr[x]`. If a larger value of `x` were passed, the generated machine code would write storage past the end of the array, even though the source code would skip the store in those cases just as it would with 32770. – supercat Sep 21 '22 at 20:29
@supercat Neat. I guess with the UB in `return (x*y) & 0xFFFF;` (where `x * y` becomes a signed integer overflow) anything could happen. Removing the UB with `return ((unsigned)x*(unsigned)y) & 0xFFFF;` and it seems to behave ok. – Ted Lyngmo Sep 21 '22 at 20:40
@TedLyngmo: According to the published Rationale, the Committee expected that commonplace implementations would process such a construct as though the multiplication used unsigned arithmetic. The authors of the Standard didn't want to impose any judgment about how an implementation should process such code if, e.g., it has a fast signed integer multiply that does something weird in case of overflow, but no fast unsigned integer multiply, but also didn't intend that compilers whose platforms could process unsigned math quickly would gratuitously deviate from such behavior. – supercat Sep 21 '22 at 20:44
@supercat Oh, cool. I always thought that smaller integer types, even if `unsigned`, _had_ to be promoted to plain `int`. Are there any implementations that preserve the unsignededness (is that even a word?) in this situation? Edit: MSVC seems to keep the `1` in the array at least so perhaps I found an implementation that does. – Ted Lyngmo Sep 21 '22 at 20:54
@TedLyngmo: In code such as the example I provided, there are no situations in which signed arithmetic would have a defined behavior inconsistent with unsigned arithmetic. The published Rationale notes the situations in which implementations were required to process signed and unsigned math differently, and observed that commonplace implementations would process signed and unsigned math identically in all other cases. See page 44 of https://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf starting on line 20 for more information. – supercat Sep 21 '22 at 21:04
@TedLyngmo: BTW, an even scarier optimization that gcc will sometimes perform in C++ mode, and clang will sometimes perform in both C and C++ mode, is identifying conditions that must apply for a side-effect-free loop to exit, and then simultaneously omitting any tests after the loop which would pass under conditions where the loop could exit, and omitting the code for the loop itself. Consequently, even if a program's behavior would consist of a sequence of 100% defined steps, the fact that a loop would execute forever may allow a compiler to substitute arbitrary behavior instead. – supercat Sep 21 '22 at 21:49
@supercat Yeah, that last one I knew about. I kind of like it and dislike it at the same time :-) – Ted Lyngmo Sep 22 '22 at 05:46
@TedLyngmo: If a program will never be exposed to potentially malicious input, allowing a compiler to generate code that would allow security exploits may be useful if such code is more efficient than code which would block them. If, however, a program will be exposed to potentially malicious inputs, such "optimizations" will be at best counter-productive. – supercat Sep 22 '22 at 05:57
@TedLyngmo: I wish the Standard would explicitly acknowledge that it deliberately allows compilers intended for some purposes to make optimizations that would render them unsuitable for others. If a construct's behavior would be defined but for the existence of part of the Standard that says it's "undefined", implementations should feel free to deviate from the "otherwise-defined" behavior in those cases--and only in those cases--where doing so would not interfere with what their users need to do. Unfortunately, for the Standard to say that would be to denounce the behavior of clang/gcc. – supercat Sep 22 '22 at 16:13
