
As a follow-up to "https://stackoverflow.com/questions/33732041/why-static-castunsigned-intushrt-maxushrt-max-yields-correct-value"

I was wondering whether promoting all types with a lower rank than int (with some exceptions) to int before performing arithmetic operations might cause UB in some cases.

e.g.:

unsigned short a = 0xFFFF;
unsigned short b = a*a;

Since unsigned short is promoted to int for arithmetic operations, this is effectively:

unsigned short a = 0xFFFF;
unsigned short b = (int)a*(int)a;
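A minimal C++11 check of the claim above, assuming (as in the update) a 16-bit unsigned short and a 32-bit int. decltype inspects the type of a*a without evaluating it, so the overflowing product is never actually computed here.

```cpp
#include <climits>
#include <type_traits>

// Assumption (as in the question's update): 16-bit unsigned short, 32-bit int.
static_assert(sizeof(unsigned short) * CHAR_BIT == 16, "assumes 16-bit unsigned short");
static_assert(sizeof(int) * CHAR_BIT == 32, "assumes 32-bit int");

constexpr unsigned short a = 0xFFFF;

// Both operands of a * a undergo integral promotion to int before the
// multiplication, so the expression has type int, not unsigned short.
// decltype does not evaluate a * a, so no overflow happens in this check.
static_assert(std::is_same<decltype(a * a), int>::value,
              "unsigned short * unsigned short is an int expression");
```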

As (int)0xFFFF*(int)0xFFFF overflows, and overflow of signed types is UB: can multiplying two unsigned shorts x, y cause undefined behaviour in the case that x*y > INT_MAX?


UPDATE:

The question specifically targets the case where int is 32-bit and short is 16-bit.

Simon Kraemer
  • Yes. The upshot is to not do arithmetic with unsigned types of lower conversion rank than `int`. A simpler rule is to not use unsigned types for numbers, but do use them for bit-fiddling. – Cheers and hth. - Alf Nov 16 '15 at 10:00
  • Yes, this is signed integer overflow causing UB. An annoying historical wart, and it can strike in disguise because `uint16_t` is often implemented as a typedef for `unsigned short`. Theoretically the same problem could even occur with `uint32_t`, as there is nothing stopping a compiler making `short` be 32-bit on a system with 64-bit `int` for example. – M.M Nov 16 '15 at 10:00
  • @M.M: Actually there is nothing special about `short` here. Just any unsigned type smaller than `int` will be promoted to `int` before arithmetic operations take place. In particular this goes for `uint32_t` if `int` is 64-bit. – Marc van Leeuwen Nov 16 '15 at 11:10
  • I think you can (theoretically) incur the same UB by multiplying two `size_t`s, since I didn't find a constraint in the standard that it must be at least as big as int. https://twitter.com/fugueish/status/637715389519015941 – CodesInChaos Nov 16 '15 at 16:44

2 Answers


C++11 §3.9.1/4, full quote:

Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.

Apart from the slightly misleading wording about “declared unsigned”, this might seem to imply that every arithmetic expression involving only arguments of some given unsigned type will yield a result modulo 2^n for that type.

However, there are no arithmetic expressions at all for unsigned types of lower conversion rank than int: all operands in such an apparent expression are promoted to int (1) or, depending on the value ranges of the C++ implementation, to unsigned int.
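A sketch of the rule just described. The names promoted_t and promotes_to_signed_int are hypothetical; the trick relies on unary + applying only the integral promotions, so decltype(+T{}) names the promoted type. The last two checks assume a 32-bit int, where std::uint16_t promotes to int but std::uint32_t does not.

```cpp
#include <cstdint>
#include <type_traits>

// Unary + performs the integral promotions and nothing else, so this names
// the type that T is promoted to before arithmetic.
template <typename T>
using promoted_t = decltype(+T{});

// True when an unsigned type T is promoted to signed int, i.e. when
// arithmetic on T values is really int arithmetic and can overflow.
template <typename T>
constexpr bool promotes_to_signed_int() {
    return std::is_unsigned<T>::value && std::is_same<promoted_t<T>, int>::value;
}

// Assumption: 32-bit int (typical desktop platforms).
static_assert(promotes_to_signed_int<std::uint16_t>(),
              "uint16_t arithmetic is int arithmetic");
static_assert(!promotes_to_signed_int<std::uint32_t>(),
              "uint32_t arithmetic stays unsigned");
```

On an implementation where int cannot represent every unsigned short value (for example a 16-bit int), promoted_t<unsigned short> would be unsigned int instead, matching the second case in the paragraph.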

As a result, a*b, where a and b are unsigned short values, can formally have Undefined Behavior (2), because it is not an unsigned short expression: it is (in practice) an int expression.

That said, with a reasonable compiler that doesn't introduce special-case optimizations where it notices formal UB, and with the in-practice 8-bit bytes, an unsigned short maximum value that is representable by int, and the common two's complement signed integer representation, the result, when converted back down to unsigned short, will be as if computed with modular arithmetic in the range of unsigned short. That's because two's complement arithmetic, at the machine-code level, is just modular arithmetic with a range centered on 0.
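The paragraph above relies on compiler behavior rather than on the standard. A common way (not from the answer itself) to get the same modular result with no UB at all is to convert one operand to unsigned int before multiplying. mul_wrap is a hypothetical helper name, and the expected value in the note below assumes a 16-bit unsigned short.

```cpp
// Multiplies two unsigned short values with well-defined wraparound.
// Converting x to unsigned int makes the usual arithmetic conversions turn
// the (promoted) y into unsigned int as well, so the product is computed in
// unsigned int, which wraps modulo 2^N instead of overflowing. The final
// conversion to unsigned short reduces the result modulo 2^16 (for a 16-bit
// unsigned short).
constexpr unsigned short mul_wrap(unsigned short x, unsigned short y) {
    return static_cast<unsigned short>(static_cast<unsigned int>(x) * y);
}
```

For example, with a 16-bit unsigned short, mul_wrap(0xFFFF, 0xFFFF) yields 1, since 0xFFFF * 0xFFFF = 0xFFFE0001 and only the low 16 bits survive.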


(1) In practice one will usually be using an 8 bits-per-byte implementation where the maximum value of unsigned short fits well within the int range, so in practice we're talking about a conversion up to int.
(2) E.g., for 16-bit unsigned short and 32-bit int, (2^16 − 1)^2 = 2^32 − 2×2^16 + 1 > 2^31 − 1, where the last value is the maximum positive int value.
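The arithmetic in footnote (2) can be verified at compile time. This sketch assumes a 32-bit int and computes the exact product in long long, which is at least 64 bits wide, so the check itself cannot overflow.

```cpp
#include <climits>

// Assumption: 32-bit int, so INT_MAX == 2^31 - 1.
static_assert(INT_MAX == (1LL << 31) - 1, "assumes 32-bit int");

constexpr long long max_u16 = 0xFFFF;                  // 2^16 - 1
constexpr long long exact_product = max_u16 * max_u16;  // exact (2^16 - 1)^2

// (2^16 - 1)^2 expanded as in the footnote.
static_assert(exact_product == (1LL << 32) - 2 * (1LL << 16) + 1,
              "(2^16 - 1)^2 == 2^32 - 2*2^16 + 1");

// The exact product does not fit in int, so the promoted int
// multiplication 0xFFFF * 0xFFFF would overflow.
static_assert(exact_product > INT_MAX, "product exceeds INT_MAX");
```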

Cheers and hth. - Alf
  • Any idea why unsigned types aren't just promoted to `unsigned int`? This would solve the problem. – Simon Kraemer Nov 16 '15 at 10:21
  • @SimonKraemer: I don't know. But as I recall James Kanze once remarked that the conversion rules were changed early on. That was in connection with the problems of mixing signed and unsigned in expressions. – Cheers and hth. - Alf Nov 16 '15 at 10:24
  • Well, it shouldn't matter in real life. Thanks for digging up the rules. – Simon Kraemer Nov 16 '15 at 10:27
  • Unfortunately C compilers are generally not reasonable according to your definition. – CodesInChaos Nov 16 '15 at 16:45
  • @CodesInChaos: It would be nice to see at least one example supporting your assertion. Here's an example where the gcc compiler, *for the given code*, is reasonable: http://coliru.stacked-crooked.com/a/036e12a08cedfd95 – Cheers and hth. - Alf Nov 16 '15 at 17:29
  • @Alf If there is one thing I learned about C/C++ compilers, it's to never mess with UB. http://coliru.stacked-crooked.com/a/0217d57fe15e355e – CodesInChaos Nov 16 '15 at 17:49
  • @CodesInChaos: So, gcc uses the UB to optimize in some output that's contradicted by the identical final results. Jeez! Thanks. And that's common for C compilers? Unable to reproduce with MSVC, though. – Cheers and hth. - Alf Nov 16 '15 at 18:00
  • @Alf the canonical example is optimizing `x + 1 > x` to `true` for signed integers. Controlled by [`-fstrict-overflow`](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html) which is part of `-O2` for GCC. See also [What Every C Programmer Should Know About Undefined Behavior #2/3](http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html) on the LLVM blog. – CodesInChaos Nov 16 '15 at 18:09
  • @SimonKraemer unsigned ints sometimes have surprising semantics in comparisons, as Alf alluded to. consider (on a 32-bit processor), `ushortvar < intvar` where intvar is -1. This will always be false, as expected, since `ushortvar` is promoted to int, in the range 0..65535. If it instead gets promoted to `unsigned` in range 0..65535, the comparison will be true, since it becomes `(unsigned)ushortvar < (unsigned)-1` and `(unsigned)-1` is 0xFFFFFFFFu. Another way to look at this: unsigned short is a reduced range int (unless it is the same width as int :-) ) – greggo Nov 16 '15 at 19:07
  • Downvoted for the dangerously incorrect "reasonable compiler" final paragraph, as per @CodesInChaos comments. Ignoring "formal UB" might be OK if you never compile with optimizations turned on, but even then it wouldn't be my recommendation. – ndkrempel Mar 29 '19 at 15:54
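greggo's comparison example as a compile-time sketch, assuming a 16-bit unsigned short and a 32-bit int. The function names are hypothetical; the second function models the alternative "promote to unsigned int" rule with explicit casts, since the actual language rule promotes to int.

```cpp
// Actual rule: us is promoted to int, so this is an ordinary signed
// comparison. With i == -1 it is false for every unsigned short value.
constexpr bool less_as_promoted(unsigned short us, int i) {
    return us < i;
}

// Hypothetical alternative rule, modelled with explicit casts: -1 converts
// to UINT_MAX (0xFFFFFFFF with a 32-bit unsigned int), so with i == -1 the
// comparison is true for every unsigned short value.
constexpr bool less_as_unsigned(unsigned short us, int i) {
    return static_cast<unsigned int>(us) < static_cast<unsigned int>(i);
}
```

This is the surprise greggo describes: under an "unsigned promotion" rule, a small non-negative value would compare as less than -1.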

When you compute unsigned short * unsigned short, there is an implicit conversion: in C++11, both values are promoted to int (given 16-bit short and 32-bit int). The documentation says:

Prvalues of small integral types (such as char) may be converted to prvalues of larger integral types (such as int). In particular, arithmetic operators do not accept types smaller than int as arguments

So whenever the product exceeds INT_MAX (as 0xFFFF * 0xFFFF does with a 32-bit int), the result is undefined behavior.
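The UB only occurs for operand values whose product exceeds INT_MAX. A hypothetical helper that checks this before multiplying, assuming unsigned short promotes to int (e.g. 16-bit short, 32-bit int); the division-based test never evaluates the possibly overflowing product.

```cpp
#include <climits>

// Returns true when x * y, evaluated after promotion to int, stays within
// INT_MAX, i.e. when the plain expression x * y is well-defined.
// For y > 0, x * y <= INT_MAX holds exactly when x <= INT_MAX / y
// (integer division), so the product itself is never computed here.
constexpr bool product_fits_in_int(unsigned short x, unsigned short y) {
    return y == 0 || x <= INT_MAX / y;
}
```

With a 32-bit int, 46340 * 46340 still fits, while 46341 * 46341 and 0xFFFF * 0xFFFF do not.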

Rahul Tripathi