I realize that there is a rule by which numbers with a width smaller than int can be promoted to a wider type for the addition operation. But I cannot fully explain how only one permutation of the following print_unsafe_minus will fail. How is it that only the <unsigned, long> example fails, and what is the take-away for programmers with regards to best practices?

#include <cstdint>

#include <fmt/core.h>

template<typename M, typename N>
void print_unsafe_minus() {
    M a = 3, b = 4;
    N c = a - b; // subtraction happens in M (or int, after promotion), then converts to N
    fmt::print("{}\n", c);
}
int main() {
    // storing result of unsigned 3 minus 4 to a signed type

    print_unsafe_minus<uint8_t, int8_t>(); // -1
    print_unsafe_minus<uint16_t, int8_t>(); // -1
    print_unsafe_minus<uint32_t, int8_t>(); // -1
    print_unsafe_minus<uint64_t, int8_t>(); // -1

    print_unsafe_minus<uint8_t, int16_t>(); // -1
    print_unsafe_minus<uint16_t, int16_t>(); // -1
    print_unsafe_minus<uint32_t, int16_t>(); // -1
    print_unsafe_minus<uint64_t, int16_t>(); // -1

    print_unsafe_minus<uint8_t, int32_t>(); // -1
    print_unsafe_minus<uint16_t, int32_t>(); // -1
    print_unsafe_minus<uint32_t, int32_t>(); // -1
    print_unsafe_minus<uint64_t, int32_t>(); // -1

    print_unsafe_minus<uint8_t, int64_t>(); // -1
    print_unsafe_minus<uint16_t, int64_t>(); // -1
    print_unsafe_minus<uint32_t, int64_t>(); // 4294967295
    print_unsafe_minus<uint64_t, int64_t>(); // -1
}

(edit) Also worth noting: if we extend the example to include 128-bit integers, then the following two permutations fail as well:

print_unsafe_minus<uint32_t, __int128>(); // 4294967295
print_unsafe_minus<uint64_t, __int128>(); // 18446744073709551615
Patrick Parker
  • Do you know what the promotion rules say exactly? All these examples should be explainable by following them precisely. When you try to interpret the rules in these cases, can you explain your interpretation and where exactly it differs from what happens? – Nate Eldredge Mar 11 '21 at 23:04
  • @NateEldredge if `uint8_t` and `uint16_t` get promoted for addition to the same width as `uint32_t`, then should they not also fail as it does for `int64_t`? – Patrick Parker Mar 11 '21 at 23:21
  • *what is the take-away for programmers with regards to best practices?* don't mix signed and unsigned. Really, `unsigned` types are only needed for bitwise operations. – NathanOliver Mar 11 '21 at 23:21
  • `only the ` from your code only `uint32_t, int64_t` fails, not `unsigned, long` – KamilCuk Mar 11 '21 at 23:40
  • There is **no failure**. Unsigned **32-bit** integer underflow, to what would be -1, gives 4294967295, and this is properly representable as a signed, 64-bit integer. So that's what you get. – Adrian Mole Mar 11 '21 at 23:41
  • @NathanOliver Well, there are the STL container types, with their annoying `size_t` things! – Adrian Mole Mar 11 '21 at 23:48
  • @AdrianMole while it is good to know that the compiler is operating as designed, by "fail" here I am referring to the pitfall of inadvertently getting the counterintuitive result which many programmers would not expect. – Patrick Parker Mar 11 '21 at 23:48
  • @PatrickParker: Ah, that's not what happens. The key fact is that (on your machine) `uint8_t` and `uint16_t` have rank less than `int` and therefore get promoted to `int` which is signed. – Nate Eldredge Mar 11 '21 at 23:51
  • As @Nate says, the most 'counterintuitive' part of your code is that unsigned types smaller than `int` get promoted to ***signed*** `int`. – Adrian Mole Mar 11 '21 at 23:52

2 Answers

Before we start, let us assume the OP is using an implementation with a 32-bit int, that is, one where int32_t is the same type as int.

Let X be the width of M, and Y be the width of N.

Let us divide your test cases into three categories:

First Category: X <= 16

Integer promotions apply here; they are always performed on the operands before an arithmetic operator is evaluated.

uint8_t and uint16_t have their whole value ranges representable by int, hence they are promoted to int before doing the subtraction. Then you get a signed value of -1 from doing 3 - 4, which is then used to initialize a signed integer type, which regardless of its width can hold -1. Thus you get -1 as output.
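A minimal sketch of this first category, assuming (as above) an int wider than 16 bits. The helper name small_unsigned_minus is made up for illustration; the checks run at compile time.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper: subtracts two narrow unsigned values and
// stores the result in a 64-bit signed integer.
constexpr std::int64_t small_unsigned_minus(std::uint16_t a, std::uint16_t b) {
    // Assumption: int is wider than 16 bits, so both operands are
    // promoted to (signed) int before the subtraction takes place.
    static_assert(std::is_same_v<decltype(a - b), int>,
                  "uint16_t operands are expected to promote to int");
    return a - b; // int(-1) for 3 - 4, widened to int64_t without change
}

static_assert(small_unsigned_minus(3, 4) == -1, "3 - 4 stays negative");
```

Because the subtraction already happens in a signed type, the width of the destination does not matter here.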

Second Category: (X >= 32) and (X >= Y)

No promotion happens before doing the subtraction.

The rule that applies here is that unsigned integer arithmetic is always performed modulo 2^X, where X is the width of the integer.

Hence a - b always gives you 2^X - 1, since this is the value in the range of M that is congruent to -1 modulo 2^X.

Now you assign it to a signed type. Let us assume C++20 (before C++20 it is implementation-defined behavior when assigning an unsigned value that cannot be represented by a destination signed type).

Here the result of a - b (i.e. 2^X - 1) is converted to the unique value that is congruent to itself modulo 2^Y in the destination range (i.e. from -2^(Y-1) to 2^(Y-1) - 1). Since X >= Y, 2^X is a multiple of 2^Y, so 2^X - 1 is congruent to -1 modulo 2^Y and the converted value is always -1.

So you get -1 as output.
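The second category can be checked at compile time with a sketch like the following. The template minus_as is a hypothetical stand-in for print_unsafe_minus that returns the value instead of printing it. It assumes a 32-bit int and, before C++20, an implementation that converts out-of-range values modulo 2^Y (which mainstream compilers do).

```cpp
#include <cstdint>

// Hypothetical helper: computes a - b in M (after any promotion)
// and converts the result to N, like `N c = a - b;` in the question.
template <typename M, typename N>
constexpr N minus_as(M a, M b) {
    return static_cast<N>(a - b);
}

// The unsigned subtraction itself wraps around to 2^32 - 1.
static_assert(std::uint32_t{3} - std::uint32_t{4} == UINT32_MAX,
              "unsigned arithmetic is modulo 2^32");

// X >= Y: the wrapped value converts back to -1.
static_assert(minus_as<std::uint32_t, std::int32_t>(3, 4) == -1, "X == Y");
static_assert(minus_as<std::uint64_t, std::int32_t>(3, 4) == -1, "X > Y");
static_assert(minus_as<std::uint64_t, std::int8_t>(3, 4) == -1, "X > Y");
```

The explicit static_cast only spells out the conversion that the original initialization performs implicitly.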

Third Category: (X >= 32) and (X < Y)

There is only one case in this category, namely the case where M = uint32_t, N = int64_t.

The subtraction is the same as in category 2, where you get 2^32 - 1.

The rule to convert to the signed type is still the same. However, this time, 2^32 - 1 is equal to itself modulo 2^64, so the value remains unchanged.

Note: 4294967295 == 2^32 - 1
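As a sketch of the third category, again assuming a 32-bit int so that uint32_t is not promoted to a signed type (widen_after_minus is a made-up name):

```cpp
#include <cstdint>

// Hypothetical helper: the subtraction is done in 32-bit unsigned
// arithmetic, and only the wrapped result is widened to int64_t.
constexpr std::int64_t widen_after_minus(std::uint32_t a, std::uint32_t b) {
    return a - b; // 2^32 - 1 for 3 - 4; fits in int64_t, so it is kept as-is
}

static_assert(widen_after_minus(3, 4) == 4294967295LL,
              "the wrapped value survives the widening conversion");
static_assert(widen_after_minus(3, 4) == (std::int64_t{1} << 32) - 1,
              "4294967295 is 2^32 - 1");
```

The wraparound has already happened by the time the wider signed type gets involved, which is why the destination cannot recover the -1.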

Take Away

This is probably a surprising aspect of C++, and as suggested by @NathanOliver, you should avoid mixing signed and unsigned types, and take extreme care when you do need to mix them.

You can tell the compiler (GCC or Clang) to generate warnings for such conversions by turning on -Wconversion. Your code gets a lot of warnings when this is turned on.
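One way to sidestep the wraparound is to widen both operands to a signed type before subtracting, so that the arithmetic never happens in an unsigned type. A minimal sketch for 32-bit operands; signed_diff is a hypothetical name, and the approach assumes a signed type wider than the operands exists.

```cpp
#include <cstdint>

// Hypothetical helper: converts each operand to int64_t first, so the
// subtraction is signed and every uint32_t difference is representable.
constexpr std::int64_t signed_diff(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::int64_t>(a) - static_cast<std::int64_t>(b);
}

static_assert(signed_diff(3, 4) == -1, "small negative difference");
static_assert(signed_diff(0, UINT32_MAX) == -4294967295LL, "most negative case");
static_assert(signed_diff(UINT32_MAX, 0) == 4294967295LL, "most positive case");
```

This does not generalize to uint64_t operands without a 128-bit signed type, so the widest case still needs separate handling.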

ph3rin
  • So, let's say I want to leave this `c = a - b;` in my code knowing that it is currently safe but that at some future point another developer might change the sizes of signed c or unsigned a or b. What static_assert would encapsulate the principles here by which it remains "safe" i.e. it doesn't underflow from negative to positive? – Patrick Parker May 29 '21 at 08:09
  • @PatrickParker Well, I think this deserves a separate question and more research. So correct me if I am wrong, you basically want the following: given two integer types, you want to get, at compile-time, a signed type that can hold any difference value between the operand types. – ph3rin May 31 '21 at 19:06
  • Well, it's more like this: developer A controls a header which defines the typedef of the signed and unsigned operands which might change. developer B controls the code `signed_c = unsigned_a - unsigned_b` which will not change. and a third developer (me) would like to write a static_assert, using those variables and/or typedefs, to make sure the result won't be underflowing to become a positive number. You're right in that it should probably be a separate question. But it was what I was trying to get at with my "what is the takeaway" question. – Patrick Parker Jun 01 '21 at 02:16
  • can you just confirm if this static_assert correctly prevents that underflow to positive scenario? `static_assert(sizeof(a-b)>=sizeof(c) || sizeof(a-b) – Patrick Parker Jun 01 '21 at 02:36
  • @PatrickParker You are taking the difference between two unsigned values and converting it to a signed value, the result may not be what you want (I made a slip earlier on this in an earlier comment). For example, `UINT_MAX- 0u` and `0u - 1u` would both give you `UINT_MAX`. You'd have to cast `a` and `b` to a signed type wider than both before doing the subtraction (i.e. `uint32_t` to `int64_t`) – ph3rin Jun 01 '21 at 03:03
  • I already know in advance that the result won't be outside the range of what can be held by a typical `int` so nowhere near UINT_MAX and that is why I was only concerned specifically about the underflow to positive scenario. There are many contexts in which numbers approaching the size UINT_MAX just do not appear. But thanks. – Patrick Parker Jun 01 '21 at 04:03

Let's assume a sane two's-complement platform where int has 32 bits and uint32_t is the same as unsigned.

    uint32_t a = 3, b = 4;
    int64_t c =  a - b;

Operands of the - operator undergo integral promotions*. Here uint32_t is unsigned, whose rank is not lower than that of int (and int cannot represent all values of uint32_t anyway), so the operands are not promoted to int and stay unsigned. The result type of - is the common type of the operands after promotion, which is unsigned. Mathematically a - b is -1, but unsigned cannot represent negative numbers, so the result "wraps around" to UINT_MAX, which equals UINT32_MAX because unsigned has 32 bits. This value is representable in int64_t, so the conversion leaves it unchanged and c is initialized with UINT32_MAX.

In contrast, let's take <uint16_t, int64_t> as an example. A 32-bit int can represent all values of a uint16_t, so uint16_t is promoted to int, and the result of a - b is just (int)-1. There is no conversion from (int)-1 to an unsigned number. Since int64_t can represent -1, the value -1 is simply stored in the variable of type int64_t.
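The difference between the two cases is visible in the type of the expression a - b itself. The following sketch checks it at compile time; the alias minus_result_t is made up for illustration, and the assertions assume the platform described above (32-bit int).

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical alias: the type of `a - b` when both operands have type M.
template <typename M>
using minus_result_t = decltype(std::declval<M>() - std::declval<M>());

// Narrow unsigned types are promoted to signed int.
static_assert(std::is_same_v<minus_result_t<std::uint8_t>, int>, "promoted");
static_assert(std::is_same_v<minus_result_t<std::uint16_t>, int>, "promoted");

// Types at least as wide as int keep their unsigned type.
static_assert(std::is_same_v<minus_result_t<std::uint32_t>, std::uint32_t>,
              "not promoted");
static_assert(std::is_same_v<minus_result_t<std::uint64_t>, std::uint64_t>,
              "not promoted");
```

Only when the result type is unsigned can 3 - 4 wrap around before it reaches the destination variable.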

* It's called integer promotions in C language...

KamilCuk
  • and what is the take-away for programmers with regards to best practices? e.g. "don't worry about assigning unsigned a-b to signed c unless c is wider than int and different width than (a-b)"? – Patrick Parker Mar 12 '21 at 00:09
  • Well, I do not understand that part. Best practices to what? What should be the result? If you want to have `-1`, just write `-1` instead of subtracting two numbers. Definitely a best practice is to learn the language and know how it works. If you do not like implicit promotions and conversions and weak typing, move to another programming language. – KamilCuk Mar 12 '21 at 00:14
  • `don't worry about assigning unsigned a-b to signed c unless c is wider than int and different width than (a-b)` What __result__ do you want to have? The code is working as intended. I believe you want to be explicit - `N c = (M)(a - b);` - in which case it would overflow according to the unsigned type (I do not know if that is the __intended result__). It all depends on what you __want__. In any case, the best practice could be to follow some rules in MISRA and other safety standards that specifically handle the confusing behavior of implicit promotions in code. – KamilCuk Mar 12 '21 at 00:18
  • I see ex rule 5-0-3 in MISRA C++ 2008 http://www.tlemp.com/download/rule/MISRA-CPP-2008-STANDARD.pdf . From that, other best practice would be to do `N c = (N)a - (N)b` - cast the values before calculation. But note that signed overflow is also a corner case - best would be to handle it separately. – KamilCuk Mar 12 '21 at 00:21