
Here are two very simple programs. I would expect them to produce the same output, but they don't. The first outputs 251; the second outputs -5. I can understand why I get 251, but I can't figure out why the second program gives me -5.

PROGRAM 1:

#include <stdio.h>

int main()
{
    unsigned char a;
    unsigned char b;
    unsigned int  c;

    a = 0;
    b = -5;

    c = (a + b);

    printf("c hex: %x\n", c);
    printf("c dec: %d\n", c);
}

Output:

c hex: fb
c dec: 251

PROGRAM 2:

#include <stdio.h>

int main()
{
    unsigned char a;
    unsigned char b;
    unsigned int  c;

    a = 0;
    b = 5;

    c = (a - b);

    printf("c hex: %x\n", c);
    printf("c dec: %d\n", c);
}

Output:

c hex: fffffffb
c dec: -5
Jonathan Leffler
user678392
  • Both of your programs attempt to print an `unsigned int` value using the `%d` format specifier. This is illegal and the behavior is undefined. To meaningfully `printf` an `unsigned int` value you need `%u` or another specifier that expects an `unsigned int` argument. `%x` is fine, since it expects `unsigned int`. But `%d` is completely unacceptable. This is in part why you get weird results. – AnT stands with Russia Sep 07 '11 at 02:50
  • Really? It should still be explained why I am getting what I am getting. – user678392 Sep 07 '11 at 03:06
  • As AndreyT explained, the behavior is "undefined" -- that's explanation enough. If you study binary machine arithmetic you can figure out why it works the way it does, but, technically, the result you see isn't required -- the machine could legally print "potato" instead of "-5", since the behavior is "undefined". – Hot Licks Sep 07 '11 at 03:36
  • @user678392: Really, really. As I said, your code produces undefined behavior. For all practical intents and purposes its behavior is essentially random, or implementation-specific at best. While it is certainly possible to come up with a deterministic explanation of what you are "getting", there's no use in it whatsoever. It is a waste of time. – AnT stands with Russia Sep 07 '11 at 04:21

5 Answers


In the first program, b=-5; assigns 251 to b. (Conversions to an unsigned type always reduce the value modulo one plus the max value of the destination type.)

In the second program, b=5; simply assigns 5 to b, then c = (a - b); performs the subtraction 0-5 as type int due to the default promotions - put simply, "smaller than int" types are always promoted to int before being used as operands of arithmetic and bitwise operators.

Edit: One thing I missed: Since c has type unsigned int, the result -5 in the second program will be converted to unsigned int when the assignment to c is performed, resulting in UINT_MAX-4. This is what you see with the %x specifier to printf. When printing c with %d, you get undefined behavior, because %d expects a (signed) int argument and you passed an unsigned int argument with a value that's not representable in plain (signed) int.

R.. GitHub STOP HELPING ICE
  • At the assembly level, the issue is probably that the promotion to `int` is performed using an instruction that extends the sign bit, which is why `0xfb` becomes `0xfffffffb`. If the conversion is done without extending the sign bit, then you would get `0x000000fb`, which is 251 in decimal. – aroth Sep 07 '11 at 02:48
  • @R.. it seems like you are saying that c = (a - b) isn't the same thing as c = (a + -b). – user678392 Sep 07 '11 at 03:04
  • @aroth WHY is the conversion done in one case and not the other? – user678392 Sep 07 '11 at 03:05
  • @R... Why wouldn't the second program simply (a + -(b)) then convert to the data type of C? By what you are saying it seems arbitrary when/where the compiler is performing data type conversions. – user678392 Sep 07 '11 at 03:30
  • It *is* somewhat arbitrary when/where conversions are performed. In C the system is allowed to optionally "up convert" values to a wider representation in some cases, so long as the behavior of ***legal*** code is not altered. When you do things that are relying on undefined behavior you can get weird results. But in this case it's not that `c = (a - b)` isn't the same as `c = (a + -b)` it's that you didn't code the second expression at all, but rather you coded `c = (a + 251)`. – Hot Licks Sep 07 '11 at 03:45
  • `c = a - b;` and `c = a + -b;` are equivalent except in the case where `b` is `INT_MIN`, in which case the latter would have undefined behavior. But that can't happen if `b` has type `unsigned char`. What everybody seems to be missing is the fact that even if `b` has type `unsigned char`, its value is 5, and that will be promoted to `int` *prior to* applying either the unary or binary minus operator. – R.. GitHub STOP HELPING ICE Sep 07 '11 at 03:53
  • And none of this is "arbitrary" at all, if by arbitrary you mean "up to the compiler". It is all specified strictly/exactly in the C language specification. – R.. GitHub STOP HELPING ICE Sep 07 '11 at 03:54
  • This answer is correct. In the expression `c = (a - b);`, the value of `a` is promoted to type `int`; the value of `b` is promoted to type `int`; the subtraction is done in type `int` giving a result of `-5`; and that `-5` is then converted to `unsigned int`, giving a result of `UINT_MAX + 1 - 5`. – caf Sep 07 '11 at 04:17

You're using the format specifier %d, which treats the argument as a signed decimal integer (basically an int).

You get 251 from the first program because (unsigned char)-5 is 251, which you then print as a signed decimal number. It gets widened to 4 bytes instead of 1 with zero bits on top, so the value looks like 0x000000FB, which is still 251.

You get -5 from the second program because (unsigned int)-5 is some large value, but cast to an int, it's -5. It gets treated like an int because of the way you use printf.

Use the format specifier %u to print unsigned decimal values.

Seth Carnegie

There are two separate issues here. The first is that you are getting different hex values for what look like the same operations. The underlying fact you are missing is that chars (and shorts) are promoted to ints before arithmetic is done on them. Here is the difference:

a = 0  //0x00
b = -5 //0xfb
c = (int)a + (int)b

Here, a is extended to 0x00000000 and b is extended to 0x000000fb (not sign extended, because it is an unsigned char). Then, the addition is performed, and we get 0x000000fb.

a = 0  //0x00
b = 5  //0x05
c = (int)a - (int)b

Here, a is extended to 0x00000000 and b is extended to 0x00000005. Then, the subtraction is performed, and we get 0xfffffffb.

The solution? Stick with chars or ints; mixing them can cause things you won't expect.

The second problem is that an unsigned int is being printed as -5, clearly a signed value. However, in the format string, you told printf to print its second argument interpreted as a signed int (that's what "%d" means). The trick here is that printf doesn't know the types of the arguments you passed in. It merely interprets them the way the format string tells it to. Here's an example where we tell printf to print a pointer as an int:

#include <stdio.h>

int main()
{
    int a = 0;
    int *p = &a;
    printf("%d\n", p); /* deliberately wrong: %d expects an int, not a pointer */
}

When I run this program, I get a different value each time, which is the memory location of a, converted to base 10. You may note that this kind of thing causes a warning. You should read all of the warnings your compiler gives you, and only ignore them if you're completely sure you are doing what you intend to.

Aaron Dufour

What you're seeing is the result of how the C standard defines signed-to-unsigned type conversions (for the arithmetic) and how the underlying machine represents numbers (for the result of the undefined behavior at the end).

When I originally wrote my response, I had assumed that the C standard didn't explicitly define how signed values should be converted to unsigned values, since the standard doesn't define how signed values must be represented, or what you get when you convert an unsigned value to a signed type that can't hold it.

However, it turns out that the standard does explicitly define the conversion from negative signed values to unsigned values. In the case of an unsigned int, a negative signed value x is converted to UINT_MAX + 1 + x (so -5 becomes UINT_MAX - 4), just as if it were stored as a signed value in two's complement and then reinterpreted as an unsigned value.

So when you say:

unsigned char  a;
unsigned char  b;
unsigned int c;

a = 0; 
b = -5;
c = a + b;

b's value becomes 251, because -5 is converted to the unsigned value UCHAR_MAX + 1 - 5 (255 + 1 - 5 = 251) as the C standard requires. It's after that conversion that the addition takes place, which makes a + b the same as 0 + 251, and that is what gets stored in c. However, when you say:

unsigned char  a;
unsigned char  b;
unsigned int c;

a = 0;
b = 5;
c = (a-b);

printf("c dec: %d\n", c);

In this case, a and b are promoted to int by the usual integer promotions, so they keep the values 0 and 5, and the subtraction 0 - 5 is done in signed arithmetic, giving -5. That -5 is then converted to unsigned int when it is assigned to c, which is defined to result in UINT_MAX + 1 - 5. If the subtraction had instead happened at unsigned char width, the value would be UCHAR_MAX + 1 - 5 (i.e. 251 again).

However, the reason you see -5 printed in your output is a combination of the fact that the unsigned integer UINT_MAX-4 and -5 have the same exact binary representation, just like -5 and 251 do with a single-byte datatype, and the fact that when you used "%d" as the formatting string, that told printf to interpret the value of c as a signed integer instead of an unsigned integer.

Since the conversion of an out-of-range unsigned value to a signed type isn't fully defined by the standard, the result is implementation-specific. In your case, since the underlying machine uses two's complement for signed values, the result is that the unsigned value UINT_MAX-4 becomes the signed value -5.

The only reason this doesn't happen in the first program is that an unsigned int and a signed int can both represent 251, so converting between the two is well-defined and using "%d" or "%u" doesn't matter. In the second program, however, the value UINT_MAX-4 is outside the range of a signed int, so printing it with "%d" is undefined behavior and the result is implementation-specific.

What's happening under the hood

It's always good to double check what you think is happening or what should happen with what's actually happening, so let's look at the assembly language output from the compiler now to see exactly what's going on. Here's the meaningful part of the first program:

    mov     BYTE PTR [rbp-1], 0   ; a becomes 0
    mov     BYTE PTR [rbp-2], -5  ; b becomes -5, which as an unsigned char is also 251
    movzx   edx, BYTE PTR [rbp-1] ; promote a by zero-extending to an unsigned int, which is now 0
    movzx   eax, BYTE PTR [rbp-2] ; promote b by zero-extending to an unsigned int which is now 251
    add     eax, edx  ; add a and b, that is, 0 and 251

Notice that although we store a signed value of -5 in the byte b, when the compiler promotes it, it promotes it by zero-extending the number, meaning it's being interpreted as the unsigned value that 11111011 represents instead of the signed value. Then the promoted values are added together to become c. This is also why the C standard defines signed to unsigned conversions the way it does -- it's easy to implement the conversions on architectures that use two's complement for signed values.

Now with program 2:

    mov     BYTE PTR [rbp-1], 0 ; a = 0
    mov     BYTE PTR [rbp-2], 5 ; b = 5
    movzx   edx, BYTE PTR [rbp-1] ; a is promoted to 32-bit integer with value 0
    movzx   eax, BYTE PTR [rbp-2] ; b is promoted to a 32-bit integer with value 5
    mov     ecx, edx 
    sub     ecx, eax ; a - b is now done as 32-bit integers resulting in -5, which is '4294967291' when interpreted as unsigned

We see that a and b are once again promoted before any arithmetic, so the subtraction is done on 32-bit integers, producing the bit pattern for -5, which is UINT_MAX-4 (4294967291) when interpreted as unsigned. So whether you think of it as a signed or an unsigned subtraction, because the machine uses two's complement form, the result matches what the C standard requires without any extra conversions.

James O'Doherty
  • Thanks. However, I'm still wondering why a and b are promoted to unsigned int in the second case but not the first? – user678392 Sep 07 '11 at 03:26
  • They are still promoted in the first case. However, the value '-5' is already converted to 251 due to the unsigned char assignment in the first program, and an unsigned char of value 251 promoted to an unsigned int still has a value 251, so a+b is also 251. – James O'Doherty Sep 07 '11 at 03:28
  • Actually, I might be way off here on the specifics, given that I didn't bother looking at the assembly code generated by the compiler, but what it all comes down to is this: -5 as an unsigned char becomes 251 because both are `11111011`, but when promoted from an unsigned char to an unsigned int it gets converted to `00000000 00000000 00000000 11111011`, whereas -5 when promoted to an int changes from `11111011` to `11111111 11111111 11111111 11111011` (on my machine), which is -5 signed and 4294967291 unsigned. In other words, once it's become an unsigned type, promotion then extends 0s instead of 1s. – James O'Doherty Sep 07 '11 at 03:36
  • Now I'm confused. When I subtracted a - b, why isn't that equivalent to 0 + (the bit pattern for 251)? If it is, then why do 1's get propagated instead of the 0's? – user678392 Sep 07 '11 at 03:49
  • The promotion comes first. With a - b, first a becomes `00000000 00000000 00000000 00000000` then b becomes `00000000 00000000 00000000 00000101` due to promotion. Then the subtraction occurs, leading to `11111111 11111111 11111111 11111011` due to underflow. – James O'Doherty Sep 07 '11 at 03:53
  • This answer is wrong from the first sentence. What OP is seeing is a result of the *requirements of the C language*. Implementation details might show *how* a particular implementation achieves behavior that matches the standard, but they can't tell *why* that behavior exists. – R.. GitHub STOP HELPING ICE Sep 07 '11 at 04:15
  • I've updated my answer to explain what's happening in terms of standard and undefined behavior. – James O'Doherty Sep 07 '11 at 13:16
  • @R.. Depends on which version of the C language you're talking about. – Hot Licks Sep 07 '11 at 15:22
  • @Daniel: No, it does not. The behavior has been prescribed ever since C was first standardized (by ANSI in 1989), and the de-facto standard behavior (K&R) before that was the same. Your insistence on restating blatantly wrong information again and again is getting close to the point of trolling; please stop. – R.. GitHub STOP HELPING ICE Sep 07 '11 at 15:28
  • The "de-facto standard" behavior varied widely in this area, depending on which compiler you used. K&R never specified it (or much of anything). – Hot Licks Sep 07 '11 at 15:31
  • Now we're talking about pre-standard implementations that are more than 22 years old, and which have no possible relevance. It's certainly possible that somebody made weird C-like languages with representation-based signed/unsigned conversions, but judging by the rarity of non-twos-complement systems even by that time, I think it's doubtful. You're welcome to show us one as a specimen of archaeological interest, but it has nothing to do with OP's question. – R.. GitHub STOP HELPING ICE Sep 07 '11 at 15:35
  • @Daniel and @R, I think the significance of K&R is that their book, The C Programming Language, is still the go-to guide for a lot of C programmers (relative to the number that have a copy of the ISO standards sitting around, anyway). Even in the second edition (which is based on ANSI C), the details about the C language's requirements concerning signed to unsigned conversion are only found in the standards draft in the Appendix (A6.2 Integral Conversions), so most will overlook it and assume conversions are based on the underlying signed integer representation, since any is allowed. – James O'Doherty Sep 07 '11 at 21:57
  • Note that it took K&R about ten years to decide if chars should be signed or unsigned by default. – Hot Licks Sep 07 '11 at 23:45

Assigning a negative number to an unsigned variable is basically breaking the rules. What you're doing is converting the negative number to a large positive number. You're not even guaranteed, technically, that the conversion is the same from one processor to another -- on a one's complement system (if any still existed) you'd get a different value, for example.

So you get what you get. You can't expect signed algebra to still apply.

Hot Licks
  • So basically you don't have an answer on why I am getting what I am getting. – user678392 Sep 07 '11 at 03:07
  • Write out the bits and you can figure it out yourself. – Hot Licks Sep 07 '11 at 03:31
  • I have written out the bits. The question comes down data type conversions, not mere bit patterns. – user678392 Sep 07 '11 at 03:50
  • This answer is absolutely wrong. The result of converting to an unsigned type is always well-defined; it's reduction modulo one plus the maximum value of the destination type, into the range of the destination type. – R.. GitHub STOP HELPING ICE Sep 07 '11 at 03:55
  • @R.. -- That's not true on a one's complement machine. And the range of the destination type isn't fixed. – Hot Licks Sep 07 '11 at 11:31
  • @user678392 -- There are no "data type conversions" -- the bit patterns are copied, bit for bit. The only difference is whether you get sign extension or not when copying to a wider value, and the answer is "not" if everything is unsigned. – Hot Licks Sep 07 '11 at 11:33
  • @Daniel: You're wrong. Signed to unsigned conversions are not bit copying. They are **value** conversions. RTFM. Per 6.3.1.3: "if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type." – R.. GitHub STOP HELPING ICE Sep 07 '11 at 13:44
  • And note that, per 6.2.6, one more than the maximum value that can be represented in a type is always a power of two. – R.. GitHub STOP HELPING ICE Sep 07 '11 at 13:46