
I'm sorry if this question is too basic...I just have not found the answer to it anywhere.

Say I declare a C variable like this:

unsigned int var = 241;

In this case the var is unsigned so my intention is for it to have decimal value 241.

Alternatively I could declare it like this:

signed int var = -15;

In this case I declared it as a signed integer so, as per my understanding, it should have the decimal value -15.

However, both times I assume the var will be stored in memory (hardware) like this: 1111 0001.

So how does the processor know, at the lowest level which is in the hardware, that I intended to declare this as 241 or -15? I'm aware of the two's complement notation that is used to represent negative numbers and such but, I assume, in hardware the processor only sees a sequence of ones and zeroes and then does some operation with it by switching the states of some ICs. How does the processor know whether to interpret the sequence of bits in standard binary (for unsigned) or two's complement (for signed)?

Also, some other somewhat unrelated questions:

  1. In C I can do this:

unsigned int var = -15;
printf("The var is: %d ", var);

This will, as expected, print -15. Why, when I do this:

signed int var = 0xF1; //or 0b11110001
printf("The var is: %d ", var);

I get 241 instead of -15? Since I declared it as signed, and in two's complement 0xF1 is -15, why am I getting the value 241, which is the equivalent of 0xF1 in standard binary?

  2. Why does the compiler let me do stuff like: unsigned int var = -15;

Shouldn't it throw an error telling me I can't assign negative values to a variable which I have declared as unsigned?

Thank you and I apologize for my many and perhaps basic questions, there is so much I do not know :D.

Yunnosch
Cantaff0rd
    *Why does the compiler let me do stuff like: `unsigned int var = -15;`* Because C has rules for implicit conversion of out-of-range (negative or large) integer to unsigned types: modulo reduction into the value-range. (On a 2's complement machine, this means using the signed bit-pattern unchanged for runtime variables, but not on a 1's complement or sign/magnitude C implementation.) – Peter Cordes Feb 13 '21 at 08:16
  • "*how does the processor know*" It doesn't. All the processor knows about that value is whatever bits are stored in memory. But the compiler may (and often will) generate different instructions depending on whether you declared the variable as signed vs. unsigned. – dxiv Feb 13 '21 at 08:19
  • @dxiv Sorry, I was editing already when your comment appeared. I hope you accept my answer as not being based on your very similar contribution. – Yunnosch Feb 13 '21 at 08:26
    @Yunnosch Of course, mine was just a summary comment, yours is an answer proper (+1). – dxiv Feb 13 '21 at 08:30

2 Answers


The hardware does not know.
The compiler knows.
The compiler knows because you said so here: signed int var = -15; "This, dear compiler, is a variable which can be negative, and I initialize it to a negative value."
Here you said differently: unsigned int var = 241; "This, dear compiler, is a variable which cannot be negative, and I initialize it to a positive value."

The compiler will keep that in mind for anything you later do with the variable and its values. The compiler will turn all corresponding code into that set of instructions in machine language, which will cause the hardware to behave accordingly. So the hardware ends up doing things appropriate to negative or not; not because of knowing, but because of not getting a choice on it.

An interesting aspect of "corresponding instructions" (as pointed out by Peter Cordes in a comment below) is that, for the special (but very widely used) case of two's complement representation of negative values, the instructions for basic arithmetic are actually identical for signed and unsigned (which is an important advantage of two's complement).

Yunnosch
    The beauty of 2's complement is that addition / subtraction are the same binary operation as for unsigned; same for non-widening multiply. Only one's complement or sign/magnitude machines need different instructions for signed/unsigned basic math ops. (2's complement machines just need different compare and/or branch instructions). – Peter Cordes Feb 13 '21 at 08:48
  • @PeterCordes True. I think I do not contradict that in my answer. Otherwise feel free to point out where. Maybe " that set of instructions ... accordingly"? But that does not necessarily mean "different for negative/positive or signed/unsigned". In the end it might only be things in e.g. `printf()`. – Yunnosch Feb 13 '21 at 08:53
    I did however mention it explicitly now, because I am a fan of that attribute of 2-complement myself. @PeterCordes – Yunnosch Feb 13 '21 at 08:57
  • Yup, good update. Don't want people getting the misconception that normal computers have a `signed-add` instruction. Only stuff like `x < y` depends on signed vs. unsigned, which is certainly not rare but not as common as `x += y`. – Peter Cordes Feb 13 '21 at 09:03
  • _Nb_ ISO C does not say one way or another what the scheme for representing negative numbers should be, only the lower limits of the representation. This contrasts with many newer languages, _eg_, Java. This is important sometimes. – Neil Feb 13 '21 at 22:59
    @Neil: IIRC, as well as the limits, ISO C does say somewhere that the 3 options are sign/mag, one's complement, and two's complement. (Somewhere in the object-representation details that put some limits on what you will see when you use `unsigned char*` to access the object-representation of other types. e.g. unsigned integer types have to be binary but can have padding. And signed integers can have padding and a sign bit.) – Peter Cordes Feb 14 '21 at 01:20

If the two values were char (signed or not), then their internal representation (8-bit pattern) would be the same in memory or register. The only difference would be in the instructions the compiler emits when dealing with such values. For example, if these values are stored in variables declared signed or unsigned in C, then a comparison between such values would make the compiler generate a signed or unsigned specific comparison instruction at assembly level.

But in your example you use ints. Assuming that on your platform these ints use four bytes, the two constants you gave are not identical when it comes to their 32-bit pattern. The upper bits take the sign of the value into account: the sign bit propagates, filling with 0s or 1s up to 32 bits (see the sequences of 0 or f below).

Note that assigning a negative value to an unsigned int produces a warning at compilation if you use the proper compiler flags (-Wconversion for example). In his comment below, @PeterCordes reminds us that such an assignment is legal in C, and useful in some situations; the usage (or not) of compiler flags to detect (or not) such cases is only a matter of personal choice. However, assigning -15U instead of -15 makes explicit the intention to consider the constant as unsigned (despite the minus sign), and does not trigger the warning.

#include <stdio.h>

int main(void)
{
    int i1 = -15;
    int i2 = 0xF1;
    int i3 = 241;
    printf("%.8x %d\n", i1, i1); // fffffff1 -15
    printf("%.8x %d\n", i2, i2); // 000000f1 241
    printf("%.8x %d\n", i3, i3); // 000000f1 241
    unsigned int u1 = -15; // warning: unsigned conversion from ‘int’ to ‘unsigned int’ changes value from ‘-15’ to ‘4294967281’
    unsigned int u2 = 0xF1;
    unsigned int u3 = 241;
    printf("%.8x %u\n", u1, u1); // fffffff1 4294967281
    printf("%.8x %u\n", u2, u2); // 000000f1 241
    printf("%.8x %u\n", u3, u3); // 000000f1 241
    return 0;
}
prog-fh
    I guess that conversion warning is from MSVC? GCC and clang don't warn about it because it's perfectly legal C, and `-16U` or `-1U` are useful ways some bit-patterns, e.g. `x & -16U` rounds down to a multiple of 16. So it would be annoying to get warnings about it in contexts like that. https://godbolt.org/z/vM64df. But MSVC's `-Wall` enables a bunch of warnings including ones that are often spurious / false-positive, so that's fine and potentially useful and a good fit for that. (I was hoping GCC or clang would have a warning for it at `-Wpedantic` or something, but I didn't find one.) – Peter Cordes Feb 13 '21 at 08:45
  • @PeterCordes Yes, I always use many `-W... -pedantic` flags in order to detect such potential errors (although considered correct in some situations you gave). I prefer using casts where such conversions are correct to silencing the compiler warning, but it is just a personal choice. – prog-fh Feb 13 '21 at 08:51
    Note that in formal C terms, `-15U` isn't ever a negative number. As MSVC warns (https://godbolt.org/z/nf9x8M): "*C4146: unary minus operator applied to unsigned type, result still unsigned*". So it's exactly identical to `(0U - 15U)`. C numeric-literals don't include minus signs; that's why `-0x80000000` has type `unsigned` in 32-bit-int C implementations: 0x80000000 doesn't fit in a signed 32-bit int, so it promotes to unsigned, *then* unary `-` is applied. https://godbolt.org/z/eW7bsx. This means you normally need a cast back to signed or something like that to define `INT_MIN`. – Peter Cordes Feb 13 '21 at 21:29