
Starting with a small code snippet:

```c
#include <stdio.h>

int main(void)
{
    char a = 0x80;
    unsigned short b;
    b = (unsigned short)a;
    printf("0x%04x\r\n", b); /* => 0xff80 */
}
```

To my current understanding, `char` is by definition neither a `signed char` nor an `unsigned char`, but a sort of third type of signedness.

How does it come about that `a` is first sign-extended from its (perhaps platform-dependent) 8 bits of storage to the (again perhaps platform-specific) 16 bits of a signed short, and only then converted to an unsigned short?

Is there a C standard rule that determines the order of these conversions?

Does the standard give any guidance on how to deal with this third type of signedness of a plain `char` (I once called it an X-char, X for undetermined signedness), so that results are at least deterministic?

PS: If I insert an `(unsigned char)` cast in front of the `a` in the assignment line, the result in the printing line indeed changes to 0x0080. Thus only two casts in a row will produce what might be the intended result, as shown below.
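A self-contained sketch of both variants (the printed values assume an implementation where plain `char` is signed and 8 bits wide, as on the platform above):

```c
#include <stdio.h>

int main(void)
{
    char a = 0x80;                         /* -128 when char is signed */
    unsigned short direct = (unsigned short)a;                /* 0xff80 */
    unsigned short masked = (unsigned short)(unsigned char)a; /* 0x0080 */
    printf("direct: 0x%04x  masked: 0x%04x\n", direct, masked);
}
```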

Alexander Stohr
  • How can a `char` be neither `signed` nor `unsigned`? That doesn't make sense. – Fiddling Bits Nov 13 '18 at 13:41
  • [INT02-C. Understand integer conversion rules](https://wiki.sei.cmu.edu/confluence/display/c/INT02-C.+Understand+integer+conversion+rules)? – Swordfish Nov 13 '18 at 13:43
  • 1
    why not just use `signed char` vs. `unsigned char` when you're dealing with numbers, and leave the use of `char` for when you're dealing with characters ? – Sander De Dycker Nov 13 '18 at 13:51
  • 2
    @FiddlingBits -- `char`, `signed char`, and `unsigned char` are distinct types according to the Standard, so the confusion is understandable. Portable code can't treat a bare `char` as either `signed` or `unsigned` unless that implementation detail is known. – ad absurdum Nov 13 '18 at 13:58
  • @David: Thanks for the wording "portable code" - that's what I really meant. If you write code for portability, then you have to deal with an /undecorated/ `char` as a "third type" - even if it's compatible with one of the two other variants - but you will not know which at code-writing time, only at compile time or later. – Alexander Stohr Nov 13 '18 at 14:03

2 Answers


The type `char` is not a "third" signedness. It is either `signed char` or `unsigned char`, and which one it is is implementation-defined.

This is dictated by section 6.2.5p15 of the C standard:

> The three types `char`, `signed char`, and `unsigned char` are collectively called the character types. The implementation shall define `char` to have the same range, representation, and behavior as either `signed char` or `unsigned char`.

It appears that on your implementation, `char` is the same as `signed char`, so, because the value is negative and the destination type is unsigned, it must be converted.
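A quick way to check which variant an implementation uses is the `CHAR_MIN` macro from `<limits.h>`: it is 0 when plain `char` behaves like `unsigned char`, and negative when it behaves like `signed char`. A minimal sketch:

```c
#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* CHAR_MIN is 0 if char has the range of unsigned char,
       and negative (typically -128) if it matches signed char. */
    printf("plain char is %s\n", CHAR_MIN < 0 ? "signed" : "unsigned");
}
```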

Section 6.3.1.3 dictates how conversions between integer types occur:

> 1 When a value with integer type is converted to another integer type other than `_Bool`, if the value can be represented by the new type, it is unchanged.
>
> 2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
>
> 3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

Since the value -128 (the bit pattern 0x80 interpreted as a signed 8-bit `char`) cannot be represented in an `unsigned short`, the conversion in paragraph 2 occurs.
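A small sketch of the paragraph-2 arithmetic, assuming a 16-bit `unsigned short` (so "one more than the maximum value" is 65536) and a wider `int`:

```c
#include <limits.h>
#include <stdio.h>

int main(void)
{
    int value = -128;                /* the value of a on this platform */
    int range = (int)USHRT_MAX + 1;  /* 65536 with a 16-bit short */

    while (value < 0)                /* "repeatedly adding ... one more than
                                        the maximum value" */
        value += range;

    printf("0x%04x\n", (unsigned)value); /* prints 0xff80 */
}
```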

dbush
  • Convert -128 to unsigned short => -128 + (USHRT_MAX + 1) = (65535 + 1) - 128 = 65408 = 0xff80 (a real-world implementation might use different math & logic). – Alexander Stohr Nov 13 '18 at 14:00
  • There is no sign extension anywhere. The conversion to a larger type is an lvalue conversion upon assignment. You quote the correct part of the standard but this is not sign extension. Sign extension is `char c = -128; int i = c; printf("%x", i);` giving 0xffffff80 rather than 0x80. – Lundin Nov 13 '18 at 14:11
  • To clarify, "It appears that on your implementation, char is the same as signed char, so sign extension happens when the value is cast to a larger type." is only true when the value is cast to a larger _signed_ type. This is not the case here, so this part is incorrect. – Lundin Nov 13 '18 at 14:19
  • @Lundin Nice clarification. While on a two's complement implementation the effect is the same, it's not necessarily true in the general case. Updated to reflect. – dbush Nov 13 '18 at 14:33

`char` has implementation-defined signedness. It is either signed or unsigned, depending on the compiler. It is true, in a way, that `char` is a third character type, see this. `char` has indeterminate (non-portable) signedness and therefore should never be used for storing raw numbers.

But that doesn't matter in this case.

  • On your compiler, `char` is signed.
  • `char a = 0x80;` forces a conversion from the type of `0x80`, which is `int`, to `char`, in a compiler-specific manner. Normally on 2's complement systems, that means the `char` gets the value -128, as seems to be the case here.
  • `b = (unsigned short)a;` forces a conversion from `char` to `unsigned short` 1). C17 6.3.1.3 Signed and unsigned integers then says:

    > Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

    One more than the maximum value would be 65536. So you can think of this as -128 + 65536 = 65408.

  • The unsigned hex representation of 65408 is 0xFF80. No sign extension takes place anywhere! (A sketch contrasting this with actual sign extension follows the footnote below.)


1) The cast is not needed. When both operands of `=` are arithmetic types, as in this case, the right operand is implicitly converted to the type of the left operand (C17 6.5.16.1 §2).
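To make the distinction concrete, a minimal sketch (assuming a 16-bit `short`, a 32-bit `int`, and two's complement): conversion to an unsigned type wraps modulo its range, while conversion to a larger *signed* type is what sign extension actually looks like:

```c
#include <stdio.h>

int main(void)
{
    signed char c = -128;   /* same value as a in the question */
    unsigned short us = c;  /* modular reduction: -128 + 65536 = 65408 */
    int i = c;              /* sign extension to a wider signed type */

    printf("0x%04x\n", (unsigned)us); /* 0xff80 */
    printf("0x%08x\n", (unsigned)i);  /* 0xffffff80 */
}
```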

Lundin
  • Some compilers, depending also on their warning-level settings, will warn on this type conversion - because it's a signed-to-unsigned conversion, meaning that if the sign bit is set (the value is negative) the result will still be positive, and thus something undesired has a chance to happen. – Alexander Stohr Nov 13 '18 at 14:38
  • 1
    In your footnote, I think you meant that the *right*hand operand of an `=` operator is automatically converted to the type of the *left*hand operand. – John Bollinger Nov 13 '18 at 14:43
  • @AlexanderStohr Yeah, such conversions are often fishy indeed. On gcc I believe you have to explicitly use `-Wconversion` to get the warnings. – Lundin Nov 13 '18 at 15:31
  • @JohnBollinger Yeah of course, a typo. Fixed - thanks! – Lundin Nov 13 '18 at 15:31