3

According to the standard, whether char is signed or not is implementation-defined. This has caused me some trouble. Following are some examples:

1) Testing the most significant bit. If char is signed, I can simply compare the value against 0. If it is unsigned, I compare the value against 128 instead (assuming 8-bit chars). Neither of these simple methods works in both cases. To write portable code, it seems that I have to manipulate the bits directly, which is not neat.

2) Value assignment. Sometimes I need to write a bit pattern to a char value. If char is unsigned, this is easy with hexadecimal notation, e.g., char c = 0xff. But this does not work when char is signed: with 8-bit chars, 0xff is beyond the maximum value a signed char can hold, and in such cases the standard says the resulting value of c is implementation-defined.

So, does anybody have good ideas about these two issues? With respect to the second one, I'm wondering whether char c = '\xff' is OK for both signed and unsigned char.

NOTE: It is sometimes necessary to write explicit bit patterns to characters. See the example in http://en.cppreference.com/w/cpp/string/multibyte/mbsrtowcs.

Lingxi
  • If you care about the bit patterns, perhaps you should always be using `unsigned char`. – Brian Bi Mar 26 '15 at 02:31
  • I only ever use char for characters or because I have to (streams, etc.). If I ever want some numbers that just happen to be a byte in size, I always use an explicitly signed or unsigned char. – Neil Kirk Mar 26 '15 at 02:32
  • Sometimes, what is needed is a character string. However, the values of the characters should be explicitly specified. See the example in http://en.cppreference.com/w/cpp/string/multibyte/mbsrtowcs. – Lingxi Mar 26 '15 at 02:41
  • testing MSB: `(x | 0x7F) != 0x7F` – Tony Delroy Mar 26 '15 at 02:46
  • I don't really see your reasoning for testing MSB. The MSB is the same bit for unsigned and signed. – Radiodef Mar 26 '15 at 03:32
  • "Take char c = 0xff for example. 0xff is beyond the the maximum value a signed char can hold. " - only if `CHAR_BIT==8`. If you're aiming for portable code, don't replace one assumption by another. – MSalters Mar 26 '15 at 09:37
  • I think there is enough evidence to practically reckon that `char c = '\xff'` (and similarly `char str[] = "\xff\xff"`) works for both signed and unsigned `char`. The evidence I found is as follows: 1) The table in http://en.cppreference.com/w/cpp/language/escape states that the representation of `'\xnn'` is `byte nn`. Note the use of the word `byte`. 2) The example in en.cppreference.com/w/cpp/string/multibyte/mbsrtowcs uses this. 3) The example in https://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Non-ASCII_Characters uses this. – Lingxi Mar 26 '15 at 13:12

5 Answers

2

1) testing MSB: (x | 0x7F) != 0x7F (or reinterpret_cast<unsigned char&>(x) & 0x80)

2) reinterpret_cast<unsigned char&>(x) = 0xFF;

Note that reinterpret_cast is entirely appropriate if you want to treat the memory the character occupies as a collection of bits, bypassing the specific bit patterns associated with any given value in the char type.
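A minimal sketch of both suggestions follows. The helper names `msb_set` and `set_bits` are made up for illustration and do not come from the answer; the sketch only assumes that a char object may be accessed through an unsigned char reference, which the language permits.

```cpp
#include <climits>

// Hypothetical helper: true when the most significant bit of c is set.
// The bits are read through an unsigned char reference, so the result
// does not depend on whether plain char is signed.
inline bool msb_set(char c) {
    const unsigned char& bits = reinterpret_cast<const unsigned char&>(c);
    return ((bits >> (CHAR_BIT - 1)) & 1u) != 0;
}

// Hypothetical helper: store the given bit pattern in c, again going
// through an unsigned char reference as the answer suggests.
inline void set_bits(char& c, unsigned char pattern) {
    reinterpret_cast<unsigned char&>(c) = pattern;
}
```

With 8-bit chars, `set_bits(c, 0xFF)` followed by `msb_set(c)` should report the top bit as set on both signed-char and unsigned-char platforms.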

Tony Delroy
  • For 1) I think `(unsigned char)c >= 128` is cleaner. For 2), how about a character string? I need to explicitly specify the bit patterns of the characters in the string as in the example of en.cppreference.com/w/cpp/string/multibyte/mbsrtowcs – Lingxi Mar 26 '15 at 04:42
  • For 1), `if (c & 0x80)` seems to be the cleanest. It is correct regardless of whether `c` is promoted as signed or unsigned. – Lingxi Mar 26 '15 at 04:50
  • @Lingxi: the reason I used 0x7F is because it's self-evidently correct... if you even have to ask whether `c & 0x80` is also correct, then you shouldn't use it because - at best - every programmer looking at your code later is likely to wonder the same thing. – Tony Delroy Mar 26 '15 at 05:18
  • That said, the worrying case is when `char` is signed, and the relevant passage is from 5/10 "if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank shall be converted to the type of the operand with greater rank.", so the character `c` is promoted to the `signed int` type of 0x80. If the system happens to use twos-complement integer representations, then the bit at 0x80 will stay set. If it uses a sign/magnitude representation, the sign bit will be moved to the most significant bit. So, it's unreliable. – Tony Delroy Mar 26 '15 at 05:29
1

If you really care about the signed-ness, just declare the variable as signed char or unsigned char as needed. No platform-independent bit-twiddling tricks required.
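As a rough illustration of this advice, the sketch below keeps raw bytes in unsigned char so that an ordinary comparison is enough. The function name `count_high_bytes` is hypothetical, and the threshold 0x80 assumes 8-bit bytes.

```cpp
#include <cstddef>

// Hypothetical helper: count the bytes whose top bit is set.
// Because the data is declared as unsigned char, every element is in
// the range 0..UCHAR_MAX and a plain comparison is enough.
// The constant 0x80 assumes 8-bit bytes.
inline std::size_t count_high_bytes(const unsigned char* data, std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] >= 0x80) {
            ++count;
        }
    }
    return count;
}
```

Assigning a pattern is equally direct with this type: `unsigned char b = 0xff;` is well defined because the value fits.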

user207421
0

Actually you can do what you want without worrying about signedness.

A hexadecimal literal describes a bit pattern, not an integral value. (See the disclaimer below.)

So for 2. you said you can't assign bit patterns like this

char c = 0xff

but you really can do that, signed or not.

For 1, you may not be able to do the "compare with 0" trick, but you still have several ways to check the most significant bit. One way is to shift right by 7, shifting in zeros on the left, and then check whether the result is equal to 1. This is independent of signedness.

As Tony D pointed out, (x | 0x7F) != 0x7F is a more portable way of doing it than shifting, because a right shift may not shift in zeros. Similarly, you could do (x & 0x80) == 0x80; the parentheses are needed because == binds more tightly than &.

Of course you can also do what Brian suggested and just use an unsigned char.

Disclaimer: Tony pointed out that a 0x literal is actually an int, and that the conversion to char is implementation-defined when char is signed and can't hold the value. However, no implementation is going to break the standard here. char c = 0xFF, whether char is unsigned or not, will fill the bits, trust me. It will be extremely difficult to find an implementation that doesn't do that.
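The two mask-based tests mentioned in this answer can be sketched as below. The function names are invented for illustration, and the constants assume 8-bit chars on a two's-complement machine. Note that `==` and `!=` bind more tightly than `&` and `|`, so each mask expression has to be parenthesized.

```cpp
// Hypothetical helper: OR-based test. If the top bit of x is clear,
// OR-ing in the low seven bits yields exactly 0x7F.
inline bool msb_or_test(char x) {
    return (x | 0x7F) != 0x7F;
}

// Hypothetical helper: AND-based test. Without the parentheses the
// expression would parse as x & (0x80 == 0x80), i.e. x & 1.
inline bool msb_and_test(char x) {
    return (x & 0x80) == 0x80;
}
```

Both helpers take their argument after promotion to int, so they give the same answer whether plain char is signed or unsigned under the stated assumptions.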

Thomas
  • *"you can still shift to the right 7 and then check if it's equal to 1"* - the Standard leaves right shift of negative values implementation defined, so there's no guarantee that will work portably (5.8/3 "If E1 has a signed type and a negative value, the resulting value is implementation-defined.") – Tony Delroy Mar 26 '15 at 02:52
  • `0xff ` is of type `int`. See http://en.cppreference.com/w/cpp/language/integer_literal. – Lingxi Mar 26 '15 at 02:52
  • I think assigning from `int` to `char` comes under 4.7/3 "If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is ***implementation-defined***." – Tony Delroy Mar 26 '15 at 02:56
  • `char c = 0xFF` is implementation-defined if `char` is signed – M.M Mar 26 '15 at 07:04
0

You can OR the given value with 0x7F to detect its signedness, and AND it with 0xFF to remove it.
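One way to read the AND half of this suggestion is sketched below, assuming 8-bit chars and two's complement. The name `low_eight_bits` is hypothetical.

```cpp
// Hypothetical helper: mask the promoted value down to its low eight
// bits. Under the stated assumptions the result is in 0..255 whether
// plain char is signed or unsigned.
inline int low_eight_bits(char c) {
    return c & 0xFF;
}
```

After masking, a test such as `low_eight_bits(c) > 0x7F` no longer depends on the signedness of char.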

Vul
0

The easiest way to test the MSB is to make it the LSB: char c = foo(); if ((c >> (CHAR_BIT - 1)) & 1) ....

Setting a specific bit pattern is a bit more tricky. All-bits-one, for instance, may not necessarily be 0xff but could also be 0x7ff, or more realistically 0xffff. Regardless, ~char(0) is all-bits-one. Somewhat less obviously, so is char(-1). If char is signed, that's clear; if it is unsigned, this is still correct because unsigned types work modulo 2^N. Following that logic, char(-128) sets just the 8th bit regardless of how many bits there are in the char or whether it's signed.
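The ideas in this answer can be sketched as follows. The helper names are made up for illustration. The shift-based test relies on the implementation's behavior for right-shifting a negative value, which older standards leave implementation-defined, so treat it as a sketch rather than a guarantee.

```cpp
#include <climits>

// Hypothetical helper: a char with every bit set, however wide char is.
// If char is unsigned, -1 wraps modulo 2^CHAR_BIT to the maximum value.
inline char all_bits_one() {
    return char(-1);
}

// Hypothetical helper: move the most significant bit down to the
// least significant position and test it there.
inline bool msb_set_by_shift(char c) {
    return ((c >> (CHAR_BIT - 1)) & 1) != 0;
}
```

On a platform with 8-bit chars, `msb_set_by_shift(char(-128))` is expected to be true, matching the answer's claim that char(-128) sets just the 8th bit.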

MSalters