Truncating an int to char - is it defined?

Question

unsigned char a, b;
b = something();
a = ~b;

A static analyzer complained of truncation in the last line, presumably because b is promoted to int before its bits are flipped and the result will be of type int.

I am only interested in the last byte of the promoted int - if b was 0x55, I need a to be 0xAA. My question is, does the C spec say anything about how the truncation happens, or is it implementation defined/undefined? Is it guaranteed that a will always get assigned the value I expect or could it go wrong on a conforming platform?

Of course, casting the result before assigning will silence the static analyzer, but I want to know if it is safe to ignore this warning in the first place.

I would say that is a spurious warning. I have just run your code through the clang static analyser and it did not complain. What is the return type of `something()`? – JeremyP, May 04 '11 at 10:51
@Jeremy this is sample code to illustrate the scenario. The real code is something like `mask1[0] = ~mask2[0];` where both are arrays of type unsigned char. Apparently, my static analyzer isn't as smart as clang :) – Amarghosh, May 04 '11 at 10:55

score 11 · Answer 1 · answered May 04 '11 at 10:30

The C standard specifies this for unsigned types:

A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

In this case, if your unsigned char is 8 bits, the int result of ~b is reduced modulo 256 when it is assigned to a, so if b was 0x55, a will indeed end up as 0xAA.

But note that if unsigned char is wider than 8 bits (which is perfectly legal), you will get a different result. To ensure that you will portably get 0xAA as the result, you can use:

a = ~b & 0xff;

(The bitwise and should be optimised out on platforms where unsigned char is 8 bits).
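
As a minimal sketch of the two variants discussed above (the helper names `complement_uchar` and `complement_low8` are made up for illustration): the first relies on the conversion back to unsigned char, the second masks explicitly so that only the low 8 bits survive even if unsigned char is wider.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: b is promoted to int, complemented, and the int
   result is converted back to unsigned char, i.e. reduced modulo
   UCHAR_MAX + 1. */
unsigned char complement_uchar(unsigned char b)
{
    return (unsigned char)~b;
}

/* Hypothetical helper: keeps only the low 8 bits of the complement,
   regardless of how wide unsigned char is on the platform. */
unsigned char complement_low8(unsigned char b)
{
    return (unsigned char)(~b & 0xff);
}
```

On a platform where unsigned char is 8 bits, both helpers should return the same value for every input.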

Note also that if the destination is a signed type and the value does not fit in it, the result is implementation-defined (or an implementation-defined signal is raised).

caf

It's not overflow that I am worried about (as the operator is `~` here) - can truncation of 0xFFFFFFAA to a char result in, say, 0xFF (msb) instead of 0xAA (lsb)? – Amarghosh May 04 '11 at 10:38
No, the reduction is always a "modulo 2^n" operation, where n is the number of bits in a char. Endianness doesn't matter – Gunther Piez May 04 '11 at 10:51
@Amarghosh No, it will use the least significant byte(s) regardless of endianness. – Klas Lindbäck May 04 '11 at 10:52
@Amarghosh: The part of the quote from _"because a result that cannot be represented..."_ is still relevant. Since the value `0xFFFFFFAA` cannot be represented in an 8-bit `unsigned char`, it will be reduced modulo 256 - which results in `0xAA`. – caf May 04 '11 at 10:54
Using `uint8_t` doesn't really help because it cannot exist unless `CHAR_BIT` is 8. Just using `#if` and `#error` would work just as well. – R.. GitHub STOP HELPING ICE May 04 '11 at 13:08
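
The `#if` / `#error` approach from the comment above could look roughly like this. It is only a sketch: the error message and the helper name `complement_byte` are made up for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Refuse to compile on platforms where a char is not exactly 8 bits. */
#if CHAR_BIT != 8
#error "this code assumes CHAR_BIT == 8"
#endif

/* Hypothetical helper: with the check above in place, the conversion back
   to unsigned char always reduces the value modulo 256. */
unsigned char complement_byte(unsigned char b)
{
    return (unsigned char)~b;
}
```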

score 7 · Accepted Answer · answered May 04 '11 at 10:41

The truncation happens as described in 6.3.1.3/2 of the C99 Standard

... if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

Example for CHAR_BIT == 8, sizeof (unsigned char) == 1, sizeof (int) == 4

So, 0x55 is promoted to int, giving 0x00000055, then complemented to 0xFFFFFFAA. With a two's complement 32-bit int that bit pattern is the value -86, and adding UCHAR_MAX + 1 once brings it into range:

      -86    /* 0xFFFFFFAA as an int */
    + 256    /* UCHAR_MAX + 1 */
    -----
      170    /* 0x000000AA */

or, as plain 0xAA, as you'd expect
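
The rule quoted from 6.3.1.3/2 works on values rather than on bit patterns, and it can be spelled out as a loop. This is a minimal sketch: `to_uchar_by_rule` is a hypothetical helper, not a standard function, and it assumes `long` is wide enough to hold `UCHAR_MAX + 1`.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: applies the wording of C99 6.3.1.3/2 literally,
   adding or subtracting UCHAR_MAX + 1 until the value is in range.
   Assumes UCHAR_MAX < LONG_MAX so the modulus fits in a long. */
long to_uchar_by_rule(long value)
{
    const long modulus = (long)UCHAR_MAX + 1;

    while (value < 0)
        value += modulus;
    while (value > UCHAR_MAX)
        value -= modulus;
    return value;
}
```

For the value of `~0x55` this should agree with what the compiler produces for `(unsigned char)~0x55`.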

Errrr ... the example is rather wrong LOL, but you get the idea :) — pmg, May 04 '11 at 10:56

score 1 · Answer 3 · answered May 04 '11 at 10:23

It will behave as you want it to. It is safe to cast the value.

Klas Lindbäck

Mayank · Answer 4 · 2011-05-04T11:40:58.907

Let's take the case of a Win32 machine.
An int is 4 bytes, and converting it to a char gives exactly the result you would get if the 3 leftmost bytes were removed.

Since you are converting from a char back to a char, it does not matter what type the value is promoted to in between.
Evaluating ~b adds 3 bytes on the left, flips every bit (so those bytes become all 1s), and the assignment then removes them again. It does not affect your one rightmost byte.

The same concept applies to other architectures (be it a 16-bit or a 64-bit machine).

Assuming it to be little-endian

edited May 04 '11 at 11:40

answered May 04 '11 at 10:27

Mayank

I'm trying to write the concept here... Assume it is a 32-bit machine... 64-bit machines or a different architecture will not make any difference as far as the concept is concerned. – Mayank May 04 '11 at 10:45
@Mayank: for the sake of other SO users who might read your answer in the future it's important that it does not contain misinformation – Paul R May 04 '11 at 10:46
@Paul: Thanks for the comment. I'll take care of this in future – Mayank May 04 '11 at 10:48
@Mayank: you could probably edit the above answer to make it more accurate and generally applicable – Paul R May 04 '11 at 10:51
@Mayank: that's a little better - I would remove the last line about little endianness though, as it's irrelevant. – Paul R May 04 '11 at 11:03
@Paul: I mentioned endianness so that the **left** and **right** stuff mentioned in the answer makes more sense. – Mayank May 04 '11 at 11:46

score 0 · Answer 5 · answered May 04 '11 at 11:06

This particular code example is safe. But there are reasons to warn against lax use of the ~ operator.

The reason behind this is that ~ on small integer variables is a potential bug in more complex expressions, because of the implicit integer promotions in C. Imagine if you had an expression like

a = ~b >> 4;

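Because b is promoted to int before ~ is applied, the upper bits of ~b are all ones, so the shift will not bring zeroes into the top of a as might have been expected.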

If your static analyzer is set to include MISRA-C, you will for example get this warning for each ~ operator, because MISRA requires the result of any operation on small integer types to be explicitly cast to the expected type, unsigned char in this case.
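
To illustrate the `~b >> 4` pitfall, here is a minimal sketch with two hypothetical helpers. Right-shifting a negative int is implementation-defined, so the value noted for the first helper is only what typical two's complement platforms with 8-bit chars produce.

```c
#include <assert.h>

/* Hypothetical helper: ~b is a negative int after promotion, so bits 8
   and up are ones and end up in the top of the result. For b == 0x55 this
   typically yields 0xFA, not 0x0A. */
unsigned char shift_promoted(unsigned char b)
{
    return (unsigned char)(~b >> 4);
}

/* Hypothetical helper: casting back to unsigned char before the shift
   discards the promoted upper bits, so zeroes are shifted in. */
unsigned char shift_truncated(unsigned char b)
{
    return (unsigned char)((unsigned char)~b >> 4);
}
```

The explicit cast in `shift_truncated` is the kind of cast the MISRA rule mentioned above asks for.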
