
I'm reading through an email argument regarding the following line of code:

p = (unsigned char)random();

The random function returns a long, and somebody says that this is unsafe because it's possible that the typecast might take the MSB instead of the LSB. I know that on x86 the typecast would return the LSB, but I can't find any information on whether this is actually mandated by ANSI C or whether it's one of those implementation-specific "undefined behaviors".

– NapoleonBlownapart
  • The question is interesting and the answer worth knowing, but in real code, one should strive to remove confusion. In other words, if you change the code to this, there's no question about which byte is used: `p = (unsigned char) (random() % 256)`. This will prevent any future reader of your code from having to wonder the same thing. As a general rule, you shouldn't write code that requires detailed knowledge of the standard's particulars to understand. (The variants discussed here are sketched after these comments.) – Gort the Robot Aug 07 '13 at 00:21
  • In fact the cast is unnecessary; you can assign any numeric type to any other numeric type, and it will be implicitly converted as if by a cast. – Keith Thompson Aug 07 '13 at 01:01
  • @KeithThompson: compilers may warn if a potentially lossy conversion is used without a cast, though... – Christoph Aug 07 '13 at 07:38
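
Taken together, the comments describe three equivalent ways to write the assignment. Here is a minimal sketch, assuming a POSIX `<stdlib.h>` that declares `random()` and an 8-bit `unsigned char`; on a conforming implementation all three store the same value, since `random()` never returns a negative number:

```c
#include <stdlib.h>

void assign_byte(void)
{
    unsigned char p;

    p = (unsigned char)random();         /* original: conversion keeps value mod 256 */
    p = (unsigned char)(random() % 256); /* explicit modulo; no reader left wondering */
    p = random();                        /* implicit conversion: same result, but may
                                            trigger a "lossy conversion" warning      */
    (void)p;                             /* silence unused-variable warnings */
}
```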

1 Answer


This is specified in the C Standard.

C99 in 6.3.1.3p2 says:

"Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type."

For `unsigned char`, this is equivalent to reducing the value modulo `UCHAR_MAX + 1` (256 when chars are 8 bits); on a two's complement system, that amounts to taking the least significant byte, never the MSB.
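
As a concrete illustration of the rule, a minimal sketch (assuming an 8-bit `unsigned char`, so `UCHAR_MAX + 1` is 256; a fixed value stands in for a `random()` result):

```c
#include <stdio.h>

int main(void)
{
    long r = 0x12345678L;                  /* stand-in for a random() result */
    unsigned char p = (unsigned char)r;

    /* 6.3.1.3p2 makes the conversion a reduction modulo UCHAR_MAX + 1,
       i.e. the low-order byte on a two's complement machine: 0x78 here. */
    printf("cast:   0x%02x\n", (unsigned)p);
    printf("modulo: 0x%02x\n", (unsigned)(r % 256));
    return 0;
}
```

Both lines print `0x78`, the LSB of the original value.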

– ouah
  • But for conversion to a *signed* type, if the target type can't represent the value the result is implementation-defined (or it can raise an implementation-defined signal, but I don't know of any compiler that does that); a concrete sketch follows these comments. – Keith Thompson Aug 07 '13 at 01:02
  • @KeithThompson: How precisely does an implementation's documentation have to describe the behavior? Could an implementation legitimately specify that directly assigning to a 16-bit signed type a value N outside the range -32768..32767 will store something that will behave as N+65536*__INDETERMINATE_VALUE, but that an explicit typecast to a signed 16-bit type will store the canonical representation? – supercat Apr 20 '15 at 22:16
  • @supercat: In either case, there's a conversion; a cast (not "typecast") just specifies the same conversion that would have been done implicitly in the absence of the cast. I'm assuming something like `int n = too_big;` vs. `int n = (int)too_big;`. In any *sane* implementation, both conversions will yield the same result, and I suspect the standard committee assumed that to be the case. But I guess the wording does leave some wiggle room: "... either the result is implementation-defined or an implementation-defined signal is raised." (N1570 6.3.1.3p3) – Keith Thompson Apr 20 '15 at 22:36
  • @KeithThompson: My question is whether "implementation-defined" requires the documentation to provide sufficient information to nail down the result precisely. On a 32-bit platform, truncating an oversized variable when assigning it to a `short` which happens to be held in a register may be more expensive than would be storing the value directly, so it could be semantically useful to distinguish "I want to store a value wrapped to -32768..32767" versus "I want to store a value and am willing to accept a non-truncated store". – supercat Apr 20 '15 at 22:39
  • @supercat: I'd say yes -- but it doesn't seem to require it to be *consistent*. Compilers are allowed to do a lot of crazy things. But I don't think an "implementation-defined result" extends to letting a `short` object appear to have a value outside the range `SHRT_MIN` .. `SHRT_MAX`. – Keith Thompson Apr 20 '15 at 22:46
  • @KeithThompson: I wish someone would standardize a C-like language that better let programs specify what they actually want from their integer types. On many processors, `int32_t` is much more efficient than `int16_t` for variables held in registers, but `int16_t` is more efficient for variables held in RAM. There should be some way of saying "Give me the cheapest thing that's at least 16 bits, but I won't mind if it's actually 32". I find it curious that compiler developers do weird and wacky things with Undefined Behavior, but can't let programmers specify what their code requires. – supercat Apr 20 '15 at 22:57
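
To make the signed-conversion caveat in this thread concrete, a minimal sketch (assuming a 16-bit `short`; per C99 6.3.1.3p3 the result here is implementation-defined, or an implementation-defined signal may be raised):

```c
#include <stdio.h>

int main(void)
{
    long big = 70000L;     /* does not fit in a 16-bit short */
    short s = (short)big;  /* implementation-defined result (or signal), unlike
                              the fully defined unsigned case above            */

    /* Typical two's complement implementations wrap and print 4464
       (70000 - 65536), but the standard does not require that. */
    printf("%d\n", (int)s);
    return 0;
}
```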