
All of these functions give the expected result on my machine. Do they all work on other platforms?

More specifically, if x has the bit representation 0xffffffff on 1's complement machines, or 0x80000000 on sign-magnitude machines, what does the standard say about the representation of (unsigned)x?

Also, I think the (unsigned) cast in v2, v2a, v3, v4 is redundant. Is this correct?

Assume sizeof(int) = 4 and CHAR_BIT = 8

#include <string.h>  /* for memcpy in v6 */

int logicalrightshift_v1 (int x, int n) {

    return (unsigned)x >> n;
}

int logicalrightshift_v2 (int x, int n) {

    int msb = 0x4000000 << 1;
    return ((x & 0x7fffffff) >> n) | (x & msb ? (unsigned)0x80000000 >> n : 0);
}

int logicalrightshift_v2a (int x, int n) {

    return ((x & 0x7fffffff) >> n) | (x & (unsigned)0x80000000 ? (unsigned)0x80000000 >> n : 0);
}

int logicalrightshift_v3 (int x, int n) {

    return ((x & 0x7fffffff) >> n) | (x < 0 ? (unsigned)0x80000000 >> n : 0);
}

int logicalrightshift_v4 (int x, int n) {

    return ((x & 0x7fffffff) >> n) | (((unsigned)x & 0x80000000) >> n);
}

int logicalrightshift_v5 (int x, int n) {

    unsigned y;
    *(int *)&y = x;
    y >>= n;
    *(unsigned *)&x = y;
    return x;
}

int logicalrightshift_v6 (int x, int n) {

    unsigned y;
    memcpy (&y, &x, sizeof (x));
    y >>= n;
    memcpy (&x, &y, sizeof (x));
    return x;
}
CharlesB
tyty
  • Why don't you just divide by (2^n) and let the compiler do the optimizing? – Mat Oct 28 '11 at 05:58
  • Assuming 2's complement: logicalrightshift (-1,1) should be 0x7fffffff, but -1 / 2 = 0 – tyty Oct 28 '11 at 06:21
  • @user1016492: If your machine has `UINT_MAX` equal to `0xffffffff`, then `(unsigned)-1 >> 1` is guaranteed to be `0x7fffffff`, regardless of what representation is used for signed numbers. – caf Oct 28 '11 at 06:57
  • @Mat: `2^n` is an exclusive or operation in C... – Dietrich Epp Oct 28 '11 at 07:02
  • @Deitrich: Perhaps that is why he did not use code markup. Using ^ for exponentiation when typographical limitations prevent superscript is common usage. In this case the reader has to apply context to disambiguate. I think Mat's comment was clear enough *in context*. – Clifford Oct 28 '11 at 07:40
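To make the exchange above concrete, here is a small sketch contrasting the two operations on negative input (the function names are hypothetical). Since C99, integer division truncates toward zero, so division is an arithmetic operation on the value and cannot produce the 0x7fffffff that tyty wants; converting to unsigned first and then shifting can.

```c
#include <limits.h>

/* Arithmetic view: C99 and later truncate integer division toward zero,
   so -1 / 2 is 0 on every conforming implementation. */
int half_by_division(int x)
{
    return x / 2;
}

/* Logical view: convert to unsigned first (a value conversion, so -1
   always becomes UINT_MAX), then shift the unsigned value. */
unsigned half_by_logical_shift(int x)
{
    return (unsigned)x >> 1;
}
```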

2 Answers


If x has the bit representation 0xffffffff on 1's complement machines, or 0x80000000 on sign-magnitude machines, what does the standard say about the representation of (unsigned)x?

The conversion to unsigned is specified in terms of values, not representations. If you convert -1 to unsigned, you always get UINT_MAX (so if your unsigned is 32 bits, you always get 4294967295). This happens regardless of the representation of signed numbers that your implementation uses.

Likewise, if you convert -0 to unsigned then you always get 0. -0 is numerically equal to 0.
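A minimal sketch of the value-based rule that assumes nothing about the signed representation: the hypothetical helper below applies "add UINT_MAX + 1" by hand, and the tests check that it agrees with a plain cast.

```c
#include <limits.h>

/* Hypothetical helper: spells out the conversion rule for int -> unsigned
   without using a cast on a negative value. */
unsigned to_unsigned_by_value(int x)
{
    if (x >= 0)
        return (unsigned)x;              /* already in range: value unchanged */
    /* -(x + 1) cannot overflow, even for INT_MIN, and lies in [0, INT_MAX].
       x + (UINT_MAX + 1) == UINT_MAX - (-(x + 1)). */
    return UINT_MAX - (unsigned)(-(x + 1));
}
```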

Note that a ones complement or sign-magnitude implementation is not required to support negative zeroes; if it does not, then accessing such a representation causes the program to have undefined behaviour.

Going through your functions one-by-one:

int logicalrightshift_v1(int x, int n)
{
    return (unsigned)x >> n;
}

The result of this function for negative values of x will depend on UINT_MAX, and will further be implementation-defined if (unsigned)x >> n is not within the range of int. For example, logicalrightshift_v1(-1, 1) will return the value UINT_MAX / 2 regardless of what representation the machine uses for signed numbers.
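If the goal is v1's result without the final unsigned-to-int conversion, one option (a sketch, not the question's interface) is to keep the return type unsigned. The hypothetical logicalrightshift_v1u below differs from v1 only in that respect; n must still be smaller than the width of unsigned, because shifting by the full width or more is undefined.

```c
#include <limits.h>

/* v1 with the return type changed to unsigned: the result is
   ((unsigned)x) / 2^n, computed entirely in unsigned arithmetic, with
   no conversion back to int on return. */
unsigned logicalrightshift_v1u(int x, int n)
{
    return (unsigned)x >> n;
}
```

With n == 0 and a negative x, the original int-returning v1 would have to convert a value above INT_MAX back to int, which is exactly the implementation-defined case described above.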

int logicalrightshift_v2(int x, int n)
{
    int msb = 0x4000000 << 1;
    return ((x & 0x7fffffff) >> n) | (x & msb ? (unsigned)0x80000000 >> n : 0);
}

Almost everything about this could be implementation-defined. Assuming that you are attempting to create a value in msb with 1 in the sign bit and zeroes in the value bits, you cannot do this portably by using shifts. You can use ~INT_MAX, but this is allowed to have undefined behaviour on a sign-magnitude machine that does not allow negative zeroes, and is allowed to give an implementation-defined result on two's complement machines.
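As a sketch of an unsigned alternative: the mask can be built by value in unsigned arithmetic, where wraparound is well defined. This assumes unsigned has exactly one more value bit than int (UINT_MAX == 2u * INT_MAX + 1u), which holds on the 32-bit platforms the question assumes but is not guaranteed by the standard; the function names are hypothetical.

```c
#include <limits.h>

/* Under the stated assumption, INT_MAX + 1 computed in unsigned is the
   highest value bit of unsigned (0x80000000 with a 32-bit unsigned). */
unsigned top_bit_mask(void)
{
    return (unsigned)INT_MAX + 1u;
}

/* Tests that bit on the *value* of x after conversion to unsigned, which
   is well defined for every int, rather than on x's representation. */
int top_bit_set(int x)
{
    return ((unsigned)x & top_bit_mask()) != 0u;
}
```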

The types of 0x7fffffff and 0x80000000 will depend on the ranges of the various types, which will affect how other values in this expression are promoted.

int logicalrightshift_v2a(int x, int n)
{
    return ((x & 0x7fffffff) >> n) | (x & (unsigned)0x80000000 ? (unsigned)0x80000000 >> n : 0);
}

If you create an unsigned value that is not in the range of int (for example, given a 32-bit int, values > 0x7fffffff) then the implicit conversion in the return statement produces an implementation-defined value. The same applies to v3 and v4.
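One way to avoid that implementation-defined conversion is to map the unsigned result back to int by value. The helper below is a hypothetical sketch that assumes the question's setting (UINT_MAX == 2u * INT_MAX + 1u and INT_MIN == -INT_MAX - 1, i.e. 32-bit two's complement); every cast it performs is on a value that is already in range.

```c
#include <limits.h>

/* Hypothetical helper: wraps an unsigned value into int the way a two's
   complement machine would, without any out-of-range conversion. */
int unsigned_to_int_wrapped(unsigned u)
{
    if (u <= (unsigned)INT_MAX)
        return (int)u;                   /* in range: value preserved */
    /* UINT_MAX - u lies in [0, INT_MAX] under the stated assumption,
       so the cast is exact and the negation cannot overflow. */
    return -(int)(UINT_MAX - u) - 1;
}
```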

int logicalrightshift_v5(int x, int n)
{
    unsigned y;
    *(int *)&y = x;
    y >>= n;
    *(unsigned *)&x = y;
    return x;
}

This is still implementation-defined, because it is unspecified whether the sign bit in the representation of int corresponds to a value bit or a padding bit in the representation of unsigned. If it corresponds to a padding bit, the result could be a trap representation, in which case the behaviour is undefined.

int logicalrightshift_v6(int x, int n)
{
    unsigned y;
    memcpy (&y, &x, sizeof (x));
    y >>= n;
    memcpy (&x, &y, sizeof (x));
    return x;
}

The same comments that apply to v5 apply here.
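Whether v5 and v6 can run into those problems can be checked at run time. The sketch below (hypothetical helper names) counts the value bits of unsigned and int from UINT_MAX and INT_MAX and compares them with the storage width. If neither type has padding bits, the int value bits line up with the low value bits of unsigned, so the int sign bit must land on the top value bit of unsigned.

```c
#include <limits.h>

/* Counts the 1 bits in max; applied to UINT_MAX or INT_MAX this gives
   the number of value bits in the type. */
static int count_value_bits(unsigned long max)
{
    int bits = 0;
    while (max != 0) {
        bits += (int)(max & 1u);
        max >>= 1;
    }
    return bits;
}

/* Nonzero only when neither unsigned nor int has padding bits.
   int and unsigned are required to occupy the same storage. */
int int_and_unsigned_have_no_padding(void)
{
    int width = (int)(sizeof(unsigned) * CHAR_BIT);
    return count_value_bits(UINT_MAX) == width
        && count_value_bits(INT_MAX) + 1 == width;  /* +1 for the sign bit */
}
```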

Also, I think the (unsigned) cast in v2, v2a, v3, v4 is redundant. Is this correct?

It depends. As a hex constant, 0x80000000 will have type int if that value is within the range of int; otherwise unsigned if that value is within the range of unsigned; otherwise long if that value is within the range of long; otherwise unsigned long (because that value is within the minimum allowed range of unsigned long).

If you want to ensure that it has an unsigned type, add a U suffix to the constant: 0x80000000U.
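To see which type a given constant actually gets, C11's _Generic can report it. A minimal sketch, assuming a C11 compiler and the question's 32-bit int; the enum and macro names are hypothetical.

```c
/* Maps the type the compiler chose for an expression to an enum value. */
enum const_type { T_INT, T_UINT, T_LONG, T_ULONG, T_OTHER };

#define TYPE_OF(e) _Generic((e),          \
    int:           T_INT,                 \
    unsigned int:  T_UINT,                \
    long:          T_LONG,                \
    unsigned long: T_ULONG,               \
    default:       T_OTHER)
```

With a 32-bit int, TYPE_OF(0x7fffffff) is T_INT, while TYPE_OF(0x80000000) is T_UINT because the value does not fit in int; the U suffix makes the constant unsigned regardless of the platform.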


Summary:

  1. Converting a number greater than INT_MAX to int gives an implementation-defined result (or indeed, allows an implementation-defined signal to be raised).

  2. Converting an out-of-range number to unsigned is done by repeated addition or subtraction of UINT_MAX + 1, which means it depends on the mathematical value, not the representation.

  3. Inspecting a negative int representation as unsigned is not portable (positive int representations are OK, though).

  4. Generating a negative zero through use of bitwise operators and trying to use the resulting value is not portable.

If you want "logical shifts", then you should be using unsigned types everywhere. The signed types are designed for dealing with algorithms where the value is what matters, not the representation.
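A minimal sketch of that advice, taking and returning unsigned so that every step is defined by value. The function name is hypothetical, and the precondition computes the width from sizeof, which assumes unsigned has no padding bits.

```c
#include <assert.h>
#include <limits.h>

/* Logical right shift on unsigned values only.  Shifting by the full
   width or more is undefined, so the caller must keep n below it. */
unsigned logical_shift_right(unsigned x, unsigned n)
{
    assert(n < sizeof x * CHAR_BIT);
    return x >> n;
}
```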

caf
  • @caf: I do not understand why `~INT_MAX` would give an implementation-defined result on 2's complement. INT_MAX is an integer constant. The `~` operator flips all the bits. Wouldn't the result be well defined even if there are padding bits? For that matter, would it also be well defined on 1's complement machines? – tyty Oct 28 '11 at 09:01
  • @user1016492: `INT_MAX` has the sign bit 0 and all the value bits 1, so `~INT_MAX` has the sign bit 1 and all the value bits 0. This is not required to represent a normal value on 2s complement implementations, which means that `~INT_MAX` can be out of range. That actually makes it undefined behaviour rather than implementation defined. – caf Oct 28 '11 at 09:19
  • Reference: 6.2.6.2/2: "implementation-defined ... whether the value with sign bit 1 and all value bits zero (for the first two) ... is a trap representation or a normal value". "For the first two" includes two's complement, which is the second of three listed. So it's one of those cases where it is implementation-defined whether the behaviour is undefined: an implementation can do anything at all, provided it has documented that `~INT_MAX` is a trap representation. But if it documented that it's a normal value, behavior is defined. A strictly-conforming program therefore cannot use it. – Steve Jessop Oct 28 '11 at 09:33
  • @caf some clarification on aliasing: doesn't v5 fall under the category of: "An object shall have its stored value accessed only by an lvalue expression that has one of the following types: ..., **a type that is the signed or unsigned type corresponding to the effective type of the object**, ..." – tyty Oct 29 '11 at 09:12
  • @user1016492: You are right, so v5 operates exactly as v6 does (I have corrected the answer). – caf Oct 29 '11 at 22:49
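The "implementation-defined whether it is a trap" question from these comments can be answered from <limits.h>: INT_MIN is -INT_MAX - 1 only on a two's complement implementation where the pattern with sign bit 1 and all value bits 0 is a normal value, and -INT_MAX otherwise. A sketch with a hypothetical helper name:

```c
#include <limits.h>

/* Nonzero when int has the extra negative value -INT_MAX - 1, i.e. the
   "sign bit 1, value bits 0" pattern is a normal two's complement value
   rather than a trap representation or a negative zero. */
int int_has_extra_negative_value(void)
{
    return INT_MIN < -INT_MAX;
}
```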

If you follow the standard to the word, none of these are guaranteed to be the same on all platforms.

In v5, you violate strict-aliasing, which is undefined behavior.

In v2 - v4, you have signed right-shift, which is implementation-defined. (See comments for more details.)

In v1, you have a signed-to-unsigned cast, which is implementation-defined when the number is out of range.

EDIT:

v6 might actually work given the following assumptions:

  • 'int' is either 2's or 1's complement.
  • unsigned and int are exactly the same size (in both bytes and bits, and are densely packed).
  • The endian of unsigned matches that of int.
  • The padding and bit layout are the same (see caf's comment for more details).
Mysticial
  • isn't right shift of a signed int with non-negative value well defined? – tyty Oct 28 '11 at 05:52
  • That's because you're dereferencing two pointers of different types that point to the same memory location. (`char*` is exempt from this) – Mysticial Oct 28 '11 at 05:53
  • I think you're right about the non-negative right-shift. However, the other problem is that you're assuming `int` is 32 bits, which isn't the case on all systems. – Mysticial Oct 28 '11 at 05:54
  • Thanks, the assumption about 32 bit platform is added – tyty Oct 28 '11 at 05:56
  • With that assumption added, then *maybe*. I don't have the standard memorized or in front of me. – Mysticial Oct 28 '11 at 05:59
  • With regard to your assumptions in the last part, the value bits of the `int` representation are required to match up with the corresponding value bits of `unsigned`. The only problems are that the sign bit need not correspond to another value bit in the `unsigned` representation, and any padding bits in the `int` representation can be additional value bits in the `unsigned` representation. This means that a negative `n` could give rise to trap representations, and a positive `n` could give an implementation-defined result. – caf Oct 28 '11 at 06:54
  • @caf: Good point, I wrote that also assuming 'int' is either 2's or 1's complement as the OP had stated. But your point on the padding still holds. I'll edit my answer to point to your comment. – Mysticial Oct 28 '11 at 06:59
  • @Mysticial: Even ones' and two's complement representations have sign bits, so I'm not sure how that helps? – caf Oct 28 '11 at 07:20
  • @caf: Hmm... If there's no padding and the sign bit is in the same position, then I can't think of any more leeway for it not to work (maybe I'm wrong). But that's a lot of `if`s... – Mysticial Oct 28 '11 at 07:26
  • Yes, if there's no padding bits then that is a sufficient condition for it to work, with the result now depending only on the size of `unsigned` / `int` and what signed representation is in use. – caf Oct 28 '11 at 07:39
  • @caf: to clarify the point about positive `n` giving an implementation-defined result, are you saying that if there are padding bits, the values of these padding bits are undefined? So if they are copied verbatim to unsigned, one or more of these undefined padding bits could end up as a value bit of unsigned and therefore give an implementation-defined result? – tyty Oct 28 '11 at 08:14
  • @user1016492: Yes, exactly. You could use `& INT_MAX` to zero out those bits after the conversion, though (but this would zero out the value-bit-that-was-a-sign bit too). – caf Oct 28 '11 at 08:38
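caf's masking suggestion applied to v6, as a sketch (the function name is hypothetical). It assumes the copied bytes are not a trap representation for unsigned, which holds whenever unsigned has no padding bits; for non-negative x the result then equals x, because only the bits that are value bits of int survive the mask.

```c
#include <limits.h>
#include <string.h>

/* Copies the object representation of x into an unsigned, then clears
   every bit that is not a value bit of int (former padding bits and the
   former sign bit). */
unsigned int_bits_as_unsigned(int x)
{
    unsigned y;
    memcpy(&y, &x, sizeof y);        /* int and unsigned have the same size */
    return y & (unsigned)INT_MAX;    /* keep only the int value bits */
}
```

As caf notes, the mask also discards the bit that was the sign bit, so for negative x this yields only the value bits of the representation.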