
The following check for the signed number representation correctly detects two's complement on my machine, but I don't have a ones' complement or signed magnitude machine to test it on. Would the code work properly and, more importantly, is it portable?

File: platform.h

#ifndef PLATFORM_H
#define PLATFORM_H
#include <limits.h>

static
const union {
    signed char sc;
    unsigned char uc;
} plat_4xvYw = {.sc = -1};

#define IS_TWOS_COMPL (plat_4xvYw.uc == UCHAR_MAX)
#define IS_ONES_COMPL (plat_4xvYw.uc == UCHAR_MAX - 1)
#define IS_SIGNED_MAG (plat_4xvYw.uc == (1U << (CHAR_BIT - 1)) + 1U)

#endif

File: a.c

#include <stdio.h>   /* printf */
#include <limits.h>
#include "platform.h"
#include <assert.h>

int
main (void) {

    assert (IS_TWOS_COMPL);
    if (IS_TWOS_COMPL) {

        printf ("twos complement\n");
    } else if (IS_ONES_COMPL) {

        printf ("ones complement\n");
    } else if (IS_SIGNED_MAG) {

        printf ("signed magnitude\n");
    }
    return 0;
}
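To make the constants in the three macros easier to follow, here is a minimal sketch of the bit pattern that `-1` would have under each representation, assuming a type with one sign bit, `width - 1` value bits and no padding bits. The names `repr_kind` and `minus_one_pattern` are hypothetical and are not part of `platform.h`.

```c
/* Hypothetical names, introduced only for this sketch. */
enum repr_kind { KIND_TWOS_COMPL, KIND_ONES_COMPL, KIND_SIGNED_MAG };

/* Bit pattern of -1, read as an unsigned value, for a signed type that has
   `width` bits in total (1 sign bit, width - 1 value bits, no padding).
   Assumes 1 <= width <= number of bits in unsigned long. */
unsigned long minus_one_pattern(enum repr_kind kind, unsigned width)
{
    unsigned long sign = 1UL << (width - 1);
    unsigned long all_ones = sign | (sign - 1); /* avoids shifting by the full width */

    switch (kind) {
    case KIND_TWOS_COMPL: return all_ones;      /* 1111 1111 when width is 8 */
    case KIND_ONES_COMPL: return all_ones - 1;  /* 1111 1110 when width is 8 */
    case KIND_SIGNED_MAG: return sign + 1;      /* 1000 0001 when width is 8 */
    }
    return 0; /* not reached for valid kinds */
}
```

With `width` equal to `CHAR_BIT`, these are the values that `IS_TWOS_COMPL`, `IS_ONES_COMPL` and `IS_SIGNED_MAG` compare against, provided `signed char` really has no padding bits.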
Sam
tyty
  • You will have to work very hard to find a computer that doesn't use two's complement. I doubt any computer made outside of academia the last 20-30 years has been made without it. – Some programmer dude Nov 09 '11 at 12:18
  • I'm not 100% sure, but I believe that storing a signed int and accessing it as an unsigned int is implementation-defined or undefined behavior. See C99 6.5 §4 regarding bit-wise operators: "These operators yield values that depend on the internal representations of integers, and have implementation-defined and undefined aspects for signed types." As I understand it, the same must apply to this union. – Lundin Nov 09 '11 at 12:40
  • @Lundin: you can access it as `unsigned` (it's one of the types allowed by the strict aliasing rules), but unless the value of the `int` is representable as `unsigned` (i.e. not negative), there's no guarantee that it isn't a trap value of `unsigned`. So it's not 100% portable, but the problem isn't accessing a signed int in general, it's accessing `-1` in particular. – Steve Jessop Nov 09 '11 at 13:15
  • @steve I have the impression that the `char` types will not have such problems, since C guarantees there will be no padding bits. There are no endianness issues. C guarantees there is a 1-1 mapping of value bits. So the only question left is whether the sign bit maps to a value bit of unsigned. Given that signed and unsigned char must have the same width, that has to be the case. (Having said that, I agree your answer is better.) – tyty Nov 09 '11 at 13:33
  • I disagree that "signed and unsigned char must have the same width". They use the same storage, but that doesn't mean they have the same number of padding bits. Unless I've missed somewhere in the standard that says `signed char` has no padding, of course. – Steve Jessop Nov 09 '11 at 14:20
  • @steve you are right about the part that signed char can have padding bits. See http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1310.htm. There are plans to remove it http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1375.pdf – tyty Nov 09 '11 at 14:56
  • @tyty: well found, and that change is in C1X. I checked n1548, which I believe is approved and just awaiting typesetting, legal boilerplate, and formal publication to become C11 (or C12, don't know how long it'll take). So your code is guaranteed to work by C1X, and in fact works for all implementations known to the committee. – Steve Jessop Nov 09 '11 at 15:01

1 Answer

I think you're better off just masking the bits of a negative int:

if ((-1 & 0x1) == 0) {
    // -1 ends in "0" => 1s' complement
} else if ((-1 & 0x2) == 0) {
    // -1 ends in "01" => sign-magnitude
} else {
    // -1 ends in "11" => two's complement
}

Strictly speaking, this doesn't tell you the same thing as your code, since there's no guarantee that int and signed char use the same meaning of the sign bit. But (a) seriously? and (b) this works for types int and larger; for smaller types it's trickier. unsigned char is guaranteed to have no padding bits, but signed char is not. So I think it's legal to have (for example) CHAR_BIT == 9, UCHAR_MAX == 511, SCHAR_MAX == 127, with signed char having 1 padding bit. Then your code could fail: the sign bit in the stored signed value isn't necessarily where you expect it to be, and the value of the padding bit could be either 0 or 1.
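The padding-bit scenario can be checked from `<limits.h>` alone. The following is a minimal sketch, assuming the maximum value has the form 2^n - 1 as the standard requires; the helper names `count_value_bits`, `signed_padding_bits` and `schar_padding_bits` are hypothetical.

```c
#include <limits.h>

/* Number of value bits in a maximum of the form 2^n - 1 (for example 127 -> 7). */
int count_value_bits(unsigned long max_value)
{
    int bits = 0;
    while (max_value != 0) {
        bits++;
        max_value >>= 1;
    }
    return bits;
}

/* Padding bits = total bits - 1 sign bit - value bits. */
int signed_padding_bits(int total_bits, unsigned long max_value)
{
    return total_bits - 1 - count_value_bits(max_value);
}

/* Padding bits in this implementation's signed char. */
int schar_padding_bits(void)
{
    return signed_padding_bits(CHAR_BIT, (unsigned long)SCHAR_MAX);
}
```

For the hypothetical platform with 9-bit bytes and a maximum of 127, `signed_padding_bits(9, 127)` gives 1. If `schar_padding_bits()` gives 0, the union trick from the question at least does not have to worry about padding.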

In a lot of cases you could just use int8_t in the program instead of signed char. It's guaranteed to be 2's complement if it exists, so might save you from caring about the representation of signed char. If it doesn't exist, the program won't compile, which is kind of what you're asserting anyway. You'd get a false negative from platforms which are 2's complement, but don't have an 8-bit char and therefore do not provide int8_t. This may or may not bother you...
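A minimal sketch of the `int8_t` approach follows. It relies on `<stdint.h>` defining `INT8_MAX` only when `int8_t` is provided, so the `#error` turns the missing type into an explicit diagnostic. The helper `int8_object_byte` is a hypothetical name used only to show the stored byte.

```c
#include <stdint.h>
#include <string.h>

/* <stdint.h> defines INT8_MAX only if the implementation provides int8_t. */
#ifndef INT8_MAX
#error "int8_t is not available on this platform"
#endif

/* Return the single byte that stores an int8_t value.
   Reading an object through unsigned char is always permitted. */
unsigned char int8_object_byte(int8_t value)
{
    unsigned char byte;
    memcpy(&byte, &value, sizeof value);
    return byte;
}
```

Because `int8_t` is two's complement with no padding bits, `int8_object_byte(-1)` is `0xFF` wherever this compiles.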

Steve Jessop
  • To clarify, are you implying that the negative number representation can be 2's complement for signed char, another system (e.g. 1's complement) for short, and another for int, etc... Also I do not understand the part about int and signed char using same meaning of the sign bit. – tyty Nov 09 '11 at 13:52
  • @tyty: yes, I'm implying that `signed char` could be 2's complement while `short` is 1s' complement. The standard in 6.2.6.2/2 says that the sign bit modifies the value in "one of the following ways", then lists three ways that the sign bit can affect the value, corresponding to the three permitted representations. So "meaning of sign bit" is the same as "representation". It doesn't explicitly say whether the implementation must choose the same "way" for each integer type, so I think there's no such requirement. Hence it's bizarre but legal to make different choices. – Steve Jessop Nov 09 '11 at 14:03
  • Formally speaking, I think the English in the standard is ambiguous: "For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit ... If the sign bit is one, the value shall be modified in one of the following ways". It's ambiguous whether this means, "there exists one of the following ways such that for all types, the sign bit modifies the value in that way" vs "for all types, there exists one of the following ways such that the sign bit modifies the value in that way". – Steve Jessop Nov 09 '11 at 14:09
  • So, as an implementer I'd assume the worst possible meaning for implementers, and make them all the same. As a writer of programs, and assuming we even care about such strange architectures, I would assume the worst possible meaning for programs, that they might be different. It's possible that there's an authoritative commentary on the standard somewhere (like a defect report) that resolves the ambiguity, it's also possible that I'm the only person in the world that sees the ambiguity, and that to everyone else it's obvious what it means ;-) – Steve Jessop Nov 09 '11 at 14:12
  • Note that, if `uint8_t` exists, either `unsigned char` must be twos complement, or an extended integer type must be used to define it. Either way, `CHAR_BIT` must be 8. – R.. GitHub STOP HELPING ICE Nov 09 '11 at 14:20
  • @steve Note this wording **unspecified behavior where each implementation documents how the choice is made**. Also see http://www.open-std.org/jtc1/sc22/wg14/www/docs/n868.htm and http://www.open-std.org/jtc1/sc22/wg14/www/docs/n873.htm: search for 6.2.6.2. The wording then was **The implementation shall document which shall apply**. This seems to be clear that they intended only 1 to apply. Furthermore, they are thinking about removing 1's complement and signed mag. So it is unlikely they have in mind a machine which supports all three – tyty Nov 09 '11 at 15:31
  • @R.. Please clarify "... either `unsigned char` must be twos complement ...". How can `unsigned` anything be "twos complement"? – chux - Reinstate Monica Jun 25 '14 at 22:43
  • @chux: Indeed that part of my comment was nonsense. It was supposed to be about `int8_t` and `signed char`, I think. – R.. GitHub STOP HELPING ICE Jun 25 '14 at 23:27
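If one takes the cautious reading discussed in these comments, that each integer type might use a different representation, the masking test can be repeated per type. This is only a sketch with hypothetical names (`sign_repr`, `classify_low_bits`, `repr_of_*`). It covers `int` and wider types only, because a `signed char` or `short` operand would be promoted to `int` before the `&` is applied.

```c
/* Hypothetical names, introduced only for this sketch. */
enum sign_repr {
    SIGN_REPR_UNKNOWN,
    SIGN_REPR_TWOS_COMPL,
    SIGN_REPR_ONES_COMPL,
    SIGN_REPR_SIGN_MAG
};

/* Classify a representation from the two low-order bits of -1. */
enum sign_repr classify_low_bits(int low_two_bits)
{
    switch (low_two_bits) {
    case 3:  return SIGN_REPR_TWOS_COMPL;  /* -1 ends in "11" */
    case 2:  return SIGN_REPR_ONES_COMPL;  /* -1 ends in "10" */
    case 1:  return SIGN_REPR_SIGN_MAG;    /* -1 ends in "01" */
    default: return SIGN_REPR_UNKNOWN;     /* not one of the three permitted forms */
    }
}

/* The literal's suffix selects the type whose representation is tested. */
enum sign_repr repr_of_int(void)       { return classify_low_bits((int)(-1 & 3)); }
enum sign_repr repr_of_long(void)      { return classify_low_bits((int)(-1L & 3L)); }
enum sign_repr repr_of_long_long(void) { return classify_low_bits((int)(-1LL & 3LL)); }
```

A program that cares could compare the three results and refuse to run if they differ.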