
I want to write signed integer values to a file in a platform-independent way.

If they were unsigned, I would just convert them from host byte order to LE (or BE) with the endian(3) family of functions.
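For the unsigned case, endian(3) is a glibc/BSD extension rather than ISO C. A swapped-in alternative is to assemble the bytes with shifts, which never depends on the host byte order. This is a minimal sketch, and the names `store_u32_le` and `load_u32_le` are hypothetical:

```c
#include <stdint.h>

/* Write v into buf[0..3], least significant byte first, whatever the host byte order. */
static void store_u32_le(unsigned char buf[4], uint32_t v)
{
    buf[0] = (unsigned char)(v & 0xFF);
    buf[1] = (unsigned char)((v >> 8) & 0xFF);
    buf[2] = (unsigned char)((v >> 16) & 0xFF);
    buf[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Read back a value written by store_u32_le(). */
static uint32_t load_u32_le(const unsigned char buf[4])
{
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}
```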

I'm not sure how to deal with signed integers, though. If I cast them to unsigned values, I can lose the sign, since the C standard does not guarantee that

(int) ((unsigned) -1) == -1

The other option would be to cast a pointer to the value (i.e., reinterpret the byte sequence as unsigned), but I'm not convinced that converting the endianness after that is going to give anything sensible.

What is the proper way for platform independent signed integer storage?

Update:

  • I know that in practice almost all architectures use two's complement representation, so that I can losslessly convert between signed and unsigned integers. However, this question is meant to be more theoretical.

  • Just rolling my own integer representation (be it storing the decimal digits as ASCII characters or storing the sign bit separately) is of course a solution. However, I'm interested in whether there is a way that works without completely abandoning the native binary representation.

Nikratio

6 Answers


The simplest solution:

For writing, just convert to unsigned and use your unsigned endian conversion functions.

For reading the values back, first read them into an unsigned variable, check whether the high bit is set, and if so do some arithmetic to make the conversion well-defined:

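#include <stdint.h>

uint32_t temp;  /* value read from the file, already in host byte order */
int32_t dest;
if (temp > INT32_MAX) dest = -(int32_t)(-temp-1)-1;  /* -temp-1 == ~temp fits in int32_t */
else dest = temp;                                      /* in range: value-preserving */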

As an added bonus, a good compiler on a sane system (i.e. a two's complement system where the implementation-defined conversion from unsigned to signed is "correct") will first optimize -(int32_t)(-temp-1)-1 to (int32_t)temp, then optimize the two branches of the conditional, which now both contain identical code, to a single code path with no branch.
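A self-contained sketch of this answer's approach, assuming the 32-bit value has already been read and converted to host byte order (for example with le32toh()). The helper names `to_wire` and `from_wire` are hypothetical. Converting signed to unsigned is always well-defined in C (the value is reduced modulo 2^32), so only the reading side needs the trick:

```c
#include <stdint.h>

/* Signed -> unsigned: well-defined, the value is reduced modulo 2^32. */
static uint32_t to_wire(int32_t v)
{
    return (uint32_t)v;
}

/* Unsigned -> signed without relying on the implementation-defined conversion. */
static int32_t from_wire(uint32_t temp)
{
    if (temp > INT32_MAX)
        /* -temp-1 equals ~temp and lies in [0, INT32_MAX], so the cast is exact;
           negating and subtracting 1 then recovers the original negative value. */
        return -(int32_t)(-temp - 1) - 1;
    return (int32_t)temp;  /* in range: value-preserving */
}
```

On the write side, `to_wire(v)` is what you would pass to htole32() or a similar function.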

R.. GitHub STOP HELPING ICE
  • (You need to use `-(int32_t)(-temp-1)-1` for this to always be well-defined.) – caf Oct 26 '11 at 02:36
  • Could you explain how the `-(int32_t)(-temp-1)-1` trick works exactly? I know that you're guaranteed that `temp == dest % 2^n`, but I have trouble figuring out what's going on there. – Nikratio Oct 26 '11 at 13:36
  • If `temp` is a "negative" unsigned value, `-temp-1` (which is equal to `~temp`) is a value which fits (and thus is positive) in both `uint32_t` and `int32_t`. Thus the conversion is well-defined and value preserving. After the conversion, we simply "undo" the operations to get back the original negative value. Note that `int32_t` is always two's complement, so `-0x7fffffff-1` does not overflow. – R.. GitHub STOP HELPING ICE Oct 26 '11 at 14:00
  • 7.18.1.1: "The typedef name intN_t designates a signed integer type with width N, no padding bits, and a two's complement representation. Thus, int8_t denotes a signed integer type with a width of exactly 8 bits." – R.. GitHub STOP HELPING ICE Oct 26 '11 at 20:03
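The arithmetic behind the explanation above can be written out as a short derivation (a worked sketch, not from the original thread). Here t stands for `temp` in the branch `temp > INT32_MAX`, and arithmetic on `uint32_t` is modulo 2^32:

```latex
\begin{aligned}
t &= 2^{32} - k, \quad 1 \le k \le 2^{31} && (t > \mathtt{INT32\_MAX} = 2^{31} - 1)\\
-t - 1 &\equiv 2^{32} - t - 1 = k - 1 \pmod{2^{32}}, && 0 \le k - 1 \le 2^{31} - 1\\
(\mathtt{int32\_t})(k - 1) &= k - 1 && \text{(in range, value-preserving)}\\
-(k - 1) - 1 &= -k \ge -2^{31} && \text{(no overflow)}\\
-k &\equiv 2^{32} - k = t \pmod{2^{32}}
\end{aligned}
```

For example, t = 0xFFFFFFFE gives k = 2: -t-1 is 1, the cast yields 1, and -1-1 = -2, which is the two's complement reading of 0xFFFFFFFE.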

A platform-independent way? If you truly want this, you should consider writing it as text rather than binary (and taking into account that even that is not fully platform-independent since you may want to move it from an ASCII to an EBCDIC platform).

It all depends on how platform-independent you need it to be. C allows three different signed encodings: two's complement, ones' complement and sign/magnitude. But by far most machines use the first one.

Work out first what you actually mean by that term. If you mean you only want to handle two's complement, then casting it to an unsigned is fine.
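A minimal sketch of the text option, using only standard C: the value is formatted with `snprintf` and the `PRId32` macro from `<inttypes.h>`, then parsed back with `strtol`. The function names are hypothetical, and the sketch works on string buffers to stay self-contained; the same format works with `fprintf`/`fgets` on a file. The ASCII/EBCDIC caveat above still applies, since the digits are written in the execution character set.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Format v as decimal text, e.g. "-42". Returns 0 on success, -1 if buf is too small. */
static int int32_to_text(char *buf, size_t size, int32_t v)
{
    int n = snprintf(buf, size, "%" PRId32, v);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Parse decimal text into *out. Returns 0 on success, -1 on malformed or out-of-range input. */
static int text_to_int32(const char *s, int32_t *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);  /* long is at least 32 bits */
    if (end == s || *end != '\0' || errno == ERANGE || v < INT32_MIN || v > INT32_MAX)
        return -1;
    *out = (int32_t)v;
    return 0;
}
```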

paxdiablo
  • I meant independent of endianness and independent of signed integer representation. But now I realize that if I want to be independent of the representation I obviously have to define my own representation, so my question really didn't make sense in the first place ("how can I be representation independent without requiring a specific representation"). – Nikratio Oct 26 '11 at 16:29

Use the same approach as when sending data over the network. Convert your signed or unsigned values to big-endian with htonl() before saving them. When reading, convert the data back to your machine's byte order with ntohl().

But as always you need to know if the data originally was signed or unsigned. With just a bit sequence, you can't know for sure.
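A sketch of this approach for a signed value, using the POSIX `htonl()`/`ntohl()` from `<arpa/inet.h>` (not part of ISO C). Converting to `uint32_t` before `htonl()` is well-defined. On the way back, the unsigned result is turned into a signed value with the same trick as in the first answer, which avoids the conversion problem raised in the comment below. The function names are hypothetical:

```c
#include <arpa/inet.h>  /* htonl(), ntohl(): POSIX, not ISO C */
#include <stdint.h>
#include <string.h>     /* memcpy() */

/* Store v as 4 big-endian (network order) bytes. */
static void put_i32_be(unsigned char buf[4], int32_t v)
{
    uint32_t n = htonl((uint32_t)v);  /* signed -> unsigned is well-defined */
    memcpy(buf, &n, sizeof n);
}

/* Read 4 big-endian bytes back into a signed value. */
static int32_t get_i32_be(const unsigned char buf[4])
{
    uint32_t n;
    memcpy(&n, buf, sizeof n);
    uint32_t u = ntohl(n);
    if (u > INT32_MAX)
        return -(int32_t)(-u - 1) - 1;  /* well-defined for the "negative" half */
    return (int32_t)u;
}
```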

Milan
  • This does not work: converting an unsigned integer to a signed integer gives an implementation-defined result (or raises an implementation-defined signal) if the (positive) unsigned value can't be represented by the signed type. – Nikratio Oct 26 '11 at 13:27

Options:

  • Store numbers as plain text using printf()-like functions for conversion
  • Convert negative numbers to sign + absolute value, store them as unsigned with the extra sign bit
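One way to read the second option is to pack the sign into an extra bit above a 32-bit magnitude. This needs a 64-bit unsigned container, since the magnitude of INT32_MIN is 2^31. A minimal sketch; the bit layout and function names are hypothetical, and the resulting `uint64_t` could then go through the usual unsigned conversion (e.g. htole64() from endian(3)):

```c
#include <stdint.h>

/* Hypothetical layout: bit 32 = sign, bits 0-31 = magnitude. */
#define SIGN_BIT ((uint64_t)1 << 32)

static uint64_t pack_sign_magnitude(int32_t v)
{
    if (v < 0)
        /* (uint32_t)v is 2^32 + v; negating it modulo 2^32 yields |v|, even for INT32_MIN */
        return SIGN_BIT | (uint32_t)(-(uint32_t)v);
    return (uint64_t)v;
}

/* Assumes p was produced by pack_sign_magnitude(). */
static int32_t unpack_sign_magnitude(uint64_t p)
{
    uint32_t mag = (uint32_t)(p & 0xFFFFFFFFu);
    if ((p & SIGN_BIT) && mag != 0)
        /* mag is in [1, 2^31]; mag - 1 fits in int32_t, so nothing overflows */
        return -(int32_t)(mag - 1) - 1;
    return (int32_t)mag;  /* a "negative zero" also ends up here and decodes to 0 */
}
```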
Alexey Frunze

Output a 1-byte sign flag (e.g. 0 = positive, 1 = negative). If the value is negative, make it positive, and then write the value in big-endian format. If you don't like 0 and 1, you could use '+' and '-'.
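A sketch of this layout in a 5-byte buffer: a sign byte (0 or 1), then the magnitude as 4 big-endian bytes built with shifts, so the host byte order does not matter. The function names and buffer layout are hypothetical. "Make it positive" cannot be done as `-v` in `int32_t` for INT32_MIN, so the magnitude is computed in unsigned arithmetic:

```c
#include <stdint.h>

/* buf[0] = sign flag, buf[1..4] = magnitude, most significant byte first. */
static void encode_sign_be(unsigned char buf[5], int32_t v)
{
    uint32_t mag = v < 0 ? (uint32_t)(-(uint32_t)v) : (uint32_t)v;  /* |v|, safe for INT32_MIN */
    buf[0] = v < 0 ? 1 : 0;
    buf[1] = (unsigned char)((mag >> 24) & 0xFF);
    buf[2] = (unsigned char)((mag >> 16) & 0xFF);
    buf[3] = (unsigned char)((mag >> 8) & 0xFF);
    buf[4] = (unsigned char)(mag & 0xFF);
}

static int32_t decode_sign_be(const unsigned char buf[5])
{
    uint32_t mag = ((uint32_t)buf[1] << 24) | ((uint32_t)buf[2] << 16)
                 | ((uint32_t)buf[3] << 8) | (uint32_t)buf[4];
    if (buf[0] && mag != 0)
        return -(int32_t)(mag - 1) - 1;  /* -mag without overflow; assumes mag <= 2^31 */
    return (int32_t)mag;                 /* assumes mag <= INT32_MAX */
}
```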

Jim Rhodes

Store the sign and the absolute value as 2 fields, and recombine them when you read it back.

You said you already know how to convert to/from a well-defined byte order, so all that is left is to determine the sign (hint: `< 0` might help here :-)) and take the absolute value, which you can do together with the sign test or by using abs() or similar (beware that negating the most negative value overflows).

Something like:

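#include <endian.h>   /* htole32(): glibc; <sys/endian.h> on the BSDs */
#include <stdint.h>
#include <unistd.h>   /* write() */

uint8_t negative;               /* exactly one byte on every platform */
uint32_t magnitude;
if (num < 0) {                  /* num is an int32_t */
  negative = 1;
  magnitude = -(uint32_t)num;   /* unsigned negation is well-defined, even for INT32_MIN */
} else {
  negative = 0;
  magnitude = (uint32_t)num;
}
uint32_t write_value = htole32(magnitude);
write(file, &negative, 1);
write(file, &write_value, 4);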

As an optimization you could collect the sign bits for values together and store them in a single word before the absolute values.
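The optimization in the last paragraph might look like this for a block of up to 32 values: one `uint32_t` holds the sign bits, followed by the absolute values, each of which could then go through htole32() as in the code above. This is a sketch under those assumptions; the function names and the 32-value limit are hypothetical choices:

```c
#include <stddef.h>
#include <stdint.h>

/* Split values into one sign word (bit i set = values[i] negative) and absolute values.
   Only the first 32 values are handled. Returns the sign word. */
static uint32_t split_signs(const int32_t *values, size_t n, uint32_t *mags)
{
    uint32_t signs = 0;
    for (size_t i = 0; i < n && i < 32; i++) {
        if (values[i] < 0) {
            signs |= (uint32_t)1 << i;
            mags[i] = (uint32_t)(-(uint32_t)values[i]);  /* |v|, safe for INT32_MIN */
        } else {
            mags[i] = (uint32_t)values[i];
        }
    }
    return signs;
}

/* Recombine values from the sign word and the absolute values. */
static void join_signs(uint32_t signs, const uint32_t *mags, size_t n, int32_t *values)
{
    for (size_t i = 0; i < n && i < 32; i++) {
        uint32_t m = mags[i];
        if (((signs >> i) & 1u) && m != 0)
            values[i] = -(int32_t)(m - 1) - 1;  /* -m without overflow; assumes m <= 2^31 */
        else
            values[i] = (int32_t)m;             /* assumes m <= INT32_MAX */
    }
}
```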

Peter
  • The idea works, but your example code is very broken. htole32 expects an unsigned int, but either you're passing it a signed int, or your `num < 0` test will always fail. Also, your writes are assuming that `negative` has length 1, and `num` length 4, or they will fail on big endian machines. – Nikratio Oct 26 '11 at 13:32