16

Until recently, I'd considered the decision by most systems implementors/vendors to keep plain int 32-bit even on 64-bit machines a sort of expedient wart. With modern C99 fixed-size types (int32_t and uint32_t, etc.) the need for there to be a standard integer type of each size 8, 16, 32, and 64 mostly disappears, and it seems like int could just as well be made 64-bit.

However, the biggest real consequence of the size of plain int in C comes from the fact that C essentially does not have arithmetic on smaller-than-int types. In particular, if int is larger than 32-bit, the result of any arithmetic on uint32_t values has type signed int, which is rather unsettling.

Is this a good reason to keep int permanently fixed at 32-bit on real-world implementations? I'm leaning towards saying yes. It seems to me like there could be a huge class of uses of uint32_t which break when int is larger than 32 bits. Even applying the unary minus or bitwise complement operator becomes dangerous unless you cast back to uint32_t.

Of course the same issues apply to uint16_t and uint8_t on current implementations, but everyone seems to be aware of and used to treating them as "smaller-than-int" types.
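
To make that concrete, here's a tiny contrived sketch of the kind of expression I have in mind; the comments spell out the behaviour on a normal 32-bit-int implementation versus a hypothetical 64-bit-int one:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t mask = 0x0000FFFFu;

    /* With 32-bit int, mask does not promote: ~mask is the unsigned
     * value 0xFFFF0000 and -(uint32_t)1 is 0xFFFFFFFF.  With a
     * hypothetical 64-bit int, mask would promote to signed int,
     * ~mask would be the negative value -65536 and -(uint32_t)1
     * would be -1, and the sign and high bits leak into any wider
     * context the result is used in. */
    uint64_t complement = ~mask;
    uint64_t minus_one  = -(uint32_t)1;

    /* Casting back to uint32_t restores the intended mod-2^32 result
     * regardless of the width of int. */
    uint64_t safe = (uint32_t)~mask;

    printf("%llx %llx %llx\n",
           (unsigned long long)complement,
           (unsigned long long)minus_one,
           (unsigned long long)safe);
    return 0;
}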

R.. GitHub STOP HELPING ICE
  • Now that is a really good question. I bet it would. In theory code is easily ported from x86 to x64, unless you make size assumptions that aren't guaranteed. I've personally used at least one library that did this that meant I couldn't trivially rebuild it for x64. –  Dec 30 '10 at 01:21
  • As long as only `long` gets bigger, I agree that porting clean code should be trivial. But it seems like when `int` gets bigger, you have a lot more things that can go wrong.. – R.. GitHub STOP HELPING ICE Dec 30 '10 at 01:31
  • Having kept a personal archive of machines from 68k to Alpha around to test portability of my code... I consider the question incomprehensible on the grounds that any code which breaks when you change the size of int is code that I consider to be unreasonable by definition. – wrosecrans Dec 30 '10 at 02:00

8 Answers

8

As you say, I think that the promotion rules really are the killer. uint32_t would then promote to int and all of a sudden you'd have signed arithmetic where almost everybody expects unsigned.

This would be mostly hidden in places where you do just arithmetic and assign back to a uint32_t. But it could be deadly in places where you do comparisons to constants. Whether code that relies on such comparisons without doing an explicit cast is reasonable, I don't know. Casting constants like (uint32_t)1 can become quite tedious. I personally at least always use the suffix U for constants that I want to be unsigned, but this already is not as readable as I would like.
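
To illustrate, here's a contrived sketch (not from any real code base) of a comparison against a constant that silently changes meaning, followed by the two defenses just mentioned:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t a = 2, b = 3;

    /* 32-bit int: a - b wraps to 0xFFFFFFFF (unsigned), which is > 1,
     * so the branch is taken.  Hypothetical 64-bit int: a and b promote
     * to signed int, a - b is -1, and the branch is silently skipped. */
    if (a - b > 1)
        puts("taken only where uint32_t does not promote to int");

    /* For this particular comparison, either defense gives the same
     * result under both int widths: cast the result back to uint32_t,
     * or make the constant unsigned so the comparison is done in an
     * unsigned type. */
    if ((uint32_t)(a - b) > (uint32_t)1)
        puts("always taken");
    if (a - b > 1u)
        puts("also always taken");

    return 0;
}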

Also keep in mind that uint32_t etc. are not guaranteed to exist, not even uint8_t; requiring them is an extension that comes from POSIX, not from C itself. So in that sense C as a language is far from being able to make that move.

Jens Gustedt
  • Systems where `uint32_t` doesn't exist aren't really a worry, since code that wouldn't work wouldn't compile. The promotion of `uint32_t` to a signed type is a much bigger problem. If `a` is some `uint32_t` value, `b` is a `uint32_t` value equal to 1024, and `c` is something unsigned with the value 0xFFFFFF00, then the range of `a` values where `(a-b > c)` is true would be 769..1023 if `int` is 32 bits; if `int` is 64 bits, it will never be true if `c` is a type smaller than 64 bits, or true for 0..1023 if `c` is an unsigned type 64 bits or larger. – supercat Mar 14 '14 at 15:37
  • 1
    I would expect a lot of code for things like TCP which use "rolling" 32-bit index numbers will break if compiled on a system where the difference between two `uint32_t` values isn't a `uint32_t`. IMHO, for C to be a portable language, it needs a better way of declaring types based upon required behavior, so the difference between two mod-4294967296 counters would always be computed mod 4294967296 regardless of integer promotion. – supercat Mar 14 '14 at 15:42
5

"Reasonable Code"...

Well... the thing about development is, you write and fix it and then it works... and then you stop!

And maybe you've been burned a lot so you stay well within the safe ranges of certain features, and maybe you haven't been burned in that particular way so you don't realize that you're relying on something that could kind-of change.

Or even that you're relying on a bug.

On olden Mac 68000 compilers, int was 16 bit and long was 32. But even then most extant C code assumed an int was 32, so typical code you found on a newsgroup wouldn't work. (Oh, and Mac didn't have printf, but I digress.)

So, what I'm getting at is, yes, if you change anything, then some things will break.

Prof. Falken
david van brink
  • 1
    MSVC even refused to change `long` to 64 bits because of all the code that assumed it's 32. – dan04 Dec 30 '10 at 01:56
  • I've seen many windows programmers assume sizeof(float) == sizeof(long) for example. Good way to make code that breaks hard on other platforms. :) – Maister Dec 30 '10 at 02:13
  • I see assuming that types have a certain size, and ignoring the possibility of types changing signedness due to default promotions which don't happen under "normal" type sizes, as two very different levels of error. – R.. GitHub STOP HELPING ICE Dec 30 '10 at 03:02
  • @R: C disallows types to change signedness under promotions. If you encounter it then your compiler is buggy. – slebetman Dec 30 '10 at 05:09
  • @slebetman: Incorrect. The "integer promotions" are, in part: *"If an `int` can represent all values of the original type, the value is converted to an `int`; otherwise, it is converted to an `unsigned int`."* - this means, for example, that `unsigned char` is typically promoted to `int` (and that, with a hypothetical 64 bit `int`, the same would apply to `uint32_t`). – caf Dec 30 '10 at 05:18
  • @R: Changing type signedness does not affect value signedness. Promoting an unsigned 32 bit integer to a signed 64 bit integer can never suddenly make that number negative. – slebetman Dec 30 '10 at 05:23
  • 4
    @slebetman: The problem is that it **does** change the signedness of the results of arithmetic operations. For instance, `-(uint32_t)1` is negative if and only if `uint32_t` has rank lower than `int`. – R.. GitHub STOP HELPING ICE Dec 31 '10 at 01:48
  • @R..: One thing I'd like to see added to C would be some new numeric types with more explicitly controllable promotion rules, so that e.g. the result of adding two new-style 16-bit wrapping unsigned integers would be independent of the size of `int`. Using such types would allow much more robust portable code, and could also allow more efficient code generation in cases where out-of-range behavior doesn't matter. – supercat Feb 10 '14 at 06:20
  • @supercat: While it may have been slightly better to have that sort of behavior from the beginning in C, I'm skeptical of whether adding it now would be an improvement. You can get the same thing by simply casting the result back to the operand type. – R.. GitHub STOP HELPING ICE Feb 10 '14 at 06:28
  • 1
    @R..: Having the meaning of expression `a-b > c` vary depending upon the size of `int` is bad. When trying to write robust code, it's very helpful to be able to tell the compiler enough about what one is trying to do that the compiler can squawk if things won't work as intended. The existing types need to be supported for existing code, but that doesn't mean they can't be deprecated in favor of better alternatives. – supercat Feb 10 '14 at 06:42
  • @R..: What I'd like to see would be a means via which `int(option,option,...)` would declare an integer type with specified characteristics [I don't think that syntax has any other valid meaning]; `typedef` could be used to create arbitrary shorthands. Rather than specifying that unsigned integers wrap and signed ones don't, I would allow the declaration to specify whether a value should be a "number" [not specified to wrap] or a "ring" [specified to wrap]. Numbers would promote to rings, but *not* vice versa [except that rings could be coerced to old-style integer types for interaction... – supercat Feb 10 '14 at 15:58
  • ...with older code]. If a particular platform can load or store 32-bit values faster than 16-bit values, an "expandable 16-bit signed ring" could take either 16 or 32 bits of storage as convenient, but adding 1 to 32767 would have to yield -32768. A "non-expandable unsigned 8-bit number" would have to fit in one byte if its address was taken, but trying to store 256 in it would be Undefined Behavior. A compiler for any platform which can handle everything else in C11 should have no problem generating code to yield precise behaviors; it may not be efficient, but... – supercat Feb 10 '14 at 16:02
  • 1
    ...if a programmer needs particular behavior, better to let the compiler to it than have the programmer kludge it. – supercat Feb 10 '14 at 16:03
3

With modern C99 fixed-size types (int32_t and uint32_t, etc.) the need for there to be a standard integer type of each size 8, 16, 32, and 64 mostly disappears,

C99 has fixed-sized typeDEFs, not fixed-size types. The native C integer types are still char, short, int, long, and long long. They are still relevant.

The problem with ILP64 is that it has a great mismatch between C types and C99 typedefs.

  • int8_t = char
  • int16_t = short
  • int32_t = nonstandard type
  • int64_t = int, long, or long long

From 64-Bit Programming Models: Why LP64?:

Unfortunately, the ILP64 model does not provide a natural way to describe 32-bit data types, and must resort to non-portable constructs such as __int32 to describe such types. This is likely to cause practical problems in producing code which can run on both 32 and 64 bit platforms without #ifdef constructions. It has been possible to port large quantities of code to LP64 models without the need to make such changes, while maintaining the investment made in data sets, even in cases where the typing information was not made externally visible by the application.

dan04
  • As I hinted at in the first paragraph of the question, this consideration is mostly obsolete, since anyone wanting/needing fixed-size types should be using `stdint.h`. – R.. GitHub STOP HELPING ICE Dec 30 '10 at 01:57
  • 2
    I don't think the lack of a standard type name for a 32-bit type is as problematic as the fact that an ILP64 implementation would presently be forbidden from providing *any* type which behaves the way a uint32_t does in expressions like `seq2-seq1 > buffsize`. – supercat Jul 28 '16 at 16:20
3

DEC Alpha with OSF/1 Unix was one of the first 64-bit versions of Unix; it used the LP64 model (meaning long and pointers were 64-bit quantities while int stayed 32-bit). Even that transition caused lots of problems.

One issue I've not seen mentioned - which is why I'm answering at all after so long - is that if you have a 64-bit int, what size do you use for short? Both 16 bits (the classical, change nothing approach) and 32 bits (the radical 'well, a short should be half as long as an int' approach) will present some problems.

With the C99 <stdint.h> and <inttypes.h> headers, you can code to fixed size integers - if you choose to ignore machines with 36-bit or 60-bit integers (which is at least quasi-legitimate). However, most code is not written using those types, and there are typically deep-seated and largely hidden (but fundamentally flawed) assumptions in the code that will be upset if the model departs from the existing variations.
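
For instance, here's a minimal sketch of code written against the fixed-size typedefs and the <inttypes.h> format macros, which doesn't care which 64-bit (or 32-bit) model the platform chose:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Widths are pinned down regardless of which basic type the
     * implementation maps the typedefs onto. */
    int32_t  count  = -42;
    uint64_t offset = (uint64_t)1 << 40;

    /* The PRI* macros expand to the printf length modifiers that match
     * whatever the underlying types happen to be. */
    printf("count=%" PRId32 " offset=%" PRIu64 "\n", count, offset);
    return 0;
}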

Notice Microsoft's ultra-conservative LLP64 model for 64-bit Windows. That was chosen because too much old code would break if the 32-bit model was changed. However, code that had been ported to ILP64 or LP64 architectures was not immediately portable to LLP64 because of the differences. Conspiracy theorists would probably say it was deliberately chosen to make it more difficult for code written for 64-bit Unix to be ported to 64-bit Windows. In practice, I doubt whether that was more than a happy (for Microsoft) side-effect; the 32-bit Windows code had to be revised a lot to make use of the LLP64 model too.

Jonathan Leffler
  • On platforms which support 8, 16, and 32 bit arithmetic it is useful to have 8, 16, and 32-bit types. No machine which supports those sizes should have any problem making `char` 8 bits, `short` 16, and `long` 32. If pointers grow beyond 32 bits, it's not possible to avoid breaking either code that expects `long` to be big enough to hold a pointer, or code that expects `long` to be 32 bits. I think the latter kind of code is more common, and am confused as to why people dislike the notion of saying either `int64_t` or `long long` when they want a 64-bit value. – supercat Oct 27 '16 at 18:40
  • @supercat: It's about 20 years too late for that discussion. The LP64 model is essentially universal on POSIX systems (`long` is 64 bits); the LLP64 model is used by Windows (`long` is 32 bits); the two models disagree and you have to write code rather carefully if it is to run unchanged on the two models. Typically, that means you do not write code using the simple (raw) types like `int` and `long`; you write the code using some other system of names. There are also interfaces where `int` is defined as a type ([`fgets()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/fgets.html)). – Jonathan Leffler Oct 27 '16 at 18:50
  • I'm well aware that Unix world has gone to LP64; my point is that it is not unreasonable for Microsoft to think that even on LP64, since any program that wants 64-bit values can use `long long` or `int64_t`, there should be no particular imperative for a compiler to assume that programmers who use `long` rather than `long long` want a 64-bit type rather than a 32-bit one. Basically, viewing the situation as analogous to days where some platforms used 16-bit "int" and some used 32-bit "int", and programmers who cared whether something was 16 or 32 bits would use "short" or "long", respectively. – supercat Oct 27 '16 at 19:09
  • If programmers who want 64-bit types always use `long long` or `int64_t`, then the only code which would be affected by the choice of type for `long` would be code which had expected it to be a 32-bit type, so there would be no disadvantage to keeping that meaning. Unless a compiler is so silly as to not recognize aliasing between a 64-bit `long` and a 64-bit `long long`, I don't see what advantage there would be for a programmer who wants 64 bits using a `long`. What advantage is there? – supercat Oct 27 '16 at 19:19
2

There's one code idiom that would break if ints were 64-bits, and I see it often enough that I think it could be called reasonable:

  • checking if a value is negative by testing if ((val & 0x80000000) != 0)

This is commonly found in code that checks error codes. Many error-code schemes (like Windows' HRESULT) use bit 31 to represent an error, and code will sometimes check for that error either by testing bit 31 or by checking whether the error value is negative.

Microsoft's macros for testing HRESULT use both methods - and I'm sure there's a ton of code out there that does similar without using the SDK macros. If MS had moved to ILP64, this would be one area that caused porting headaches that are completely avoided with the LLP64 model (or the LP64 model).
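
Here's a generic sketch of the two styles of check; the value 0x80001234 is purely illustrative, not a real SDK code. On an ILP64 platform the two tests would disagree:

#include <stdio.h>

#define SOME_ERROR 0x80001234   /* illustrative: an error code with bit 31 set */

int main(void)
{
    int status = SOME_ERROR;    /* plain int used to carry the code */

    /* Bit-31 test: with the usual wrapping conversion, the bit is set
     * whether int is 32 or 64 bits wide, so this fires either way. */
    if (status & 0x80000000)
        puts("bit 31 is set");

    /* Sign test: with a 32-bit int the constant doesn't fit, the
     * implementation-defined conversion typically yields a negative
     * value, and this fires; with a 64-bit int the value is simply a
     * positive number and the test is silently false. */
    if (status < 0)
        puts("negative (typically only where int is 32 bits)");

    return 0;
}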

Note: if you're not familiar with terms like "ILP64", please see the mini-glossary at the end of the answer.

I'm pretty sure there's a lot of code (not necessarily Windows-oriented) out there that uses plain-old-int to hold error codes, assuming that those ints are 32 bits in size. And I bet there's a lot of code with that error-status scheme that also uses both kinds of checks (< 0 and bit 31 being set) and which would break if moved to an ILP64 platform. These checks could be made to continue to work correctly either way if the error codes were carefully constructed so that sign extension took place, but again, many such systems I've seen construct the error values by or-ing together a bunch of bit fields.

Anyway, I don't think this is an unsolvable problem by any means, but I do think it's a fairly common coding practice that would cause a lot of code to require fixing up if moved to an ILP64 platform.

Note that I also don't think this was one of the foremost reasons for Microsoft to choose the LLP64 model (I think that decision was largely driven by binary data compatibility between 32-bit and 64-bit processes, as mentioned in MSDN and on Raymond Chen's blog).


Mini-Glossary for the 64-bit Platform Programming Model terminology:

  • ILP64: int, long, pointers are 64-bits
  • LP64: long and pointers are 64-bits, int is 32-bits (used by many (most?) Unix platforms)
  • LLP64: long long and pointers are 64-bits, int and long remain 32-bits (used on Win64)

For more information on 64-bit programming models, see "64-bit Programming Models: Why LP64?"
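
A trivial probe (my own sketch) will tell you which of these models a given compiler uses:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* ILP64: int, long and pointers all print 64; LP64: long and
     * pointers print 64; LLP64: only long long and pointers print 64. */
    printf("int      : %zu bits\n", sizeof(int) * CHAR_BIT);
    printf("long     : %zu bits\n", sizeof(long) * CHAR_BIT);
    printf("long long: %zu bits\n", sizeof(long long) * CHAR_BIT);
    printf("void *   : %zu bits\n", sizeof(void *) * CHAR_BIT);
    return 0;
}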

Michael Burr
  • Performing a bitwise and to test for negative values is idiocy only Microsoft could have invented. With a poorly optimizing compiler, it will likely generate larger, slower code than letting the compiler choose the way to test sign, and of course it's less likely to be correct, as you say. I'm really not looking for answers about how idiotic, clearly-nonportable code like this could break, but rather subtle ways that seemingly correct code might break. – R.. GitHub STOP HELPING ICE Dec 31 '10 at 15:44
  • I was afraid this might turn into MS bashing - that's not how I intended it (in spite of using the Win SDK as an example). Even though I don't have a study to back me, I'm sure that this is a very common pattern (or anti-pattern) in a lot of code in the wild - Microsoft-oriented or not. And it's not so much a problem for the SDK's handling of the error codes (since those tests could be fixed once in the SDK), but for code that performs these tests manually, not using SDK macros/functions, of which I'm sure there's a lot and that MS would really prefer not to break without good cause. – Michael Burr Dec 31 '10 at 23:01
  • @R..: Testing bit 31 is hardly elegant, but I don't think C provides any particularly elegant perfectly-portable alternative. – supercat Feb 10 '14 at 06:30
  • @supercat like.. testing if a signed number is < 0? – Antti Haapala -- Слава Україні Oct 09 '20 at 15:49
  • @AnttiHaapala: If the values are going to be displayed in hex (as is typical), writing them as 0x80001234 and coercing them to a 32-bit signed integer type will allow the source-code representation to match the output format. If migrated to a platform where `int` is 64 bits, code that tests for bit 31 would recognize both 0x80001234 and -0x7FFFEDCC as error codes; code that tests for <0 wouldn't. – supercat Oct 09 '20 at 17:02
1

While I don't personally write code like this, I'll bet that it's out there in more than one place... and of course it'll break if you change the size of int.

int i, x = getInput();           /* getInput(): some function returning an int */
for (i = 0; i < 32; i++)         /* hard-wires the assumption that int has exactly 32 bits */
{
    if (x & (1 << i))
    {
        //Do something
    }
}
user541686
  • It won't break if you make the size of int bigger than 32 bits. It will just break on most 8/16 bit platforms. – slebetman Dec 30 '10 at 05:11
  • Well, it won't "break" in the sense that it'll still compile and run, but it probably isn't what the programmer intended (since he intended to go through all the bits, not just the first 32). – user541686 Dec 30 '10 at 05:49
  • He *maybe* intended that. :-) – Prof. Falken Mar 30 '11 at 06:06
0

Well, it's not like this story is all new. With "most computers" I assume you mean desktop computers. There already has been a transition from 16-bit to 32-bit int. Is there anything at all that says the same progression won't happen this time?

Johan Kotlinski
  • 2
    "Is there anything at all that says the same progression won't happen this time?" The fact that there isn't a type between `short` and `int` to keep the old size of `int`. Promoting `int` from 16 to 32 bits was okay, because `short` was still 16. If `int` goes to 64 bits, what will be 32 bits? If you answer `short`, what will be 16 bits? `short short`? That's just silly. The standard added `long long` to the other end of the spectrum already to keep things from changing too much. – Chris Lutz Dec 30 '10 at 01:48
  • 1
    The standard also added int32_t, which seems reasonable for specifying that you want a 32-bit integer. It is not as convenient, but it's portable. – Johan Kotlinski Dec 30 '10 at 01:54
  • 1
    @Chris: That's the typical explanation, but it's pretty irrelevant these days since an implementation could just `typedef __int32_t int32_t;` where `__int32_t` is an implementation-defined extended integer type. The default promotions issue is a much more serious problem, I believe. – R.. GitHub STOP HELPING ICE Dec 30 '10 at 01:55
  • 1
    @kotlinski - `intN_t` are optional, except 8, 16, 32 and 64 bits, which are not optional _if the system supports those types_, in which case `intN_t` is a typedef for the type. It's perfectly reasonable to say "No basic integral type is 32 bits? Then no `int32_t` for you!" [note: I'm basing this on Wikipedia, but am checking the standard right after I post this.] – Chris Lutz Dec 30 '10 at 01:55
  • @R.. - It could, but that's always a hideous solution IMHO. And yes, default promotions is more serious. – Chris Lutz Dec 30 '10 at 01:56
  • C++ makes it even more hideous. How does adding a new nonstandard integer type affect function overloading? – dan04 Dec 30 '10 at 02:12
  • @R..: Can you think of any situations where code would break if compilers evaluated comparison operators in arithmetically-correct fashion, but the code wouldn't be broken by a larger-than-expected default `int` type? – supercat Feb 13 '14 at 03:22
  • @supercat: If nothing else, when the operands are `intmax_t` and `uintmax_t`... :-) – R.. GitHub STOP HELPING ICE Feb 13 '14 at 03:38
  • @R..: Heh--I should have added "...if one uses specific-sized types for everything". I understand reasons for having unsigned types behave as algebraic rings, but the rules only allow predictable behavior when fixed-sized types interact with fixed-type types and the C-style types (`int`, etc.) interact with other such types. Is there any benefit to rules that would forbid compilers from implementing `int64_t a; uint32_t b,c; ...; a+=(b-c);` in machine-independent fashion? – supercat Feb 13 '14 at 03:56
-1

Not particularly. int is 64 bit on some 64 bit architectures (not x64).

The standard does not actually guarantee you get 32 bit integers, just that (u)int32_t can hold one.

Now if you are depending on int being the same size as ptrdiff_t, you may be broken.
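
If your code really does depend on that, a compile-time check (sketched here with C11's _Static_assert; older compilers need some other trick) at least turns the silent breakage into a build error:

#include <stddef.h>

/* Fails to compile on any platform where the assumption does not hold,
 * e.g. on typical LP64 or LLP64 systems where ptrdiff_t is 64 bits. */
_Static_assert(sizeof(int) == sizeof(ptrdiff_t),
               "this code assumes int and ptrdiff_t have the same size");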

Remember, C does not guarantee that the machine even is a binary machine.

Joshua
  • 3
    (u)int32_t *is* guaranteed to be exactly 32-bit. You may be thinking of (u)int_least32_t. – dan04 Dec 30 '10 at 01:45
  • @dan: No, he's thinking of `int`, not `int32_t`. The OP is not asking about `int32_t` but `int`. – slebetman Dec 30 '10 at 01:50
  • But `int` is only guaranteed to have 16 bits (or, to be pedantic, a range including ±32767). – dan04 Dec 30 '10 at 01:58
  • A little bit of history on what Joshua is saying. On PowerPC Macs, int was/is 64 bits on 64 bit machines for programs compiled in 64 bit mode. Nothing broke much. Whatever did break wasn't such a big deal to fix. I would personally say that code making such assumptions on size of int is unreasonable code so strictly speaking "reasonable" code should not break. – slebetman Dec 30 '10 at 01:58
  • @dan: Joshua did not say otherwise. Read the OP's post and Joshua's post again carefully. – slebetman Dec 30 '10 at 02:00
  • Actually I did say otherwise. What do you think the limits of uint32_t are on a decimal machine? Anyway, you cannot know the difference without invoking the undefined behavior known as integer overflow so don't worry about it. – Joshua Dec 30 '10 at 03:13
  • 1
    C does not support "decimal machines". The underlying representation is specified to always be binary bits. – R.. GitHub STOP HELPING ICE Dec 31 '10 at 01:50
  • 3
    @Joshua: `int32_t` is guaranteed to be 32 bits (exactly 32 bits - no padding bits). If the platform cannot support that, `<stdint.h>` must not have a typedef for it. – Michael Burr Dec 31 '10 at 03:45