Implementation specific behavior when casting a pointer to a data type

Question

From my understanding, according to the C standard, casting an int pointer to int is ill-advised for portable code. A simple example is a 64-bit architecture where pointers are 64 bits but int is 32 bits: there the cast would truncate information, i.e. a concrete example of how things can go wrong.

The same is true for casting an integer to an int pointer. However, I cannot find an example of why exactly this is considered to be UB/implementation-specific. I get that the C standard advises against it, but what exactly can go wrong? The only vague example I found was somebody mentioning possible alignment issues. How exactly would those arise?

Lundin · Accepted Answer · 2020-09-11T08:34:16.163

The C standard is fairly detailed in listing possible problems, C17 6.3.2.3/5:

An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.

So the various potential problems are:

Different sizes. This is the most obvious issue. A pointer address might not fit inside an integer, or the other way around.
Alignment. An integer containing some random number might cause a misaligned address when converted to a pointer.
Traps. Incorrect addresses, such as misaligned ones or addresses pointing into executable code rather than data, may cause implementation-defined "traps"/hardware exceptions.

Far less likely, integers can in theory contain trap representations too, but that's likely only relevant for exotic/fictional one's complement and signed magnitude systems. The C standard allows for such systems, but very few such systems have actually existed in the history of computers.
Wrong type. If we lie to the compiler when converting to/from pointers and tell it that a different type is stored at a location than what is actually stored there, we can run into all manner of problems. We may confuse the compiler's internal tracking of which types are stored where, so-called "strict pointer aliasing violations". This in turn might cause optimization-related bugs.

What is the strict aliasing rule?

We may also, once again, cause problems with misalignment and traps, or just by the program not making any sense of what's stored at a certain location.
Pointer arithmetic on unknown addresses. There may be issues with casting to a physical address where the compiler doesn't know what's stored (no known type) and then performing pointer arithmetic from there, because pointer arithmetic is only well-defined when pointing into an array of a known type. Strictly speaking, doing so is undefined behavior, so it might cause some poor compiler implementations to bug out and produce randomly behaving code. Hosted system compilers are known to do this - it's a quality of implementation problem. In particular, be very afraid of such bugs when using the gcc compiler for embedded systems programming.
Exotic pointer formats. Some systems utilize extended addressing modes that go beyond the default address bus width. This is very common in low-end embedded systems with 8-/16-bit addresses, but also existed in the PC world back in the MS-DOS days. Typically such extended addresses use a non-standard pointer type (a common non-standard extension is the far keyword). Converting between these pointer types and integers will be very system-specific.

The most correct type to use for converting to/from pointer types is uintptr_t. This is defined to be "large enough" and suitable to hold the representation of a pointer.

On some exotic systems we may also use intptr_t which is the signed equivalent. That one only makes sense if the OS has some weird internal virtual addressing, such as placing kernel space at negative addresses.

(And yes I just called Linux addressing weird and exotic, please forward your complaints to my negative phone number or negative postal address, after which I will reimburse you with a credit invoice for a negative amount of dollars.) — Lundin, Sep 11 '20 at 08:24
Fine example of an exotic pointer format I am forced to use in a current project in C++, running on 32-bit x86: 6 bytes, a 16-bit value for a selector and a 32-bit value for the address. — the busybee, Sep 11 '20 at 09:21
@thebusybee In embedded systems it is usually 8 bit for "bank" or "page", then 16 bits for the address in that bank. — Lundin, Sep 11 '20 at 09:26
Seen that, done it. ;-) In my case the x86 runs in 32-bit protected mode, AFAIK it has nothing to do with banking. — the busybee, Sep 11 '20 at 09:32

score 1 · Answer 2 · answered Sep 11 '20 at 07:07


Not all platforms have the same size int and int*. This can lead to truncation and alignment problems among others. It can also seemingly work without problems.

For portable behavior, it is advisable to use the integer types defined by the C99 standard in <stdint.h>.

You would use a uintptr_t as the variable to hold a pointer address.

See this answer as well: When is casting between pointer types not undefined behavior in C?


Morten Jensen


The only defined operation on a pointer produced by casting a `uintptr_t` is comparing the resulting pointer to the original. It is possible for two pointers to compare equal without being usable to access any of the same objects. If a pointer that could be part of such a pair (a situation that applies to most pointers) were cast to a uintptr_t and back, the Standard would say nothing about whether it could be used to access objects that were accessible by the original, by its counterpart, both, or neither. – supercat Sep 15 '20 at 17:34
@supercat I agree and that is caused by the strict aliasing rules as far as I understand it? Here I only intend to answer the question of why casting `int*` to `int` is problematic and what to cast `int*` to instead of `int`. – Morten Jensen Sep 16 '20 at 08:13
The problems I describe stem largely from compilers making unreasonable assumptions about aliasing, but they're generally not related to the "Strict Aliasing Rule". Although `uintptr_t` is intended to be the type suitable for the described purpose, the Standard doesn't mandate that compilers treat it meaningfully. It relies upon compiler writers to exercise the same sort of common sense that the maintainers of free compilers refuse to apply to the "Strict Aliasing Rule". – supercat Sep 16 '20 at 15:02
