Casting pointer to an integer

Question

It's been mentioned -- in previous questions that I've asked -- that it's not a good practice to convert a pointer to an integer type. What are some examples why this is not a good idea? What about something like the following -- why would that be considered poor practice?

short first_local_int   = 44;
int second_local_int    = 92;

printf(
        "The difference between the two memory addresses (in bytes) is: %lu", 
         (unsigned long) &second_local_int - (unsigned long) &first_local_int
);

The actual difference between the two memory addresses (in bytes) is: 2

The problem is that `unsigned long` may have a different size than the size of a pointer. — Jabberwocky, Oct 31 '20 at 21:48
@DanielWalker 'this' meaning when they are of different sizes (Jabberwocky's comment) or as written in the code in my question? — samuelbrody1249, Oct 31 '20 at 21:50
I'd say it's because you're casting pointers to a completely different type. — Daniel Walker, Oct 31 '20 at 21:50
Also the difference of the addresses of two variables is decided by the compiler. In your case it is 2, but it might as well be something else. — Jabberwocky, Oct 31 '20 at 21:51
You should use `intptr_t` when interpreting pointers as numbers. https://stackoverflow.com/questions/6326338/why-when-to-use-intptr-t-for-type-casting-in-c — Timbo, Oct 31 '20 at 21:51
On my platform `unsigned long` is 32 bits, but the size of a pointer is 64 bits. — Jabberwocky, Oct 31 '20 at 21:52
C permits pointers to be converted to integers (by cast). But it explicitly disclaims defining any specific significance for the resulting integer values. — John Bollinger, Oct 31 '20 at 21:54
@JohnBollinger so instead of doing `(unsigned long)` should I specify a one-byte type, `(char *)` to get the diff in bytes? — samuelbrody1249, Oct 31 '20 at 22:01
Arithmetic on pointers that do not point to the same object (or one element past) is also *undefined behaviour*. — Weather Vane, Oct 31 '20 at 22:03
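To make the size concern from the comments concrete, here is a minimal sketch that compares the two sizes on whatever implementation compiles it. The helper names `ulong_is_wide_enough` and `describe_sizes` are made up for illustration, and equal byte sizes are only a rough indicator, not a guarantee that the conversion is meaningful.

```c
#include <stdio.h>

/* Hypothetical helper: 1 if unsigned long occupies at least as many bytes
   as an object pointer on this implementation, 0 otherwise. */
int ulong_is_wide_enough(void)
{
    return sizeof(unsigned long) >= sizeof(void *);
}

/* Writes a one-line size report into buf; returns what snprintf returns. */
int describe_sizes(char *buf, size_t size)
{
    return snprintf(buf, size, "unsigned long: %zu bytes, void *: %zu bytes",
                    sizeof(unsigned long), sizeof(void *));
}
```

On a platform like the one Jabberwocky describes (32-bit `unsigned long`, 64-bit pointers), `ulong_is_wide_enough()` would return 0, which is exactly the case where the cast in the question loses information.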

the busybee · Accepted Answer · 2020-10-31T22:13:17.823

The standard C11 (as an example that I have at hand) says in chapter 6.3.2.3 "Pointers" in paragraph 5:

An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.

The mentioned exception is about the value 0, which yields a null pointer.

Paragraph 6 covers the other direction:

Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.

Whenever the standard says "implementation-defined" or "undefined behavior", code that relies on that construct is generally not portable. If you prefer to write good code, refrain from using such constructs. However, if you know what you are doing, and if you test your expectations, you might get away with it.

BTW, the difference of two pointers that do not point into the same array (or one past its end) is undefined behavior, too.
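For contrast, here is a small sketch of pointer subtraction that is well defined, because both pointers refer to elements of the same array. The function names are illustrative only; the byte distance is obtained by viewing the same array through `char` pointers, which is the idea raised in the comments under the question.

```c
#include <stddef.h>

/* Distance in elements; both pointers must point into the same int array
   (or one past its end), otherwise the subtraction is undefined. */
ptrdiff_t element_distance(const int *from, const int *to)
{
    return to - from;
}

/* Distance in bytes; the same restriction applies, the array is simply
   viewed as a sequence of char. */
ptrdiff_t byte_distance(const void *from, const void *to)
{
    return (const char *)to - (const char *)from;
}
```

A `ptrdiff_t` result can be printed with the `%td` conversion. Note that this does not help with the code in the question, since the two local variables there are separate objects.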

EDIT:

Chapter 7.20.1.4 "Integer types capable of holding object pointers" of the same standard says:

The following type designates a signed integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:

intptr_t

The following type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:

uintptr_t

These types are optional.

The last sentence is important.
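Because the types are optional, portable code can test for them first. A rough sketch, relying on the rule that `<stdint.h>` defines `UINTPTR_MAX` only when `uintptr_t` itself is provided; the names `addr_int`, `HAVE_ADDR_INT` and `pointer_integer_available` are invented for this example:

```c
#include <stdint.h>

#ifdef UINTPTR_MAX
/* uintptr_t exists on this implementation; give it a local alias. */
typedef uintptr_t addr_int;   /* hypothetical alias */
#define HAVE_ADDR_INT 1
#else
/* No integer type is promised to hold a pointer here. */
#define HAVE_ADDR_INT 0
#endif

/* Returns 1 when a pointer-sized integer type is available, 0 otherwise. */
int pointer_integer_available(void)
{
    return HAVE_ADDR_INT;
}
```

Code that needs the conversion can then be guarded with `#if HAVE_ADDR_INT` and fail to compile, or take another path, where the type is missing.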

score 1 · Answer 2 · answered Oct 31 '20 at 22:14

What are some examples why this is not a good idea?

It is not a good idea because although pointer to integer conversions are allowed, the significance of their results is for the most part not specified by the standard. Specifically,

Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.

[C2018, paragraph 6.3.2.3/6]

That's a pretty weak provision to rely upon for any useful behavior. In practice, most programs that get useful behavior out of pointer-to-integer conversions do so by leveraging the appropriate definition of that behavior provided by their C implementations, which is a portability limitation.

What about something like the following -- why would that be considered poor practice?

That would be a matter of opinion.

However, although the code fragment conforms -- even strictly conforms -- to the standard, to the extent that that can be evaluated for such an isolated fragment, the message it prints is not necessarily correct about the relationship between the addresses involved. Indeed, the C model of the world does not even support the concept of relationships between the addresses of unrelated objects, except (non-)equality relationships.

Joshua · Answer 3 · 2020-10-31T22:19:43.103

1

Behold the XOR linked list: https://en.wikipedia.org/wiki/XOR_linked_list The general idea of the XOR linked list is that we can store two pointers in the same memory location, but you need a pair of pointers in the list traversal algorithm. It does have the upside that the exact same code traverses the list in either direction.

The greatest downside is your code is harder to understand than it need be.

The second greatest downside is the debugger can't handle it.

If that's not enough downside, I posit the following: if you mess it up (and it's easier to mess up than most other things), your code's behavior becomes undefined, sometimes in a way you won't notice for a while. This happened so much in Windows software that the default address space for a 64-bit executable is still 2GB. (They changed the SDK comparatively recently to set the LARGEADDRESSAWARE flag, but the binary image default is still no.)

edited Oct 31 '20 at 22:19

answered Oct 31 '20 at 22:16


that's pretty neat, thanks! Care to show a very basic example of how that would be used in C? – samuelbrody1249 Oct 31 '20 at 22:18
@samuelbrody1249: Heck no. – Joshua Oct 31 '20 at 22:20
What a funny idea, the XOR linked list. I learn something new every day. :-D However, after reading the article, I'd prefer the DIFF linked list (subtract instead of XOR). – the busybee Nov 01 '20 at 15:39
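Since a basic example was asked for, here is one possible sketch, not taken from the answer. It assumes `uintptr_t` is available, and the names `xnode`, `xlist_link` and `xlist_collect` are invented for illustration. Each node stores the XOR of its neighbours' addresses, with a null pointer standing in for a missing neighbour.

```c
#include <stddef.h>
#include <stdint.h>

struct xnode {
    int value;
    uintptr_t link;   /* integer form of prev XOR integer form of next */
};

/* Converts through void * so the uintptr_t round-trip guarantee applies. */
static uintptr_t as_int(struct xnode *p)
{
    return (uintptr_t)(void *)p;
}

/* Links nodes[0..count-1] into an XOR list in array order. */
void xlist_link(struct xnode *nodes, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        struct xnode *prev = (i > 0) ? &nodes[i - 1] : NULL;
        struct xnode *next = (i + 1 < count) ? &nodes[i + 1] : NULL;
        nodes[i].link = as_int(prev) ^ as_int(next);
    }
}

/* Walks from one end and copies up to max values into out.
   The same code works from the head or from the tail. */
size_t xlist_collect(struct xnode *start, int *out, size_t max)
{
    struct xnode *prev = NULL;
    struct xnode *cur = start;
    size_t n = 0;

    while (cur != NULL && n < max) {
        out[n++] = cur->value;
        /* XOR-ing out the node we came from leaves the other neighbour. */
        struct xnode *next = (struct xnode *)(void *)(cur->link ^ as_int(prev));
        prev = cur;
        cur = next;
    }
    return n;
}
```

Calling `xlist_collect` with the first node yields the values in order, and calling it with the last node yields them reversed. If a single `link` is ever computed from the wrong pair of nodes, the traversal converts a garbage integer into a pointer, which is the failure mode the answer warns about.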

score 0 · Answer 4 · answered Oct 31 '20 at 22:26

A pointer may be converted to an integer, because C 2018 6.3.2.3 6 says:

Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined…

Furthermore, note 69 says:

The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.

Notes are not normative parts of the standard, but this tells us that for a “regular” C implementation, we should expect converting a pointer to an integer to yield the hardware memory address of the pointed-to thing, if it fits in the integer type. Note that some C implementations are designed for specific purposes, so they may be “irregular.” For example, a C implementation could be designed to be space efficient and use narrower pointers than the hardware supports.

A proper integer type to use for pointer-to-integer conversion is uintptr_t, which is defined in <stdint.h>. This is because 7.20.1.4 1 defines uintptr_t to be capable of holding (all the information of) object pointers:

The following type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:

uintptr_t
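A short sketch of that round trip, assuming the implementation provides `uintptr_t`. The function names are illustrative; `PRIxPTR` from `<inttypes.h>` is the matching `printf` conversion for the type.

```c
#include <inttypes.h>
#include <stdio.h>

/* Converts p to uintptr_t and back; returns 1 if the result compares
   equal to the original, which the standard promises for valid void *. */
int survives_round_trip(void *p)
{
    uintptr_t as_integer = (uintptr_t)p;
    void *back = (void *)as_integer;
    return back == p;
}

/* Formats the integer value of p as hexadecimal into buf. The digits are
   implementation-defined; only the round trip itself is guaranteed. */
int format_address(char *buf, size_t size, void *p)
{
    return snprintf(buf, size, "0x%" PRIxPTR, (uintptr_t)p);
}
```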

In common modern hardware, memory addresses are simple integers in a “flat” address space. Bytes are numbered consecutively from 0 to whatever the maximum is. Each memory location corresponds to one number, and each number in this range corresponds to one memory location. (However, the fact that an address exists to designate a memory location does not mean the memory location is mapped or accessible in a particular process’ address space.)

Older machines had a variety of memory address schemes. Some schemes involved combinations of base address and offsets. In these machines, addresses had two or more parts, such as a 16-bit base b and a 16-bit offset o. When a pointer was converted to an integer, the result would be a 32-bit integer with b in the high bits and o in the low bits, equal to 65536•b + o. However, these integers did not consecutively number the addresses. When base b and offset o were used to access memory, the actual hardware address formed might be 64•b + o.

One effect of this is that b+1, o and b, o+64 are different addresses for the same memory location. Another effect is that subtracting the two integers that resulted from converting pointers would not necessarily give you the distance between them. The distance between b+1, o and b, o is 64 bytes, but subtracting 65536•b + o from 65536•(b+1) + o gives 65536.
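The arithmetic above can be checked with a small model. This is only a sketch of the example scheme described here (65536•b + o for the integer, 64•b + o for the hardware address), not of any particular machine, and `seg_ptr`, `seg_to_integer` and `seg_to_hardware` are made-up names.

```c
#include <stdint.h>

/* A two-part address: 16-bit base and 16-bit offset. */
struct seg_ptr {
    uint16_t base;
    uint16_t offset;
};

/* What a pointer-to-integer conversion would yield: base in the high
   16 bits, offset in the low 16 bits. */
uint32_t seg_to_integer(struct seg_ptr p)
{
    return (uint32_t)p.base * 65536u + p.offset;
}

/* The location the hardware would actually access in this model. */
uint32_t seg_to_hardware(struct seg_ptr p)
{
    return (uint32_t)p.base * 64u + p.offset;
}
```

With base 3, offset 10 and base 4, offset 10, the hardware addresses are 202 and 266 (64 apart), while the converted integers are 196618 and 262154 (65536 apart). Base 3 with offset 74 also reaches hardware address 266, yet converts to a different integer than base 4 with offset 10.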

thanks, is `uintptr_t` based on the `unsigned long` type or is the size of that based on the architecture/c-implementation? (I ask because for me the size is 8). — samuelbrody1249, Oct 31 '20 at 22:33
@samuelbrody1249: Each C implementation defines it appropriately. One C implementation might use `unsigned long` while another uses `unsigned long long` and another uses a custom type. — Eric Postpischil, Oct 31 '20 at 22:47

HAL9000 · Answer 5 · 2020-11-01T00:05:51.223

One place where (unsigned long) &second_local_int - (unsigned long) &first_local_int might not work is the old 16-bit 8086 architecture. On the Intel 8086, the address bus was 20 bits wide, and to access memory you had to use two 16-bit registers: a segment register and an offset register. For instance, if DS was your segment register and BX held your offset, the CPU would calculate the real memory address as hwaddr = (DS<<4) + BX

If your program didn't need more than 64k, the segment register stayed fixed, and all your pointers were 16-bit. Otherwise you had 32-bit pointers: 16 bits for the segment and 16 bits for the offset. Turning such a pointer into a 32-bit integer wouldn't give a hardware address; it would just keep the bit values of the pointer.

In practice I don't think that working with "pointers as integers" would be a big problem, since pointer-arithmetic across different segments probably wouldn't work either.
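As a sketch of the 8086 calculation, the two functions below model the hardware address and the raw bits of a 32-bit segment:offset pointer. The names are illustrative, and putting the segment in the high half of the integer is an assumption about how a compiler of that era might have laid out the bits.

```c
#include <stdint.h>

/* 8086 real-mode address: segment shifted left by 4 plus offset,
   truncated to the 20 address lines of the chip. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}

/* Bits of a 32-bit segment:offset pointer reinterpreted as an integer
   (assumed layout: segment in the high half, offset in the low half). */
uint32_t far_pointer_bits(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 16) | offset;
}
```

For example, 0x1234:0x0010 and 0x1235:0x0000 both resolve to hardware address 0x12350, but their integer forms are 0x12340010 and 0x12350000, which differ by 0xFFF0 rather than 0.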
