Given an integer type IntT such that sizeof(IntT) == sizeof(void*), and a variable i of that type, is it guaranteed that reinterpret_cast<IntT>(reinterpret_cast<void*>(i)) == i? This is similar to this question, but that question was looking at arbitrarily sized integers, so the answer was a straightforward no. Limiting it to integers of exactly the same size as a pointer makes it more interesting.

It strikes me as though the answer would have to be "yes," because the specification states that there exists a mapping to any integer large enough to hold the pointer value. If the variables are the same size, then that mapping must be bijective. If it's bijective, then that also means the conversion from int to void* must also be bijective.

But is there a hole in that logic? Is there a wiggle word in the spec that I'm not accounting for?

Cort Ammon
  • [`std::intptr_t`/`std::uintptr_t`](https://en.cppreference.com/w/cpp/types/integer) (if they exist) can hold a pointer. – Jarod42 Oct 11 '18 at 23:59
  • The critical issue is that not every string of bits makes a valid `void*`, so not every integer need be touched by the conversion in that direction. – Davis Herring Oct 12 '18 at 00:01
  • I don't think it is guaranteed that there are `2 ** (sizeof(void*))` valid (`void*`) pointers. – Jarod42 Oct 12 '18 at 00:03
  • @Jarod42: Missing a `CHAR_BIT` in that expression, but you have the right idea – Ben Voigt Oct 12 '18 at 00:18
  • @Jarod42: I'd be surprised if there weren't `2 ** (sizeof(void*))` pointers, since many programs need more than 16 or 256 valid pointer values. :-) (`2 ** (CHAR_BIT * sizeof(void*))` would be closer to a reasonable limit) – ShadowRanger Oct 12 '18 at 00:20
  • Too late to fix the *"typo"* in my comments :-( (indeed I meant `2 ** (CHAR_BIT * sizeof(void*))`). – Jarod42 Oct 12 '18 at 00:26

1 Answer

I don't think this is guaranteed. The Standard guarantees that a pointer converted to a suitably large integer and back will have its original value. From this follows that there is a mapping from pointers to a subset of the suitably large integers and back. What it does not imply is that for every suitably-large integer value, there is a corresponding pointer value…

As pointed out by DavisHerring in the comments below, this means that the mapping is injective, but does not have to be surjective and, thus, bijective. I believe what the standard implies in mathematical terms would be that there is a left-unique and left-total relation between pointers and integers, not a bijective function.
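The direction that is guaranteed can be sketched as follows. This is only an illustration: `pointer_round_trips` is a hypothetical helper name, and the sketch assumes the implementation provides the optional `std::uintptr_t`.

```cpp
#include <cassert>
#include <cstdint>

// Pointer -> integer -> pointer: the round trip the Standard guarantees.
// Assumption: std::uintptr_t exists on this implementation (it is optional).
inline bool pointer_round_trips(void* p) {
    auto as_int = reinterpret_cast<std::uintptr_t>(p);  // implementation-defined value
    void* back = reinterpret_cast<void*>(as_int);       // must yield the original pointer value
    return back == p;
}
```

Nothing comparable is promised when starting from an arbitrary integer, which is the case the question asks about.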

Just imagine some weird architecture where, for some reason, every third bit of an address must be zero. Or a slightly more reasonable architecture that uses only the lower 42 bits of a 64-bit value to store an address. Regardless of how much sense that would make, the compiler would be free to assume that an integer value being cast to a pointer must follow the pattern of a valid address and, e.g., mask out every third bit or use only the lower 42 bits, respectively…
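To make the 42-bit scenario concrete, here is a small model of it using plain integers rather than real pointer casts. Everything in it is hypothetical: `kAddressMask`, `model_int_to_ptr`, and `model_ptr_to_int` merely stand in for what such an imagined implementation might do, and do not describe any actual compiler.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical platform: only the low 42 bits of a 64-bit word form an address.
constexpr std::uint64_t kAddressMask = (std::uint64_t{1} << 42) - 1;

// Stand-in for integer -> pointer: the imagined implementation drops the unused bits.
constexpr std::uint64_t model_int_to_ptr(std::uint64_t i) { return i & kAddressMask; }

// Stand-in for pointer -> integer: the address bits are copied unchanged.
constexpr std::uint64_t model_ptr_to_int(std::uint64_t p) { return p; }

// Does integer -> pointer -> integer give back the starting integer?
constexpr bool int_round_trips(std::uint64_t i) {
    return model_ptr_to_int(model_int_to_ptr(i)) == i;
}
```

In this model every valid "pointer" (a value that fits in 42 bits) survives pointer → integer → pointer, so the Standard's guarantee holds, yet an integer with any of the upper 22 bits set does not survive integer → pointer → integer.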

Michael Kenzel
  • That's my reading as well. They really seem to go out of their way to call it out as implementation-defined without making any mention of the size of the integer. – zzxyz Oct 11 '18 at 23:42
  • @DavisHerring You are right. I had a funny feeling about calling it bijective, but somehow didn't go and question my usage of the term. Thanks for pointing that out, I'll update my answer to fix that… – Michael Kenzel Oct 11 '18 at 23:52
  • The standard doesn't even require the conversion to be injective. All it requires is that converting from pointer to suitable integer and back will produce a pointer that compares equal to the original. This is called out on cppreference: "[the same pointer may have multiple integer representations](https://en.cppreference.com/w/cpp/language/reinterpret_cast)." – Raymond Chen Oct 12 '18 at 02:12
  • @RaymondChen Well, but if it wasn't injective, that would mean that multiple different pointers have the same integer representation, in which case there'd be no way to convert back from integer to pointer!? At least to my understanding, the same pointer having multiple integer representations would just mean that the relation is not right-unique (functional), which is why I tried to point out that it is not a function in my answer above. It would still have to be left-unique (injective) for there to be a conversion back… – Michael Kenzel Oct 12 '18 at 08:42
  • @RaymondChen: The multiple representations are *another* reason it’s not reversible—in particular, the round trip is not guaranteed *even if* the integer was derived from a pointer! – Davis Herring Oct 12 '18 at 16:30
  • @MichaelKenzel Ah, I missed a detail in the standard. The return conversion must produce 'the original value', rather than merely 'compare equal to the original value'. I was thinking about systems where there can be multiple representations for the same underlying pointer, and the round trip may produce an equivalent (but not identical) pointer. (For example, consider a valgrind-like system with "fat pointers" that add metadata to the raw pointer like "The maximum you can increment this pointer to before you go off the end of the array".) – Raymond Chen Oct 12 '18 at 18:13
  • @RaymondChen The "original" value doesn't imply an object with the same bitwise representation. It means pointing to the same object. – curiousguy Dec 01 '18 at 10:21
  • @curious I'm not so sure. If they meant "pointing to the same object" then why not write that? "the original value" sounds like the value must be identical. – Raymond Chen Dec 01 '18 at 15:06
  • @RaymondChen Because "pointing to the same object" is wrong as a specification! It could be null. It could be past the end. The correct description would be "_pointing to the same object the original was pointing to, or one past the end if the original was, or null..._". Same value is much more concise and avoids the inherent risk when writing an enumeration of cases in a spec (you might forget one). Also "same value" expresses intent better. And unless you show me that bit pattern is necessarily preserved by copying a trivial type... – curiousguy Dec 02 '18 at 03:52
  • @RaymondChen Which trivial type (other than those used to access the representation of an arbitrary object by aliasing) is ever guaranteed to have a unique bitwise representation for a given value? None AFAIK. Nor should it matter: the result of `memcmp` is not expected to mean anything in general. – curiousguy Dec 02 '18 at 04:02
  • @RaymondChen "_sounds like the value must be identical_" It MEANS exactly that. The value is the same, it compares equal to the original, it can be used the same. It can be dereferenced if the original can (and then refers to the same object). Also because an integer is inherently anonymous and not magic, the converted value should be able to refer to any other object that resides at the same address! – curiousguy Dec 02 '18 at 04:15
  • @curious I can't tell whether you're agreeing with me or not. – Raymond Chen Dec 02 '18 at 04:40
  • @RaymondChen I agree that identical value implies either equally null or pointing at the same object or past the end of same array. So if not null, it implies pointing to the same byte. "_The return conversion must produce 'the original value', rather than merely 'compare equal to the original value'._" Yes, it's stronger: anything valid on the original value must be also valid for the converted value. But I also believe that the converted value is more valid than the original, as it can be used to refer to any object that resides NOW at that address even if the original was invalidated. – curiousguy Dec 02 '18 at 06:07
  • (...) for example if an original pointer refers to an object that doesn't exist but another one exists NOW at the same address, converting would revalidate the pointer IMO. – curiousguy Dec 02 '18 at 06:08