I have stumbled upon a reddit thread in which a user has found an interesting detail of the C++ standard. The thread has not spawned much constructive discussion, therefore I will retell my understanding of the problem here:
- OP wants to reimplement
memcpy
in a standard-compliant way - They attempt to do so by using
reinterpret_cast<char*>(&foo)
, which is an allowed exception to the strict aliasing restrictions, in which reinterpreting aschar
is allowed to access the "object representation" of an object. - [expr.reinterpret.cast] says that doing so results in
static_cast<cv T*>(static_cast<cv void*>(v))
, soreinterpret_cast
in this case is equivalent to static_cast'ing first tovoid *
and then tochar *
. - [expr.static.cast] in combination with [basic.compound]
A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T”, where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. [...] if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b. [...] [emphasis mine]
Consider now the following union class:
union Foo{
char c;
int i;
};
// the OP has used union, but iiuc,
// it can also be a struct for the problem to arise.
OP has thus come to the conclusion that reinterpreting a Foo*
as char*
in this case yields a pointer pointing to the first char member of the union (or its object representation), rather than to the object representation of the union itself, i.e. it points only to the member. While this appears superficially to be the same, and corresponds to the same memory address, the standard seems to differentiate between the "value" of a pointer and its corresponding address, in that on the abstract C++ machine, a pointer belongs to a certain object only. Incrementing it beyond that object (compare with end() of an array) is undefined behavior.
OP thus argues that if the standard forces the char*
to be associated with the objects's first member instead of the object representation of the whole union object, dereferencing it after one incrementation is UB, which allows a compiler to optimize as if it were impossible for the resultant char*
to ever access the following bytes of the int member. This implies that it is not possible to legally access the complete object representation of a class object which is pointer-interconvertible with a char
member.
The same would, if I understand correctly apply if "union" was simply replaced with "struct", but I have taken this example from the original thread.
What do you think? Is this a standard defect? Is it a misinterpretation?