Let's say you have an object of type T and a suitably-aligned memory buffer alignas(T) unsigned char[sizeof(T)]. If you use std::memcpy to copy from the object of type T to the unsigned char array, is that considered copy construction or copy-assignment?

If a type is trivially-copyable but not standard-layout, it is conceivable that a class such as this:

struct Meow
{
    int x;
protected: // different access-specifier means not standard-layout
    int y;
};

could be implemented like this, because the compiler isn't forced into using standard-layout:

struct Meow_internal
{
private:
    ptrdiff_t x_offset;
    ptrdiff_t y_offset;
    unsigned char buffer[sizeof(int) * 2 + ANY_CONSTANT];
};

The compiler could store x and y of Meow within buffer at any portion of buffer, possibly even at a random offset within buffer, so long as they are aligned properly and do not overlap. The offset of x and y could even vary randomly with each construction if the compiler wishes. (x could go after y if the compiler wishes because the Standard only requires members of the same access-specifier to go in order, and x and y have different access-specifiers.)
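As a sanity check on the premise, the two properties in play can be verified with type traits. This is only a sketch, assuming a C++11 compiler; the helper function name is hypothetical and exists just so the result can be observed:

```cpp
#include <type_traits>

struct Meow
{
    int x;
protected: // different access control from x
    int y;
};

// Trivially copyable: no user-provided copy/move operations or destructor.
static_assert(std::is_trivially_copyable<Meow>::value,
              "Meow should be trivially copyable");
// Not standard-layout: x and y have different access control.
static_assert(!std::is_standard_layout<Meow>::value,
              "Meow should not be standard-layout");

// Hypothetical helper that reports both properties as a single value.
constexpr bool meow_is_copyable_but_not_standard_layout()
{
    return std::is_trivially_copyable<Meow>::value
        && !std::is_standard_layout<Meow>::value;
}
```

Neither trait says anything about where x and y live inside the object, which is the freedom the hypothetical implementation above tries to exploit.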

This would meet the requirements of being trivially-copyable; a memcpy would copy the hidden offset fields, so the new copy would work. But some things would not work. For example, holding a pointer to x across a memcpy would break:

// Assumes a context with access to Meow::y, which is protected (e.g. a member or friend).
Meow a;
a.x = 2;
a.y = 4;
int *px = &a.x;

Meow b;
b.x = 3;
b.y = 9;
std::memcpy(&a, &b, sizeof(a));

++*px; // kaboom

However, is the compiler really allowed to implement a trivially-copyable class in this manner? Dereferencing px should only be undefined behavior if a.x's lifetime has ended. Has it? The relevant portions of the N3797 draft Standard aren't very clear on the subject. This is section [basic.life]/1:

The lifetime of an object is a runtime property of the object. An object is said to have non-trivial initialization if it is of a class or aggregate type and it or one of its members is initialized by a constructor other than a trivial default constructor. [ Note: initialization by a trivial copy/move constructor is non-trivial initialization. — end note ] The lifetime of an object of type T begins when:

  • storage with the proper alignment and size for type T is obtained, and
  • if the object has non-trivial initialization, its initialization is complete.

The lifetime of an object of type T ends when:

  • if T is a class type with a non-trivial destructor ([class.dtor]), the destructor call starts, or
  • the storage which the object occupies is reused or released.

And this is [basic.types]/3:

For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes ([intro.memory]) making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value. [ Example omitted. ]
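The round trip that this paragraph guarantees can be sketched as follows. The accessors `set_y` and `get_y` are my additions (not part of the original `Meow`) so that the protected member can be set and read from outside; they do not affect trivial copyability:

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct Meow
{
    int x;
    void set_y(int v) { y = v; }     // hypothetical accessor
    int get_y() const { return y; }  // hypothetical accessor
protected:
    int y;
};

static_assert(std::is_trivially_copyable<Meow>::value,
              "the byte-copy guarantee only applies to trivially copyable types");

bool round_trip_restores_value()
{
    Meow m;
    m.x = 2;
    m.set_y(4);

    // Suitably aligned byte buffer, as described in the question.
    alignas(Meow) unsigned char bytes[sizeof(Meow)];
    std::memcpy(bytes, &m, sizeof(Meow)); // object -> byte array

    m.x = 0;
    m.set_y(0);
    assert(m.x == 0 && m.get_y() == 0);   // the value really was overwritten

    std::memcpy(&m, bytes, sizeof(Meow)); // byte array -> the same object
    return m.x == 2 && m.get_y() == 4;    // original value is back
}
```

Note that this only covers copying back into the same object; copying between two distinct objects is a separate guarantee.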

The question then becomes, is a memcpy overwrite of a trivially-copyable class instance "copy construction" or "copy-assignment"? The answer to the question seems to decide whether Meow_internal is a valid way for a compiler to implement trivially-copyable class Meow.

If memcpy is "copy construction", then the answer is that Meow_internal is valid, because copy construction is reusing the memory. If memcpy is "copy-assignment", then the answer is that Meow_internal is not a valid implementation, because assignment does not invalidate pointers to the instantiated members of a class. If memcpy is both, I have no idea what the answer is.

T.C.
Myria
  • 18
    If you use `memcpy` then it is not any sort of construction or assignment. – M.M Oct 03 '14 at 01:03
  • Hopefully TC will write an answer, IDK what the status is of objects that are created by using `memcpy` instead of a constructor :) – M.M Oct 03 '14 at 01:07
  • On machines with sizeof(int) = 4 then sizeof(Meow) is usually 8. While sizeof(Meow_internal) is at least 16. No one would use such a compiler because of the extra memory usage. – brian beuning Oct 03 '14 at 01:13
  • @brianbeuning but would it conform to the standard? – M.M Oct 03 '14 at 01:20
  • I don't have access to the Standard right now, but in the several drafts [basic.types]/3 is about **two** objects of type `T`. This seems to fit better to the code example than the quote about character arrays IMHO. – dyp Oct 03 '14 at 01:24
  • 2
    Since you can `memcpy` something that's not a `T` into a `T` - which definitely counts as "reuse" of the storage and ends the lifetime of the `T` object - I see no reason why `memcpy`ing a `T` into a `T` doesn't count as "reuse" as well. And I agree with @brianbeuning that debating the standard compliance of a hypothetical compiler that no sane person would ever write or use is rather pointless. – T.C. Oct 03 '14 at 01:29
  • I think `Meow_internal` violates [basic.life]/7 if your compiler does not change the pointer `px` if we replace the `memcpy` with a `new((void*)&a) Meow(b);`. (Though it might be subtle: `px` is pointing to a non-complete object; one had to conclude from other sources that it must point to an object of the same type afterwards etc. But I think that is the *intention* of the Standard.) – dyp Oct 03 '14 at 01:32
  • 1
    @T.C. The reason that I'm asking this question is that if `Meow_internal` is an illegal implementation, it means that there is no technical basis for the Standard's restriction that `offsetof` require a **standard-layout** structure. It would be possible to formally prove that being **trivially-copyable** would be sufficient to support `offsetof`, and justify the Standard changing its definitions as a result. – Myria Oct 03 '14 at 02:17
  • 1
    @dyp I'm doubtful it breaks that. `px` isn't pointing to an object of type `T`; it's pointing to a subobject, and as far as I can see there's no guarantee that when you reuse the storage of an object pointers to its subobjects remain valid (it does reuse the storage of `*px` as well, of course, but there's no guarantee that this reuse also satisfies the other requirements in [basic.life]/7). – T.C. Oct 03 '14 at 02:22
  • 1
    @dyp The compiler can't adjust the pointers for you behind your back, because you could `reinterpret_cast<uintptr_t>(px)`, XOR the resulting unsigned integer value with a random number you got from `/dev/urandom`, set `px` to `nullptr`, then do the `memcpy`. After the `memcpy` finishes, use `reinterpret_cast<int *>(encrypted_uintptr)` to restore the original pointer value (legal by **[expr.reinterpret.cast]/5**). The compiler has no way to know that you've hidden the pointer. (This would not be a *safely-derived* pointer, though, by **[basic.stc.dynamic.safety]/3**). – Myria Oct 03 '14 at 02:25
  • One way this could be tightened up is to declare that memcpy invalidates all pointers pointing to the overwritten memory, except pointers to the start of the area, if either target or destination contains non standard layout classes. – Lie Ryan Oct 03 '14 at 14:01
  • 1
    @LieRyan If one of the members of a non-standard-layout but trivially-copyable class is a `char` or `unsigned char`, and you retain a pointer to it, it's clear that *some* element of the backing storage array will compare equal to that pointer. So to say that pointers are *invalidated* is incorrect. Perhaps to say that they may be used "in limited ways" as in **[basic.life]/5** is more correct, then? – Myria Oct 03 '14 at 19:57
  • It's interesting to consider how `offsetof` would interact with this. It is supposed to work with standard-layout classes, so either `offsetof` is unimplementable or your hypothetical compiler violates something else. Note that `offsetof` is a macro that evaluates to the offset of a member *in bytes* given a class name (not an instance), implying that OP's hypothetical compiler can't fully implement the standard because `offsetof` would be impossible. http://www.cplusplus.com/reference/cstddef/offsetof/ – Ben Apr 30 '15 at 12:32
  • 1
    @Ben Class `Meow` isn't standard-layout, so `offsetof` would not be required to work with it. However, the point of the exercise above is to point out what seems to me to be something silly in the Standard. The idea is to show that a compliant compiler implementation in which a trivially-copyable (i.e., `memcpy`-compatible) class is not necessarily `offsetof`-compatible is either a contradiction or is so absurd as to never be implemented. Thus, it would be justified to modify the Standard to state that `offsetof` is allowed on trivially-copyable types, not just standard-layout types. – Myria Apr 30 '15 at 19:59
  • 1
    Sure. It's an interesting crazy corner case. My point is that the existence of `offsetof` seems to imply that the offset of a member has to be the same from instance to instance, which breaks your example and makes your hypothetical compiler implicitly non-compliant, I think. Would you agree? – Ben May 01 '15 at 00:33
  • 3
    It is possible that this area is not entirely well-defined in the Standard. Consider [N3751](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3751.pdf) and a [related discussion](http://www.open-std.org/pipermail/ub/2013-September/000127.html) on the UB mailing list. – dyp May 01 '15 at 13:28
  • @dyp: If the Standard were to recognize that any live region of storage that doesn't contain a non-trivial type object contains all trivial-type objects that would fit therein, even though such objects would not always be accessible, that would fix a lot of corner cases, while the "may not always be accessible" would still allow for type-based optimization. The notion that a compiler would magically have to "read a programmer's mind" to process `memcpy` meaningfully is a consequence of a broken abstraction where trivial objects' lifetimes begin and end separately from their storage. – supercat Sep 18 '20 at 22:24
  • If, for example, union `foo` contains struct members `s1` and `s2` with a common initial sequence, such a model would make clear what would be accessed if code reads the lvalue `foo.s2.commonMember` after having written `foo.s1.commonMember`. The act of writing `foo.s1.commonMember` may render `foo.s2` inaccessible, but resolving lvalue `foo.s2` would make that *already-existing* member accessible without ending the lifetime of `foo.s1` nor making it inaccessible. – supercat Sep 18 '20 at 22:28

2 Answers

It is clear to me that using std::memcpy results in neither construction nor assignment. It is not construction, since no constructor will be called. Nor is it assignment, as the assignment operator will not be called. Given that a trivially copyable object has trivial destructors, (copy/move) constructors, and (copy/move) assignment operators, the point is rather moot.

You seem to have quoted ¶2 of §3.9 [basic.types]. ¶3 states:

For any trivially copyable type T, if two pointers to T point to distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a base-class subobject, if the underlying bytes (1.7) making up obj1 are copied into obj2,⁴¹ obj2 shall subsequently hold the same value as obj1. [ Example:
  T* t1p;
  T* t2p;
          // provided that t2p points to an initialized object ...
  std::memcpy(t1p, t2p, sizeof(T));
          // at this point, every subobject of trivially copyable type in *t1p contains
          // the same value as the corresponding subobject in *t2p
— end example ]
41) By using, for example, the library functions (17.6.1.2) std::memcpy or std::memmove.

Clearly, the standard intended to allow *t1p to be usable in every way that *t2p would be.

Continuing on to ¶4:

The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object is the set of bits that hold the value of type T. For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values.⁴²
42) The intent is that the memory model of C++ is compatible with that of ISO/IEC 9899 Programming Language C.

The use of the word "the" in front of both defined terms implies that any given type has only one object representation and a given object has only one value representation. Your hypothetical morphing internal type should not exist. The footnote makes it clear that the intention is for trivially copyable types to have a memory layout compatible with C. The expectation, then, is that an object remains usable after being copied around, even if its type is not standard-layout.
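For what it's worth, here is a rough way to observe on a given implementation that member offsets do not vary from instance to instance. It measures what one compiler does and proves nothing about what the Standard requires. The accessor `y_ref` and the helper `byte_offset` are hypothetical names I introduced for the sketch:

```cpp
#include <cassert>
#include <cstddef>

struct Meow
{
    int x;
    const int& y_ref() const { return y; } // hypothetical accessor for the protected member
protected:
    int y;
};

// Byte distance from the start of `object` to `member`,
// measured through unsigned char pointers.
template <class Object, class Member>
std::ptrdiff_t byte_offset(const Object& object, const Member& member)
{
    return reinterpret_cast<const unsigned char*>(&member)
         - reinterpret_cast<const unsigned char*>(&object);
}

bool offsets_match_across_instances()
{
    Meow a{};
    Meow b{};

    const std::ptrdiff_t xa = byte_offset(a, a.x);
    const std::ptrdiff_t ya = byte_offset(a, a.y_ref());
    // Both members lie at distinct, non-negative offsets inside the object.
    assert(xa >= 0 && ya >= 0 && xa != ya);

    // A "morphing" layout could make these differ between a and b.
    return xa == byte_offset(b, b.x) && ya == byte_offset(b, b.y_ref());
}
```

On an implementation with one fixed layout per type, `offsets_match_across_instances()` returns true.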

jxh
  • 2
    `Given that a trivially copyable object has trivial destructors, constructors, and assignment operators` Only copy and move constructors are required to be trivial. Trivially copyable types can have non-trivial 'normal' constructors. You might be thinking of PODs, which can't have constructors, but which are a stricter subset of trivially copyable types. – underscore_d Jul 15 '16 at 13:36
  • @jxh Well, I'm clarifying that whereas you said 'trivial constructors' without specifying which, only _copy_ and _move_ constructors must be trivial for trivially copyable status. Non-trivial constructors of 'normal' (I confess I'm not sure if there's an official term for this) i.e. non-copy/move signatures are allowed for trivially copyable types. It's aggregate and hence POD types that can't have _any_ non-trivial constructors. It's up to you whether you edit that into your answer, but I think it would be improved by doing so. – underscore_d Jul 15 '16 at 19:36
  • @jxh cool, and good catch on how assignment ops follow the same pattern; I instantly started talking about constructors and overlooked assignment! which is weird as I make heavy use of conversion assignment for trivially copyable types. If I could be pedantic, I'd take the brackets out, as some people might interpret that to mean 'including' rather than 'only' :) – underscore_d Jul 16 '16 at 08:45

In the same draft, you also find the following text, directly following the text you quoted:

For any trivially copyable type T, if two pointers to T point to distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a base-class subobject, if the underlying bytes (1.7) making up obj1 are copied into obj2, obj2 shall subsequently hold the same value as obj1.

Note that this speaks about a change of the value of obj2, not about destroying the object obj2 and creating a new object in its place. Because only the value of the object changes, not the object itself, any pointers or references to its members should remain valid.
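Under that reading, the question's scenario is fine. A minimal sketch, assuming the interpretation above and an implementation with a fixed layout per type; the accessors `set_y` and `get_y` are my additions so the protected member can be reached from outside:

```cpp
#include <cassert>
#include <cstring>

struct Meow
{
    int x;
    void set_y(int v) { y = v; }     // hypothetical accessor
    int get_y() const { return y; }  // hypothetical accessor
protected:
    int y;
};

int increment_through_old_pointer()
{
    Meow a;
    a.x = 2;
    a.set_y(4);
    int* px = &a.x; // pointer taken before the overwrite

    Meow b;
    b.x = 3;
    b.set_y(9);
    std::memcpy(&a, &b, sizeof(a)); // a now holds the same value as b

    assert(*px == 3);       // the old pointer observes the copied value
    assert(a.get_y() == 9);

    ++*px;                  // modifies a.x through the old pointer
    return a.x;
}
```

With this interpretation, `increment_through_old_pointer()` returns 4 rather than going "kaboom".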

celtschk
  • This would imply that `Meow_internal` is not a Standard-compliant implementation of `Meow`. I agree with this interpretation. The consequence of this, though, is that the Standard's distinction between "trivially-copyable" and "standard-layout" is blurred a bit. As far as I can tell, `offsetof` ''must'' conceptually work with trivially-copyable types in addition to standard-layout types, or the implementation is demonstrably noncompliant for other reasons. – Myria May 15 '15 at 20:46