
Accessing objects via reinterpret_cast'ed pointers, and the related UB, has been discussed extensively here. After reading the questions and answers, I'm still not sure how to properly use uninitialized memory with POD types.

Suppose I want to "emulate"

struct { double d; int i; };

by manually allocating memory for data members and suppose (for simplicity) that no padding is needed before i.
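The no-padding assumption can be checked rather than taken on faith. The following is a minimal sketch, assuming a named counterpart `S` of the struct above; `S`, `no_padding_before_i`, and `trailing_padding` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t

// Hypothetical named counterpart of the anonymous struct in the question.
struct S { double d; int i; };

// True when `i` sits immediately after `d`, i.e. no padding precedes it.
constexpr bool no_padding_before_i() {
    return offsetof(S, i) == sizeof(double);
}

// Trailing padding is a separate matter: sizeof(S) may exceed the sum of
// the member sizes so that arrays of S keep every element aligned.
constexpr std::size_t trailing_padding() {
    return sizeof(S) - (offsetof(S, i) + sizeof(int));
}
```

Even when `no_padding_before_i()` holds, `trailing_padding()` may be non-zero, so `sizeof(double) + sizeof(int)` is not necessarily equal to `sizeof(S)`.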

Now, I do this:

// (V1)
auto buff = reinterpret_cast<char*>(std::malloc(sizeof(double) + sizeof(int)));
auto d_ptr = reinterpret_cast<double*>(buff);
auto i_ptr = reinterpret_cast<int*>(buff + sizeof(double));
*d_ptr = 20.19;
*i_ptr = 2019;

First question: is this code valid?

I could use placement new:

// (V2)
auto buff = reinterpret_cast<char*>(std::malloc(sizeof(double) + sizeof(int)));
auto d_ptr = new(buff) double;
auto i_ptr = new(buff + sizeof(double)) int;
*d_ptr = 20.19;
*i_ptr = 2019;

Do I have to? Placement new seems redundant here, because default initialization of POD types is a no-op (vacuous initialization), and [basic.life] reads:

The lifetime of an object of type T begins when:

(1.1) storage with the proper alignment and size for type T is obtained,

(1.2) if the object has non-vacuous initialization, its initialization is complete, ...

Does this say that the lifetimes of the *d_ptr and *i_ptr objects began as soon as I allocated memory for them?
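For reference, here is a small sketch of what "vacuous" means in practice for placement new: `new (p) double` performs default initialization and leaves the value indeterminate, while `new (p) double{}` performs value initialization and zeroes it. The function name `value_init_zeroes_storage` is hypothetical, and the sketch does not by itself settle the lifetime question asked above.

```cpp
#include <cassert>
#include <cstdlib>   // std::malloc, std::free
#include <cstring>   // std::memset
#include <new>       // placement new

// Returns true if a value-initialized double created in "dirty" storage reads as zero.
bool value_init_zeroes_storage() {
    void* raw = std::malloc(sizeof(double));
    if (raw == nullptr) return false;
    std::memset(raw, 0xFF, sizeof(double));  // fill the storage with non-zero bytes

    // Default initialization: the object is created, but its value is
    // indeterminate, so it is assigned here before anything reads it.
    double* d = new (raw) double;
    *d = 20.19;

    // Value initialization: the braces zero-initialize the new object.
    // Reusing the storage is fine because double has a trivial destructor.
    double* z = new (raw) double{};
    const bool ok = (*z == 0.0);

    std::free(raw);
    return ok;
}
```

The point of the sketch is only that the two forms differ in the value they leave behind; both forms are expressions that create an object.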

Second question: can I use type double* (or some T*) for buff, i.e.

// (V3)
auto buff = reinterpret_cast<double*>(std::malloc(sizeof(double) + sizeof(int)));
auto d_ptr = reinterpret_cast<double*>(buff);
auto i_ptr = reinterpret_cast<int*>(buff + 1);
*d_ptr = 20.19;
*i_ptr = 2019;

or

// (V4)
auto buff = reinterpret_cast<double*>(std::malloc(sizeof(double) + sizeof(int)));
auto d_ptr = new(buff) double;
auto i_ptr = new(buff + 1) int;
*d_ptr = 20.19;
*i_ptr = 2019;

?

Evg
  • Use of `reinterpret_cast` is almost always a bug. – Jesper Juhl Jun 04 '19 at 19:55
  • @JesperJuhl, but I still want to know how to do it correctly if I really have to. – Evg Jun 04 '19 at 19:56
  • [Dupe](https://stackoverflow.com/q/51572788/2069064)? – Barry Jun 04 '19 at 19:58
  • @Barry, half-dupe. Does your answer there imply that (V2) is correct? And what about (V4)? – Evg Jun 04 '19 at 20:03
  • V4 is more correct than V2. Practically, it doesn't matter. – Barry Jun 04 '19 at 20:06
  • There is no way to properly alias pointers in C++ currently. Sad but true. Your placement new is correct, but it won't work if there is already a value in the storage and you need to access it. – SergeyA Jun 04 '19 at 20:06
  • @SergeyA: It's sad to think how much better C and C++ could have been if the Standard had included a statement "To allow compilers intended for specialized purposes to best serve those purposes, this Standard allows them to behave in ways that would make them unsuitable for many others. It makes no attempt to forbid compilers from behaving in stupid and useless ways that would make them unsuitable for most or even all purposes, but quality compilers should of course be expected to refrain from doing so anyhow." Most arguments about the Standard could be resolved by saying... – supercat Jun 04 '19 at 20:31
  • ...that the Standard may allow a compiler that doesn't need to be suitable for a particular purpose to behave in a particular fashion, but the question of whether such a compiler would be suitable for any particular purpose would depend upon that purpose--not the Standard. – supercat Jun 04 '19 at 20:31
  • @supercat What... does any of that even mean and what is it intended to accomplish? – Barry Jun 04 '19 at 20:39
  • @supercat I am not sure if I follow. What would be the purpose of this statement? – SergeyA Jun 04 '19 at 20:59
  • @Evg I am not sure what the valid use case is that prompted this. Here is my C++ that is also C-like: https://wandbox.org/permlink/ZlBTf1KZQGhhgHXJ ... am I missing something here? – Chef Gladiator Jun 04 '19 at 21:03
  • @Barry: Most ambiguities in the Standards revolve around questions of whether it requires implementations to process various constructs in meaningful fashion or merely allows them to do so in cases where doing so would be useful. Compiler writers seem to believe that a failure to mandate a meaningful behavior, however, implies that they should feel no obligation to support it even when their customers would find it useful. What's needed is a clear statement that failure to mandate that *all* implementations support some behavior doesn't mean that most shouldn't support it *anyway*. – supercat Jun 04 '19 at 21:06
  • @supercat I'm going to need to see some serious justification for this idea that compiler writers don't care about what their customers would find useful. – Barry Jun 04 '19 at 21:15
  • @ChefGladiator, this question originates from this one: https://stackoverflow.com/questions/56434495/mpi-derived-data-type-for-a-struct-with-flexible-size. If some code works, it doesn't mean there is no UB in it. And if there is UB, weird things may happen in future. `reinterpret_cast`s like those in the question do work in rather reliable ways. But even if a solution is OK FAPP, I still want to know a correct one. So this question is probably more theoretical than practical, that's why I added a "language-lawyer" tag. – Evg Jun 04 '19 at 21:18
  • @Barry: For what customers would a compiler that can't recognize that something like `actOnMember1(&someUnion->member1);` might access a member `member1` of `*someUnion`, *even in cases where all accesses would be via the passed pointer, and the pointer isn't used after the function returns*, be preferable to one that can recognize such possibilities in such cases? – supercat Jun 04 '19 at 21:26
  • @supercat No idea what that example even means. – Barry Jun 04 '19 at 21:37
  • @Barry in supercat's world, a "quality compiler" is one that implements some alternative language specification that he has in mind but has never been able to successfully articulate – M.M Jun 05 '19 at 01:00
  • @Barry Some compiler writers seem more obsessed with finding optimisable special cases than with providing predictable behavior, at least for common code. – curiousguy Jun 05 '19 at 15:11
  • @curiousguy And some compiler writers are obsessed with predictable behavior for common code. Meaningless generalizations are meaningless. – Barry Jun 05 '19 at 15:33

2 Answers


As Barry explains better here, (V1) and (V3) are UB. The short version: neither of those pieces of code contains any of the syntax needed to create an object, and you can't access the value of an object that isn't there.

So, do (V2) and (V4) work?

(V2) works if and only if alignof(double) >= alignof(int). But it only works in the sense that it creates a double followed by an int. It does not in any way "emulate" that nameless struct. The struct could have any arbitrary amount of padding, while in this case the int immediately follows the double.
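The alignment caveat can be made explicit in code. Below is a minimal sketch of the placement-new variant, assuming the int's offset is rounded up to `alignof(int)` instead of being taken as `sizeof(double)`; `align_up`, `int_offset`, `Fields`, and `make_fields` are hypothetical names, and the sketch inherits whatever strict-reading concerns apply to byte-pointer arithmetic in the original (V2).

```cpp
#include <cassert>
#include <cstddef>   // std::size_t
#include <cstdlib>   // std::malloc, std::free
#include <new>       // placement new

// Hypothetical helper: round `offset` up to the next multiple of `align`.
// `align` must be a power of two, which holds for every alignof result.
constexpr std::size_t align_up(std::size_t offset, std::size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

// Byte offset at which the int is placed. It equals sizeof(double) whenever
// sizeof(double) is already a multiple of alignof(int).
constexpr std::size_t int_offset = align_up(sizeof(double), alignof(int));

struct Fields { double* d; int* i; unsigned char* storage; };

// Creates a double followed by a suitably aligned int in malloc'ed storage.
// Returns null pointers if allocation fails; release with std::free(storage).
Fields make_fields() {
    auto* storage = static_cast<unsigned char*>(std::malloc(int_offset + sizeof(int)));
    if (storage == nullptr) return Fields{nullptr, nullptr, nullptr};
    double* d = new (storage) double;         // malloc storage is aligned for double
    int* i = new (storage + int_offset) int;  // int_offset is a multiple of alignof(int)
    return Fields{d, i, storage};
}
```

The pointers returned by the two placement-new expressions are the ones to use afterwards, e.g. `*f.d = 20.19; *f.i = 2019;` for a `Fields f = make_fields();`, followed by `std::free(f.storage)`.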

(V4) does not work, strictly speaking. buff does not actually point to the newly created double; only the pointer returned by placement new does. As such, pointer arithmetic on buff cannot be used to get the byte after that object, so doing that arithmetic yields undefined behavior.

Now, we are talking about C++ strictly speaking. In all likelihood, every compiler will execute all four of these (with the above caveat about alignment).

Nicol Bolas
  • "_any of the syntax needed to create an object_" But neither does `cout<<"hello world";` yet you believe that one is OK. So you aren't very consistent with whether you take intro.object/1 seriously and literally (taking it either literally, seriously or both is an insane position IMNSHO). – curiousguy Jun 05 '19 at 23:39

When I look at the publicly available draft, http://eel.is/c++draft/basic.life, the quote is different. It says:

The lifetime of an object of type T begins when:

(1.1) storage with the proper alignment and size for type T is obtained, and

(1.2) its initialization (if any) is complete (including vacuous initialization) ([dcl.init]),

Since there was no vacuous initialization of the double variable, I believe the code is incorrect and invokes undefined behavior.
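When the goal is only to store values in a raw buffer and read them back, a commonly used approach that sidesteps the lifetime question is to copy bytes with `std::memcpy`, so that no `double*` or `int*` into the buffer is ever dereferenced. This is a sketch under that assumption; `pack` and `unpack` are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>   // std::malloc, std::free
#include <cstring>   // std::memcpy

// Hypothetical helper: copy the bytes of a double and an int, back to back,
// into freshly allocated storage. Returns nullptr if allocation fails.
unsigned char* pack(double d, int i) {
    auto* buff = static_cast<unsigned char*>(std::malloc(sizeof d + sizeof i));
    if (buff == nullptr) return nullptr;
    std::memcpy(buff, &d, sizeof d);
    std::memcpy(buff + sizeof d, &i, sizeof i);
    return buff;
}

// Hypothetical helper: copy the bytes back out into real objects.
// No alignment requirement applies, because only bytes are copied.
void unpack(const unsigned char* buff, double& d, int& i) {
    std::memcpy(&d, buff, sizeof d);
    std::memcpy(&i, buff + sizeof d, sizeof i);
}
```

The values are always read from ordinary `double` and `int` objects owned by the caller, which is why the int can sit directly after the double regardless of alignment.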

SergeyA
  • What in that quote (or anywhere) suggests that any of OP's code is ill-formed? – Barry Jun 04 '19 at 21:15
  • (Some of) the code is undefined behaviour, not ill-formed – M.M Jun 05 '19 at 01:23
  • @M.M doesn't code that exhibits undefined behavior make the program ill-formed? – SergeyA Jun 05 '19 at 13:03
  • @SergeyA No. Ill-formed programs must give a diagnostic and the compiler is allowed to not compile them. – M.M Jun 06 '19 at 01:22
  • @SergeyA In general, ill-formed code involving some violation of language rules in a template definition (not instantiation), and stuff having to do with linking, need not be diagnosed, but most other cases of ill-formed code require a diagnostic. Well-formed code can have many executions (depending on the environment). That **some executions** lead to UB doesn't make the program ill-formed. – curiousguy Jun 06 '19 at 03:30