
Can this potentially cause undefined behaviour?

uint8_t storage[4];

// We assume storage is properly aligned here.
int32_t* intPtr = new((void*)storage) int32_t(4);

// I know this is ok:
int32_t value1 = *intPtr;
*intPtr = 5;

// But can one of the following cause UB?
int32_t value2 = reinterpret_cast<int32_t*>(storage)[0];
reinterpret_cast<int32_t*>(storage)[0] = 5;

char has special rules for strict-aliasing. If I use char instead of uint8_t is it still Undefined Behavior? What else changes?

As member DeadMG pointed out, reinterpret_cast is implementation dependent. If I use a C-style cast (int32_t*)storage instead, what would change?

rsp1984
  • While I'm not sure about the first one (even though it should work on all sane environments if it is aligned), the second one I believe does violate strict-aliasing. IIRC, you can alias a char array with anything. But you can't alias other things to a char array. (leaving aside the fact that `uint8_t` may or may not follow the same rules as `char`.) – Mysticial Jan 08 '14 at 00:30
  • Could the pointer `reinterpret_cast<int32_t*>(storage)` potentially have a different address than `intPtr`? Or put another way: Is placement-new required to return the pointer that it's been given as an argument? – rsp1984 Jan 08 '14 at 00:37
  • All of the lines after you place into the storage can cause UB by strict aliasing violation. You should look into std::aligned_storage. To be more accurate, placement new only returns the pointer... it's your job to handle the aliasing violations or lack thereof of that pointer. – Puppy Jan 08 '14 at 00:37
  • @DeadMG: How would one implement memory pools then without violating strict aliasing? – rsp1984 Jan 08 '14 at 00:39
  • You need to use `char` directly, or preferably, std::aligned_storage. – Puppy Jan 08 '14 at 00:39
  • So char is in fact treated differently than uint8_t then? – rsp1984 Jan 08 '14 at 00:40
  • Quite certainly, `char` has special rules. – Puppy Jan 08 '14 at 00:40
  • Interesting. I've edited the question accordingly. Thanks. – rsp1984 Jan 08 '14 at 00:41
  • Could you please roll that back? You've totally changed the meaning of the question. – Puppy Jan 08 '14 at 00:43
  • I don't think so. Could you elaborate on how exactly it did change the question? – rsp1984 Jan 08 '14 at 00:45
  • Well, I just wrote an answer about how your code is broken because you didn't use `char`, which is now totally meaningless. – Puppy Jan 08 '14 at 00:46
  • Hang on. I know a way to make everybody happy. – Mysticial Jan 08 '14 at 00:46
  • Oh. Ok, I didn't see your answer. But I'm not sure the answer was there before my comment (that I changed the question). Given the time it takes to write the answer I am afraid the answer post was second and the comment/change was first... – rsp1984 Jan 08 '14 at 00:46

2 Answers


The pointer returned by placement new can be just as UB-causing as any other pointer when aliasing considerations are brought into it. It's your responsibility to ensure that the memory you placed the object into isn't aliased by anything it shouldn't be.

In this case, you cannot assume that uint8_t is an alias for char and therefore has the special aliasing rules applied. In addition, it would be fairly pointless to use an array of uint8_t rather than char because sizeof() is in terms of char, not uint8_t. You'd have to compute the size yourself.
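
A minimal sketch of that point, assuming a C++11 compiler. The names `kUint8IsUnsignedChar` and `RawBytes` are made up for illustration and are not from the question. It records whether `uint8_t` happens to be the same type as `unsigned char` on the current implementation, and sizes the raw buffer in `unsigned char` so that `sizeof` gives the element count directly.

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

// sizeof(char) is 1 by definition, so sizeof(T) is already a count of chars.
static_assert(sizeof(char) == 1, "guaranteed by the language");

// If std::uint8_t exists at all, a byte has exactly 8 bits.
static_assert(CHAR_BIT == 8, "std::uint8_t implies 8-bit bytes");

// Hypothetical flag: true on mainstream implementations, but the standard
// does not promise that uint8_t is a typedef for unsigned char.
constexpr bool kUint8IsUnsignedChar =
    std::is_same<std::uint8_t, unsigned char>::value;

// Hypothetical helper: raw storage for one T, expressed in unsigned char so
// that the character-type aliasing rules apply to the buffer.
template <typename T>
struct RawBytes {
    alignas(T) unsigned char bytes[sizeof(T)];
};
```

If `kUint8IsUnsignedChar` is false on some platform, code that relies on `uint8_t` getting the character-type aliasing exemption would need to switch to `unsigned char` there.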

In addition, reinterpret_cast's effect is entirely implementation-defined, so the code certainly does not have a well-defined meaning.

To implement low-level unpleasant memory hacks, the original memory needs to be only aliased by char*, void*, and T*, where T is the final destination type (in this case int), plus whatever else you can get from a T*, such as if T is a derived class and you convert that derived class pointer to a pointer to base. Anything else violates strict aliasing and hello nasal demons.
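
As a rough sketch of what that can look like for a one-element pool (the names `Int32Slot` and `roundTrip` are hypothetical, and this assumes C++11): the bytes are declared as `unsigned char`, and the object is only ever touched through the `int32_t*` that placement new returned, never through a cast of the buffer.

```cpp
#include <cstdint>
#include <new>

// Hypothetical single-slot pool. The raw memory is unsigned char, suitably
// aligned; the live object is reached only via the pointer from placement new.
struct Int32Slot {
    alignas(std::int32_t) unsigned char storage[sizeof(std::int32_t)];
    std::int32_t* live = nullptr;

    std::int32_t* construct(std::int32_t v) {
        live = new (static_cast<void*>(storage)) std::int32_t(v);
        return live;
    }

    // int32_t is trivially destructible, so forgetting the pointer is enough.
    void destroy() { live = nullptr; }
};

// Writes and reads go through the T* only, which is the case the question
// already identified as fine.
std::int32_t roundTrip(std::int32_t v) {
    Int32Slot slot;
    std::int32_t* p = slot.construct(v);
    *p += 1;
    std::int32_t result = *p;
    slot.destroy();
    return result;
}
```

For example, `roundTrip(4)` constructs an `int32_t` holding 4 in the slot, increments it through the returned pointer, and hands back the new value.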

Puppy
  • I would appreciate if you could elaborate on the last paragraph a little more as I understand it that, apart from the reinterpret_cast and the uint8_t issues that you pointed out (and which both are not at the core of the question), the posted code would not cause UB. Correct? – rsp1984 Jan 08 '14 at 01:00
  • Surely when `uint8_t` exists then `CHAR_BIT` is `8` and `uint8_t` _must_ be an alias for `unsigned char`? Why would you have to compute its size yourself? – Lightness Races in Orbit Jan 08 '14 at 01:54
  • There's no requirement that uint8_t must be an alias for unsigned char. If you had a 4-bit-word machine, uint8_t could be a double-word type. – Puppy Jan 08 '14 at 02:06
  • @DeadMG: Ah well I suppose if `CHAR_BIT` were _smaller_ than `8`, yeah. Still, I'd stick `assert(CHAR_BIT==8)` at the top of your program and be done with it because, let's face it, when's the last time you saw a 4-bit-word machine? (Did you really mean 4-bit-word? Word size is not the same as char size) Then with that assertion in place we can remove all the uncertainty surrounding this. – Lightness Races in Orbit Jan 08 '14 at 02:12
  • Also, I'm not exactly sure on the wording for uint8_t, but I guess it's possible for a larger-than-that CHAR_BIT to provide it for compatibility, even if the actual size of that integral type is 9 bits. – Puppy Jan 08 '14 at 02:16
  • @LightnessRacesinOrbit The C++ Standard refers to the [C Standard](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf) for the `CHAR_BIT` definition, and there it is specified that this value is *at least 8*. `[intro.memory]/1` also specifies a *byte* to hold at least 8-bits and specifies that memory is a contiguous sequence of bits. – TemplateRex Jan 08 '14 at 07:54
  • @LightnessRacesinOrbit The C Standard also defines `uint8_t`: "The typedef name uintN_t designates an unsigned integer type with width N and no padding bits." For `CHAR_BIT==9` platforms, this means that they would have to resort to the `uint_least8_t` typedefs. – TemplateRex Jan 08 '14 at 08:03
  • @TemplateRex: It means that `uint8_t` would not exist on such platforms, which I covered as a qualifying condition in the first four words of my first comment :) So, if `CHAR_BIT` must be at least 8, then, if `uint8_t` exists, it _must_ alias `unsigned char`. – Lightness Races in Orbit Jan 08 '14 at 15:20
  • @DeadMG: They're exact width types. – Lightness Races in Orbit Jan 08 '14 at 15:22
  • @TemplateRex: Except `UINT_MAX` also has to be at least `+65535` (`[C99: 5.2.4.2.1]`), which is tricky to implement with only eight bits. – Lightness Races in Orbit Jan 08 '14 at 15:26
  • Nitpick: saying that you can alias with `void*` is somewhat imprecise as you cannot dereference `void*`. Worse, it may mislead in thinking that casting around via `void*` respects aliasing rules, which is very much relevant in this case as that’s what the `reinterpret_cast`s of the OP perform. – Luc Danton Jan 10 '14 at 04:37
  • @LightnessRacesinOrbit: A conforming compiler could regard `uint8_t` as an extended integer type whose size is 1, but which is not regarded as a character type for purpose of the aliasing rules. If the Standard had from the beginning defined e.g. `uint8a_t`, which was like `uint8_t` but was guaranteed to support free aliasing, and had explicitly stated that code which needed aliasing must use the "a" forms, making `uint8_t` be an 8-bit type that didn't support aliasing would have been genuinely useful. – supercat Sep 14 '16 at 18:55
  • @LucDanton: Unfortunately, the Standard doesn't specify any "concise" name for a type which behaves like `unsigned char` on platforms where that type is 8 bits, so making `uint8_t` behave as a non-character type would break a lot of code. – supercat Sep 14 '16 at 18:57

Your version using the usual placement new is indeed fine.

There is an interpretation [1] of §§ 3.8/1 and 3.8/4 where objects of trivial types are able to ‘vanish’ and ‘appear’ on demand. This is not a free pass that allows disregarding aliasing rules, so notice:

std::uint16_t storage[2];
static_assert( /* std::uint16_t is not a character type */ );
static_assert( /* storage is properly aligned for our purposes */ );

auto read = *reinterpret_cast<std::uint32_t*>(&storage);
// At this point either we’re attempting to read the value of an
// std::uint16_t object through an std::uint32_t glvalue, a clear
// strict aliasing violation;
// or we’re reading the indeterminate value of a new std::uint32_t
// object freshly constructed in the same storage without effort
// on our part

If on the other hand you swapped the casts around in your second snippet (i.e. reinterpret and write first), you’re not entirely safe either. While under the interpretation you can justify the write to happen on a new std::uint32_t object that reuses the storage implicitly, the subsequent read is of the form

auto value2 = *reinterpret_cast<int32_t*>(storage);

and §3.8/5 says (emphasis mine and extremely relevant):

[…] after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any pointer that refers to the storage location where the object will be or was located may be used but only in limited ways. […] such a pointer refers to allocated storage (3.7.4.2), and using the pointer as if the pointer were of type void*, is well-defined.

§3.8/6 is the same but in reference/glvalue form (arguably more relevant since we’re reusing a name and not a pointer here, but the paragraph is imo harder to understand out of context). Also see §3.8/7, which gives some limited leeway that I don’t think applies in your case.

To make things simpler, the remaining problem is this:

T object;
object.~T();
new (&object) U_thats_really_different_from_T;
&object;                     // Is this allowed? What does it mean?
static_cast<void*>(&object); // Is this?

As it happens, if the type of the storage involves a plain or unsigned character type (e.g. your storage really has type unsigned char[4]), then I’d say you have a basis to justify forming a pointer/reference to the storage of the new object (possibly to be reinterpreted later). See e.g. ¶¶ 5 and 6 again, which have an explicit escape clause for forming a pointer/reference/glvalue, and §1.8, The C++ object model, which describes how an object involves a constituent array of bytes. The rules governing the pointer conversions should be straightforward and uncontroversial (at least by comparison…).
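
To illustrate the character-typed case, here is a small sketch (the function name `readBack` is hypothetical, and this is one conservative reading rather than a ruling on the paragraphs above). The storage is `unsigned char`, and the value is recovered by copying the constituent bytes with `std::memcpy` instead of dereferencing a reinterpreted pointer, which sidesteps the question of which object the cast pointer designates.

```cpp
#include <cstdint>
#include <cstring>
#include <new>

// Hypothetical helper: place an int32_t into character-typed storage, then
// recover its value by copying the object's bytes into a separate int32_t.
std::int32_t readBack(std::int32_t v) {
    alignas(std::int32_t) unsigned char storage[sizeof(std::int32_t)];
    new (static_cast<void*>(storage)) std::int32_t(v);

    // The bytes are only ever accessed as unsigned char here; no int32_t
    // glvalue is formed from the storage name itself.
    std::int32_t copy;
    std::memcpy(&copy, storage, sizeof copy);
    return copy;
}
```

Compilers typically turn a fixed-size `memcpy` like this into a plain load, so the byte-copy form usually costs nothing over the cast.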


1: it’s hard to gauge how well this interpretation is received in the community — I’ve seen it on the Boost mailing list, where there was some scepticism towards it

Luc Danton
  • In your last code example, the address-of operator `&object` and its `static_cast` are certainly allowed and defined. The question I guess is rather what happens when you try to dereference these pointers. – rsp1984 Jan 08 '14 at 16:08
  • @RafaelSpring Strictly speaking (and as mentioned in a parenthetical), the problem is not the operator but the use of the name `object`, which refers to a non-existing object whose storage has been reused. What basis do you use to justify that such a use is allowed, and what are its semantics (in particular if/when a pointer is formed from that name *and* used, not just discarded)? – Luc Danton Jan 08 '14 at 20:59
  • Now it’s true a lot of my argumentation relies on showing that the various paragraphs of 3.8 do not allow for what we’re trying to achieve; unfortunately that’s par for the course when it comes to explaining a negative with a negative (i.e. ‘not allowed by virtue of not being in the text’). Here’s a supporting argument that hints as to why we should refer to 3.8 in the first place: according to §5.3.1/3 we need an expression that refers to a ‘designated’ object to use the address-of operator. Which is it in our case? – Luc Danton Jan 08 '14 at 21:08
  • My interpretation is that `T object` allocates memory for `T` on the stack and constructs an object of type `T` into that memory. `object.~T()` then destroys this object but does not de-allocate its memory. No line after the dtor call tries to access the stored values in `object` so I guess line 4 and 5 are fine. Validity of line 3 probably depends on whether size and alignment requirements are met. What may still be problematic is what happens when `object` goes out of scope. Not sure whether the dtor would be called again... – rsp1984 Jan 10 '14 at 15:32
  • @RafaelSpring You do realize I am quoting the Standard, and I am not guessing? According to it, once you call a destructor, there is no more object or value to access — that much is pretty clear from §1.8 and §3.8. – Luc Danton Jan 10 '14 at 22:12
  • I think that you are over-interpreting the standard. Note that §3.8 defines "lifetime" as a "runtime property of the object". Therefore an object that is not alive any more still exists and is still an object - the fact that it's dead just limits what you can do with it. Moreover in §5.3.1.3 the standard says "designated", not "alive", so I don't see why my interpretation would be wrong. – rsp1984 Jan 11 '14 at 00:24
  • @RafaelSpring If you read the paragraph 1 that you’re quoting to the end, it unambiguously says: ‘The lifetime of an object of type T ends when: […] the storage which the object occupies is reused or released.’ Paragraph 3 elaborates: ‘The properties ascribed to objects throughout this International Standard apply for a given object only during its lifetime.’ (accompanied by a nice non-normative note). Most of what there is to say about an object does end once its storage is reused; as to why what *doesn’t* end isn’t of help, that’s the essence of my answer. – Luc Danton Jan 11 '14 at 01:30