9

I know this is a pretty common subject, but as much as the typical UB is easy to find, I did not find this variant so far.

So, I am trying to formally introduce Pixel objects while avoiding an actual copy of the data.

Is this valid?

struct Pixel {
    uint8_t red;
    uint8_t green;
    uint8_t blue;
    uint8_t alpha;
};

static_assert(std::is_trivial_v<Pixel>);

Pixel* promote(std::byte* data, std::size_t count)
{
    Pixel * const result = reinterpret_cast<Pixel*>(data);
    while (count-- > 0) {
        new (data) Pixel{
            std::to_integer<uint8_t>(data[0]),
            std::to_integer<uint8_t>(data[1]),
            std::to_integer<uint8_t>(data[2]),
            std::to_integer<uint8_t>(data[3])
        };
        data += sizeof(Pixel);
    }
    return result; // throw in a std::launder? I believe it is not mandatory here.
}

Expected use pattern, heavily simplified:

std::byte * buffer = getSomeImageData();
auto pixels = promote(buffer, 800*600);
// manipulate pixel data

More specifically:

  • Does this code have well-defined behavior?
  • If yes, does it make it safe to use the returned pointer?
  • If yes, to what other Pixel types can it be extended? (relaxing the is_trivial restriction? pixel with only 3 components?).

Both clang and gcc optimize out the whole loop to nothingness, which is what I want. Now, I'd like to know whether this violates some C++ rules or not.

Godbolt link if you want to play around with it.

(note: I did not tag c++17 despite std::byte, because the question holds using char)

curiousguy
  • 8,038
  • 2
  • 40
  • 58
spectras
  • 13,105
  • 2
  • 31
  • 53
  • Pretty sure this is UB. There is no actual `Pixel[]` in the memory `pixels` points to so you can't index it. Not sure if launder would fix that or not. – NathanOliver Jan 14 '20 at 16:23
  • The whole point of the loop is precisely to create `Pixel`s with placement new, to formally start their lifetime. – spectras Jan 14 '20 at 16:24
  • 2
    But contiguous `Pixel`s placed new is still not an array of `Pixel`s. – Jarod42 Jan 14 '20 at 16:26
  • 1
    @spectras That doesn't make an array though. You just have a bunch of Pixel objects next to each other. That's different from an array. – NathanOliver Jan 14 '20 at 16:27
  • Which is okay I guess, at no point it's used as an array. It returns a pointer to the first `Pixel` of a bunch of `Pixel` next to each other. Is there any use case where that would invoke UB? – spectras Jan 14 '20 at 16:28
  • 1
    So no where do you do `pixels[some_index]` or `*(pixels + something)`? That would be UB. – NathanOliver Jan 14 '20 at 16:29
  • Hmm we're on to something there. However, I don't see what specifically makes it undefined, as there is an actual Pixel object at that address. I am not too familiar with the exact standardese of pointer arithmetics though. – spectras Jan 14 '20 at 16:32
  • 1
    The relevant section is [here](https://timsong-cpp.github.io/cppwp/expr.add#4.2) and the key phrase is *if P points to an array element i of an array object x*. Here `pixels` (P) is not a pointer to array object but a pointer to a single `Pixel`. That means you can only access `pixels[0]` legally. – NathanOliver Jan 14 '20 at 16:35
  • Okay, that's the link I was missing to make full sense of it. Thanks! Make that into an answer and I'll accept it ^^ – spectras Jan 14 '20 at 16:36
  • @spectras Answer added detailing this. – NathanOliver Jan 14 '20 at 16:42
  • 3
    You want to read http://wg21.link/P0593 . – ecatmur Jan 14 '20 at 16:42
  • 1
    No a day without P0593-related (duplicate) question. – Language Lawyer Jan 15 '20 at 03:40
  • One cannot seriously claim that the std is to be applied strictly. That position makes no sense what so ever, even discounting MT programming. – curiousguy Jan 16 '20 at 03:46

2 Answers2

3

It is undefined behavior to use the result of promote as an array. If we look at [expr.add]/4.2 we have

Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i+j of x if 0≤i+j≤n and the expression P - J points to the (possibly-hypothetical) array element i−j of x if 0≤i−j≤n.

we see that it requires the pointer to actually point to an array object. You don't actually have an array object though. You have a pointer to a single Pixel that just happens to have other Pixels following it in contiguous memory. That means the only element you can actually access is the first element. Trying to access anything else would be undefined behavior because you are past the end of the valid domain for the pointer.

NathanOliver
  • 171,901
  • 28
  • 288
  • 402
  • Thanks for finding out that quick. I'll make an iterator instead I guess. As a sidenote, this also means that `&somevector[0] + 1` is UB (well, I mean, using the resulting pointer would be). – spectras Jan 14 '20 at 16:44
  • @spectras That's actually okay. You can always get the pointer to one past an object. You just can't dereference that pointer, even if there is a valid object there. – NathanOliver Jan 14 '20 at 16:46
  • Yes, I edited the comment to make myself clearer, I meant dereferencing the resulting pointer :) Thanks for confirming. – spectras Jan 14 '20 at 16:47
  • @spectras No problem. This part of C++ can be very difficult. Even though the hardware will do what we want it to do, that's not actually what were a coding to. We are coding to the C++ abstract machine and it's a persnickety machine ;) Hopefully P0593 will get adopted and this will become much easier. – NathanOliver Jan 14 '20 at 16:51
  • Hehe I know, this is why I always double check those with people, not with tools. It is about abiding the rules of the C++ contract to be protected from unexpected things down the line :) P0593 looks interesting indeed, I'm a bit sad I have to have an iterator for such simple thing (happily though, it also compiles to no overhead). – spectras Jan 14 '20 at 16:54
  • 1
    @spectras No, because a std vector is defined as containing an array, and you can do pointer arithmetic between array elements. There is no way to implement std vector in C++ itself, sadly, without running into UB. – Yakk - Adam Nevraumont Jan 14 '20 at 17:11
1

You already have an answer regarding the limited use of the returned pointer, but I want to add that I also think you need std::launder to even be able to access the first Pixel:

The reinterpret_cast is done before any Pixel object is created (assuming you don't do so in getSomeImageData). Therefore reinterpret_cast will not change the pointer value. The resulting pointer will still point to the first element of the std::byte array passed to the function.

When you create the Pixel objects, they are going to be nested within the std::byte array and the std::byte array will be providing storage for the Pixel objects.

There are cases where reuse of storage causes a pointer to the old object to automatically point to the new object. But this is not what is happening here, so result will still point to the std::byte object, not the Pixel object. I guess using it as if it was pointing to a Pixel object is technically going to be undefined behavior.

I think that this still holds, even if you do the reinterpret_cast after creating the Pixel object, since the Pixel object and the std::byte that provides storage for it are not pointer-interconvertible. So even then the pointer would keep pointing to the std::byte, not the Pixel object.

If you obtained the pointer to return from the result of one of the placement-new, then everything should be ok, so far as access to that specific Pixel object is concerned.


Also you need to make sure that the std::byte pointer is suitably aligned for Pixel and that the array really is large enough. As far as I remember the standard does not really require that Pixel has the same alignment as std::byte or that it doesn't have padding.


Also none of this depends on Pixel being trivial or really any other property of it. Everything would behave the same way as long as the std::byte array is of sufficient size and suitably aligned for the Pixel objects.

walnut
  • 21,629
  • 4
  • 23
  • 59
  • I believe that's correct. Even if the array thing (unimplementability of `std::vector`) was not a problem, you'd still need to `std::launder` the result before accessing any of the placement-`new`ed `Pixel`s. As of right now, `std::launder` here is UB, since the adjacent `Pixel`s would be reachable from *laundered* pointer. – Fureeish Jan 14 '20 at 18:36
  • @Fureeish I am not sure why `std::launder` would be UB if applied to `result` before returning. The adjacent `Pixel` is not "*reachable*" through the laundered pointer going by my understanding of https://eel.is/c++draft/ptr.launder#4. And even it was I don't see how it is UB, because the whole original `std::byte` array is *reachable* from the original pointer. – walnut Jan 14 '20 at 18:45
  • But the next `Pixel` will not be reachable from `std::byte` pointer, but it is from the `launder`ed pointer. I believe [this](https://en.cppreference.com/w/cpp/utility/launder#Notes) is relevant here. I am happy to be corrected, though. – Fureeish Jan 14 '20 at 18:47
  • @Fureeish From what I can tell none of the examples given apply here and the definition of the requirement also says the same as the standard. Reachability is defined in terms of bytes of storage, not objects. The byte occupied by the next `Pixel` seems reachable to me from the original pointer, because the original pointer points to an element of the `std::byte` array which contains the bytes making up the storage for the `Pixel` making the "*or within the immediately enclosing array of which Z is an element*" condition apply (where `Z` is `Y`, i.e. the `std::byte` element itself). – walnut Jan 14 '20 at 19:00
  • I think that the bytes of storage that the next `Pixel` occupies are not reachable through the laundered pointer though, because the pointed-to `Pixel` object is not element of an array object and is also not pointer-interconvertible with any other relevant object. But I am also thinking about this detail of `std::launder` for the first time in that depth. I am not 100% sure about this either. – walnut Jan 14 '20 at 19:01