Some problems require filling a buffer with values of mixed types. Two examples:

  • programming OpenGL/DirectX: we need to fill vertex buffers, which can hold mixed types (basically an array of structs, but the struct layout may be described by run-time data)
  • creating a memory allocator: putting header/trailer information into the buffer (size, flags, next/prev pointers, sentinels, etc.)

The problem can be described like this:

  • there is an allocation function, which gives back some memory (new, malloc, OS dependent allocation function, like mmap or VirtualAlloc)
  • there is a need to put mixed types into an allocated buffer, at various offsets

One possible solution, shown here writing an int at an offset:

void *buffer = <allocate>;
int offset = <some_offset>;
char *ptr = static_cast<char*>(buffer);
*reinterpret_cast<int*>(ptr+offset) = int_value;

However, this is inconvenient, and it has UB in at least two places:

  • ptr+offset is UB, as there is no char array at ptr
  • writing to the result of reinterpret_cast is UB, as there is no int there

To solve the inconvenience problem, this solution is often used:

union Pointer {
    void *asVoid;
    bool *asBool;
    byte *asByte;
    char *asChar;
    short *asShort;
    int *asInt;

    Pointer(void *p) : asVoid(p) { }
};

So, with this union, we can do this:

Pointer p = <allocate>;
p.asChar += offset;
*p.asInt++ = int_value; // write an int to offset
*p.asShort++ = short_value; // then a short afterwards
// other writes here

This solution is convenient for filling buffers, but it has further UB, because it accesses non-active union members.

So, my question is: how can one solve this problem in a strictly standard-conformant and maximally convenient way? That is, I'd like to have the functionality the union solution gives me, but in a standard-conformant way.

(Note: assume there are no alignment issues here; alignment is taken care of by using proper offsets.)

geza

1 Answer

A simple (and conformant) way to handle these things is leveraging std::memcpy to move whatever values you need into the correct offsets in your storage area, e.g.

std::int32_t value;
char *ptr;
int offset;
// ...
std::memcpy(ptr+offset, &value, sizeof(value));

Do not worry about performance, since your compiler will not actually perform std::memcpy calls in many cases (e.g. small values). Of course, check the assembly output (and profile!), but it should be fine in general.
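
To make the `std::memcpy` approach about as terse as the union version, it can be wrapped in a small cursor type. The following is only a sketch under stated assumptions: `BufferWriter`, `put`, `seek`, `tell` and `load` are hypothetical names invented for this illustration, not part of any library, and the sketch assumes C++11 or later and trivially copyable value types. It relies on the same `ptr + offset` arithmetic as the snippet above, so it does not settle whether that arithmetic is well defined for storage that is not a char array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper (not from the original answer): a cursor over raw
// storage that copies values in with std::memcpy and then advances itself.
class BufferWriter {
public:
    BufferWriter(void* buffer, std::size_t size)
        : begin_(static_cast<unsigned char*>(buffer)), size_(size) {}

    // Move the cursor to an absolute byte offset from the start.
    void seek(std::size_t offset) {
        assert(offset <= size_);
        pos_ = offset;
    }

    // Current byte offset from the start of the buffer.
    std::size_t tell() const { return pos_; }

    // Copy the object representation of `value` to the cursor and advance.
    // Returns false, and writes nothing, if the value would not fit.
    template <typename T>
    bool put(const T& value) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "put() only supports trivially copyable types");
        if (pos_ > size_ || sizeof(T) > size_ - pos_) return false;
        std::memcpy(begin_ + pos_, &value, sizeof(T));
        pos_ += sizeof(T);
        return true;
    }

private:
    unsigned char* begin_;
    std::size_t size_;
    std::size_t pos_ = 0;
};

// Read a T back from a byte offset, again via std::memcpy.
// T must be trivially copyable and default-constructible.
template <typename T>
T load(const void* buffer, std::size_t offset) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "load() only supports trivially copyable types");
    T value;
    std::memcpy(&value, static_cast<const unsigned char*>(buffer) + offset,
                sizeof(T));
    return value;
}
```

With an `unsigned char storage[16]` array (where the pointer arithmetic is not in question), `BufferWriter w(storage, sizeof(storage)); w.seek(4); w.put(std::int32_t{42}); w.put(std::int16_t{7});` writes an int at offset 4 and a short right after it, which mirrors the `*p.asInt++ = ...; *p.asShort++ = ...;` sequence from the question. Whether the compiler turns each `put` into a plain store still depends on the target, so checking the generated assembly remains worthwhile.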

Acorn
  • Thanks for the response, but this solution has UB as well, as I understand. And it is far from being convenient. – geza Oct 13 '18 at 11:57
  • @geza It isn't UB: even if the standard had a defect, the intention of the standard is that this is not UB. Further, even if the standard ended up changing this, all hell would break loose and millions of lines of C++ would suddenly cry out in terror :-) – Acorn Oct 13 '18 at 12:11
  • @geza As for convenience, I am not sure I understand. The `memcpy` is conceptually working as a simple assignment here. If you want, you can create wrappers to simplify it even further. – Acorn Oct 13 '18 at 12:15
  • So you say that it is a defect, and will be fixed in the future? Then it's OK for me, of course. – geza Oct 13 '18 at 12:16
  • @geza No, I didn't say that. – Acorn Oct 13 '18 at 12:17
  • Union is convenient. Look at how simple it is. To do the same with this solution, I need a memcpy, and a pointer adjustment, with something like `ptr += sizeof(...);`. It is much more to write. – geza Oct 13 '18 at 12:18
  • Then I don't understand you :) I'd like to have a standard conformant solution. I'll accept UB code only, if that UB won't be UB in the future (because it will be fixed in the standard). – geza Oct 13 '18 at 12:20
  • @geza Maybe you should explain what you don't understand; or why you think I said the standard had a defect. – Acorn Oct 13 '18 at 12:24
  • Sorry, I may not understand this sentence: "even if the standard had a defect, the intention of the standard is that this is not UB.". Currently, as I understand the standard, `ptr + offset` is UB. If that's not the intention, then this is clearly a defect, and should be fixed. – geza Oct 13 '18 at 12:32
  • But this is just one point of my question. Even if it were not UB, your answer gives a complicated way to do the same as the union code does. I'd like to have a succinct, conveniently usable solution to this problem. – geza Oct 13 '18 at 12:36
  • @geza The sentence "even if the standard had a defect..." does not imply that the standard *has* a defect. I am only talking conditionally. In other words, assuming the standard had a defect (which I am not claiming it does), such defect would be probably corrected quite quickly; because the *intention* of the designers is not to ban such usage. The standard is not a perfect document, which is one of the reasons why DRs exist: sometimes what gets written is not what was intended to be written. – Acorn Oct 13 '18 at 12:42
  • @geza Regarding the second comment: I don't agree that this solution is "complicated". It is, to my knowledge, the preferred way to do so in C++. It is easy (every C++ programmer knows about `memcpy`, even C ones), it is succinct (it is a simple function call, come on), it is convenient (`std::memcpy` is in the standard, and it is also implemented by all compilers properly) and it is only needed if you are dealing with low-level buffer manipulation (which is not everyday's code). – Acorn Oct 13 '18 at 12:46
  • But that's a contradiction. If that's a defect, it should have been fixed a long time ago. It always has been this way, as far as I know. Even the C89 standard has it. It shouldn't be that hard to fix this problem (to add an exception for `char *`). – geza Oct 13 '18 at 12:50
  • It is complicated compared to the union solution. Yes, this is not everyday's code. But, there is code, where this kind of stuff is used very heavily. Using memcpy, sizeof, all over the place clearly worsens readability, increases code size, etc. – geza Oct 13 '18 at 12:52
  • @geza I am not sure what to say, I don't see any contradiction. If I understood correctly, you claim that the C++ standard(s) and even the C ones say that this is UB. If you believe so, you should try to get the committee(s) to clarify the C and C++ standard(s), instead of finding "new" ways of doing the same thing. – Acorn Oct 13 '18 at 16:39
  • @geza I actually think the `union` solution, *for this use case*, is more complicated: you need to define a new type and consider all the rules for unions, rather than simply copying into the right place your values. It is true that unions may be convenient in other cases that are currently UB, though. In any case: if you really think that unions (or something new) should be the preferred way to deal with this, you should make a proposal. – Acorn Oct 13 '18 at 16:44
  • The standard is pretty clear on this. Nowhere is it stated that this is well-defined. Is there a char array where `ptr` points? No. Then `ptr+offset` is UB. Why would it be well defined? Can you cite something in the standard which says that `ptr+offset` is well-defined? We could just reason about why it should be well defined. But the standard doesn't say so. Or at least I cannot find it. – geza Oct 13 '18 at 16:45
  • Yes, it is more complicated, if we take the union itself into account. But it is a utility object, used in a lot of places in my code. It makes code simpler. `*p.asInt++ = 42;` is (almost) the simplest possible. `memcpy` + `p += sizeof(int);` is a lot more to type, and harder to understand. I'm not saying that it is a bad solution, but it needs to be wrapped in some carefully designed class to be easily usable (which is basically what I'm doing now...). – geza Oct 13 '18 at 16:50
  • @geza At the moment, that is your **opinion**, not a fact. You have already asked about [this](https://stackoverflow.com/questions/47498585/) and people debated it. If you really believe the standard is wrong, you should submit an issue to the CWG, not open questions on StackOverflow. People are telling you what the standard was **intended to mean**, which is basically the only thing we can do here. Now, it is unclear to me what you are trying to accomplish. – Acorn Oct 13 '18 at 16:54
  • Just a comment: unfortunately, we should worry about performance. For example, gcc targeting anything below armv6 doesn't optimize away `memcpy`, not even when I pass `int *` arguments to it. So using memcpy there causes a serious performance degradation. – geza Oct 13 '18 at 16:54
  • Yes, but an opinion supported by facts. Again, if you think that it is well defined, where does the standard say so? Yes, they've debated it. But no one, who said it is not UB, could say anything convincing. That's why I treat it as UB. (But of course, I'd like this to be **not** UB) – geza Oct 13 '18 at 17:00
  • @geza Opinions, supported or not by facts, are still opinions. Further, whether you think this is or not UB is irrelevant. What matters is what is **intended** in the standard and, in the end, what major compilers implement (which, by the way, are typically members of the C and C++ committees). I will stop answering you, because it seems you are keen on discussing this forever for no productive reason. – Acorn Oct 13 '18 at 17:09
  • Yes, it is not productive, I agree :) I've found this: https://groups.google.com/a/isocpp.org/forum/#!msg/std-discussion/bsb8okPgDak/aBqzrLJoAgAJ. So they already know about this. Unfortunately, I don't have the time to read through the whole thread right now. But as Melissa says: "those pointer additions are undefined behavior". And I cannot imagine why anyone would think that it is not UB. It is UB, fact, not opinion. And it is a bad thing. – geza Oct 13 '18 at 17:18