
The C++ standard restricts pointer arithmetic to the elements of an array ([expr.add]), which makes implementing vector-like containers difficult.

One could implement a vector-like container with an implementation similar to this:

//First approach

//Allocation
auto buffer = new unsigned char[2*sizeof(int)];
//Construction
auto p=new(buffer) int{};
new(p+1) int{};
//Example of use of an iterator, assign 10 to the second element.
*(p+1)=10;//UB p+1 is a pointer past the end of an object.

The previous piece of code illustrates roughly how std::vector is implemented in libstdc++ and libc++. It seems that compilers accept this kind of code as an extension to the C++ language.

If I want to be standard-compliant, I could implement a vector and its associated iterator in such a way that the operations performed on the vector and its iterator reduce to this code:

//Second approach

//Allocation:
auto buffer = new unsigned char[2*sizeof(int)];
//Construction
new(buffer) int{};
new(buffer+sizeof(int)) int{};
//Example of use of an iterator assign 10 to the second element
*(std::launder(reinterpret_cast<int*>(buffer+sizeof(int))))=10;

(First question: is this approach not also UB? Here the pointer arithmetic is performed on the array of unsigned char that provides storage for the int objects. std::launder is used because buffer and the int objects are not pointer-interconvertible.)
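For reference, here is a minimal, self-contained sketch of the second approach, assuming C++17 (for std::launder). The names element_at and second_approach_demo are hypothetical and only serve the illustration; a stack buffer with alignas(int) replaces the new[] allocation to keep the example short. Whether this pattern is fully conforming is exactly the question asked above, so treat it as an illustration rather than a verdict.

```cpp
#include <cstddef>
#include <new>

// Hypothetical helper: pointer to the i-th int that was placement-constructed
// at byte offset i * sizeof(int) inside `buffer`.
inline int* element_at(unsigned char* buffer, std::size_t i) {
    // The arithmetic is done on the unsigned char array, never on an int*.
    unsigned char* raw = buffer + i * sizeof(int);
    // std::launder yields a pointer to the int object living at that address.
    return std::launder(reinterpret_cast<int*>(raw));
}

// Hypothetical demo: construct two ints in a byte buffer, write to the second.
inline int second_approach_demo() {
    // Storage for two ints, explicitly aligned for int.
    alignas(int) unsigned char buffer[2 * sizeof(int)];
    ::new (static_cast<void*>(buffer)) int{};
    ::new (static_cast<void*>(buffer + sizeof(int))) int{};
    // "Iterator" use: assign 10 to the second element.
    *element_at(buffer, 1) = 10;
    // int is trivially destructible, so no explicit destructor call is needed.
    return *element_at(buffer, 0) + *element_at(buffer, 1);
}
```

An iterator built this way would store an unsigned char* and call something like element_at on every dereference, which is the source of the extra laundering in test_approach_2.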

The problem with this second approach is the code generated by the compiler (GCC):

#include <new>

int test_approach_1(unsigned char* buffer){
    //Construction
    auto p = new(buffer) int{};
    new(p+1) int{10};
    //Example of use of an iterator: assign 13 to the second element
    *(p+1)=13;//UB
    return *(p+1);//UB
}

int test_approach_2(unsigned char* buffer){
    //Construction
    new(buffer) int{};
    new(buffer+sizeof(int)) int{10};
    //Example of use of an iterator: assign 13 to the second element
    *(std::launder(reinterpret_cast<int*>(buffer+sizeof(int))))=13;
    return *(std::launder(reinterpret_cast<int*>(buffer+sizeof(int))));
}

Generated assembly:

test_approach_1(unsigned char*):
        movabs  rax, 55834574848
        mov     QWORD PTR [rdi], rax
        mov     eax, 13
        ret
test_approach_2(unsigned char*):
        movabs  rax, 42949672960
        mov     QWORD PTR [rdi], rax
        mov     eax, 13
        mov     DWORD PTR [rdi+4], 13
        ret

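The code generated for test_approach_1 is optimal, whereas test_approach_2 emits an additional store of 13 to [rdi+4] after the 8-byte store that initializes both elements. So I think I will not use the second approach (and I would have one more reason not to use it if someone shows that it is also UB).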

I can't find documentation for the language extensions that allow us to implement vector-like containers using the first approach (it is UB according to the standard). Is there any documentation for them? With which compilers can I expect it to work, and with which compiler flags?

Oliv
  • I'd go with the first approach, even if it is UB. If you create your own vector implementation, you rely on this UB in only a few places. If this ever breaks, only then will you need to take action. At least, this is what I do with my own vector-like container (no problems so far). – geza Sep 29 '18 at 15:10
  • @geza Do you know if it works with MSVC? And when the code is instrumented by sanitizers? – Oliv Sep 29 '18 at 15:15
  • The latest MSVC I use is 2015, and it works with it. It will only break if they add special code to the compiler for `std::vector`, which is unlikely. Even if they break this technique, they will need to add a compiler extension to keep `std::vector` correct. And then, presumably, you can use that extension as well. Not a beautiful solution, but what can one do if the language is broken in this regard? I don't have experience with sanitizers, so I don't know whether they pose problems. – geza Sep 29 '18 at 15:24
  • It's not about compiler extensions, documented or otherwise. Undefined behavior means **only** that the language definition doesn't tell you what that code does. The standard library is intimately connected with the compiler that it ships with, and often takes advantage of known behavior of that compiler. – Pete Becker Sep 29 '18 at 16:47
  • One thing to be aware of with such code is data alignment. – Phil1970 Sep 29 '18 at 16:56
  • @PeteBecker, This is the definition of "unspecified behavior". If a piece of code with "undefined behavior" is executed, the entire program's behavior is unspecified. The intent is to allow compilers to assume that code with undefined behavior is never executed, which gives them more room for optimization. – Oliv Sep 29 '18 at 17:30
  • @Phil1970 Indeed, hopefully alignof(int)<=alignof(std::max_align_t) – Oliv Sep 29 '18 at 17:33
  • @Oliv -- not quite. Yes, I should have said that it means **only** that the language definition doesn't tell you what the program does. "Unspecified behavior" means that the compiler can choose among a number of (typically unspecified) possibilities. Such code is valid; code with undefined behavior is not. And, incidentally, the intent of undefined behavior is simply to not impose requirements. One **consequence** of that is that compilers can apply aggressive optimizations. But that wasn't a factor back in the olden days when "undefined behavior" was first introduced as a term of art. – Pete Becker Sep 29 '18 at 17:48
  • @Oliv Well, if used for a single type and a pointer allocated by new, alignment will probably not be an issue. But if you do something more complex than a single type, like `auto buffer = new unsigned char[sizeof(int) + sizeof(double)];` and `new(p+1) double{10.1};`, then it might not work. – Phil1970 Sep 29 '18 at 19:05
  • @PeteBecker @Oliv Undefined behavior does not imply that the code won't work with a given compiler in a specific case. One example would be `reinterpret_cast` between different function pointer types. – Phil1970 Sep 29 '18 at 19:34
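
The alignment concern raised in the comments can be sketched as follows. This is only an illustration, assuming C++17 and the standard std::align from <memory>; the names place_int_then_double and alignment_demo are hypothetical and do not come from the thread. The idea is to advance to the next suitably aligned address before placing the double, instead of placing it directly behind the int.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Hypothetical helper: constructs an int at the start of `buffer`, then a
// double at the first suitably aligned address after it.
// Returns nullptr if the buffer is too small once padding is accounted for.
inline double* place_int_then_double(unsigned char* buffer, std::size_t size) {
    void* cursor = buffer;
    std::size_t space = size;
    // std::align bumps `cursor` forward to the required alignment and
    // shrinks `space` by the padding it consumed.
    if (!std::align(alignof(int), sizeof(int), cursor, space)) return nullptr;
    ::new (cursor) int{1};
    cursor = static_cast<unsigned char*>(cursor) + sizeof(int);
    space -= sizeof(int);
    if (!std::align(alignof(double), sizeof(double), cursor, space)) return nullptr;
    return ::new (cursor) double{10.1};
}

// Hypothetical demo: the buffer reserves alignof(double) extra bytes for padding.
inline bool alignment_demo() {
    alignas(std::max_align_t)
        unsigned char buffer[sizeof(int) + alignof(double) + sizeof(double)];
    double* d = place_int_then_double(buffer, sizeof buffer);
    return d != nullptr
        && reinterpret_cast<std::uintptr_t>(d) % alignof(double) == 0
        && *d == 10.1;
}
```

On a platform where alignof(double) is larger than sizeof(int), a buffer of exactly sizeof(int) + sizeof(double) bytes leaves no room for the padding, which is the failure mode the comment about `new(p+1) double{10.1}` points at.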

0 Answers