I have the following function in the interface of some module:

void DoSomething(Span<MyObject *const> objects);

where Span is my simplified implementation of C++20's std::span template.

This function just iterates over a contiguous sequence of pointers to objects and calls some of their functions, without attempting to modify the pointers (thus the const in the signature).

On the caller's side, I have a std::vector<std::unique_ptr<MyObject>>. And I want to pass that vector to the DoSomething function without allocating additional memory (for anything like a temporary std::vector<MyObject*>). I just want to convert an lvalue vector of unique_ptrs to a Span of immutable raw pointers in constant time.

It must be possible, because a std::unique_ptr<T> with a stateless deleter has the same size and alignment as a raw T* pointer, and all it stores inside is nothing but that raw pointer itself. So, bytewise, std::vector<std::unique_ptr<MyObject>> must have the same representation as std::vector<MyObject*> -- thus it must be possible to pass it to a function which expects a Span<MyObject *const>.

My questions are:

  1. Is such a cast possible with the current proposal of std::span without causing undefined behavior and relying on dirty hacks?

  2. If it's not, could it be expected in the following standards (e.g., C++23)?

  3. What are the dangers of using a cast that I implemented in my version of Span, using a dirty trick with memcpy? It seems to work fine in practice, but I suppose there might be some undefined behavior in it. If there is, in which cases can that undefined behavior shoot me in the foot on MSVC, GCC or Clang/LLVM, and how exactly? I would be grateful for some real examples of such scenarios, if they are possible.

My code goes like this:

namespace detail
{
  constexpr std::size_t dynamic_extent = static_cast<std::size_t>(-1);

  template<typename SourceSmartPointer, typename SpanElement, typename = void>
  struct is_smart_pointer_type_compatible_impl
    : std::false_type
  {
  };

  template<typename SourceSmartPointer, typename SpanElement>
  struct is_smart_pointer_type_compatible_impl<SourceSmartPointer, SpanElement,
                                               decltype((void)(std::declval<SourceSmartPointer&>().get()))>
    : std::conjunction<
        std::is_pointer<SpanElement>,
        std::is_const<SpanElement>,
        std::is_convertible<std::add_pointer_t<decltype(std::declval<SourceSmartPointer&>().get())>,
                            SpanElement*>,
        std::is_same<std::remove_cv_t<std::remove_pointer_t<decltype(std::declval<SourceSmartPointer&>().get())>>,
                     std::remove_cv_t<std::remove_pointer_t<SpanElement>>>,
        std::bool_constant<(sizeof(SourceSmartPointer) == sizeof(SpanElement)) &&
                           (alignof(SourceSmartPointer) == alignof(SpanElement))>>
  {
  };

  // Helper type trait which detects whether a contiguous range of smart pointers of the source type
  // can be used to initialize a span of respective immutable raw pointers using a memcpy-based hack.
  template<typename SourceSmartPointer, typename SpanElement>
  struct is_smart_pointer_type_compatible
    : is_smart_pointer_type_compatible_impl<SourceSmartPointer, SpanElement>
  {
  };

  template<typename T, typename R>
  inline T* cast_smart_pointer_range_data_to_raw_pointer(R& source_range)
  {
    T* result = nullptr;

    auto* source_range_data = std::data(source_range);
    std::memcpy(&result, &source_range_data, sizeof(T*));

    return result;
  }
}

template<typename T, std::size_t Extent = detail::dynamic_extent>
class Span final
{
public:
  // ...

  // Non-standard extension.
  // Allows, e.g., to convert `std::vector<std::unique_ptr<Object>>` to `Span<Object *const>`
  // by using the fact that such smart pointers are bytewise equal to the resulting raw pointers;
  // `const` is required on the destination type to ensure that the source smart pointers
  // will be read-only for the users of the resulting Span.
  template<typename R,
           std::enable_if_t<std::conjunction<
             std::bool_constant<(Extent == detail::dynamic_extent)>,
             detail::is_smart_pointer_type_compatible<std::remove_reference_t<decltype(*std::data(std::declval<R&&>()))>, T>,
             detail::is_not_span<R>,
             detail::is_not_std_array<R>,
             std::negation<std::is_array<std::remove_cv_t<std::remove_reference_t<R>>>> >::value, int> = 0>
  constexpr Span(R&& source_range)
    : _data(detail::cast_smart_pointer_range_data_to_raw_pointer<T>(source_range))
    , _size(std::size(source_range))
  {
  }

  // ...

private:
  T* _data = nullptr;
  std::size_t _size = 0;
};
Taras
  • [std::transform](https://en.cppreference.com/w/cpp/algorithm/transform) ? – Jesper Juhl Feb 03 '20 at 16:33
  • @JesperJuhl No :) A `Span` is basically just a pair of a raw pointer and a size. There is nowhere to apply `std::transform` in order to obtain a `Span` from a `vector`. Other vectors (and containers of different types which store their data contiguously) are simply converted to a Span by invoking their .data() member function -- yet it would not work without an explicit conversion in my particular case. – Taras Feb 03 '20 at 16:45
  • You have to allocate `std::vector` or similar. – Jarod42 Feb 03 '20 at 16:47
  • If `DoSomething` does not require contiguous memory, then a more general interface using iterators or ranges seems appropriate. That is, if you can change the method yourself. I have not worked with ranges (neither [range-v3](https://github.com/ericniebler/range-v3) nor [C++20 Ranges](https://en.cppreference.com/w/cpp/ranges)), but you could pass your `std::vector<std::unique_ptr<MyObject>>` as if it were a range of raw pointers with a transform view range, optionally with a filter view. – Araeos Feb 05 '20 at 08:59
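
The "more general interface" suggested in the last comment could look roughly like the sketch below. It assumes `DoSomething` may become a template, which the question does not confirm; `DoSomethingGeneric`, `Touch`, `value` and `RunGenericExample` are hypothetical names used only for illustration. It avoids both the cast and the temporary allocation because it only uses `operator->`, which raw pointers and `std::unique_ptr` both provide.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for the question's MyObject.
struct MyObject {
    int value = 0;
    void Touch() { ++value; }
};

// Accepts any range whose elements behave like pointers to MyObject:
// raw pointers and std::unique_ptr both support `!= nullptr` and `->`.
template <typename PointerRange>
void DoSomethingGeneric(const PointerRange& objects) {
    for (const auto& object : objects) {
        assert(object != nullptr);
        object->Touch();
    }
}

// Calls the same function with owning and non-owning pointer ranges.
int RunGenericExample() {
    std::vector<std::unique_ptr<MyObject>> owned;
    owned.push_back(std::unique_ptr<MyObject>(new MyObject()));
    owned.push_back(std::unique_ptr<MyObject>(new MyObject()));

    DoSomethingGeneric(owned);  // no conversion, no allocation

    std::vector<MyObject*> raw{owned[0].get(), owned[1].get()};
    DoSomethingGeneric(raw);    // raw pointers work unchanged

    return owned[0]->value + owned[1]->value;  // each object touched twice
}
```

The trade-off is that the function body must be visible to callers (or be explicitly instantiated), which may not suit a module interface that wants a single non-template entry point.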

1 Answer

Is such a cast possible with the current proposal of std::span without causing undefined behavior and relying on dirty hacks?

No. Even if this statement is true (and I know of no requirement in the standard that forces this to be true):

a std::unique_ptr<T> with a stateless deleter has the same size and alignment as a raw T* pointer, and all it stores inside is nothing but that raw pointer itself.

That doesn't matter. A unique_ptr<T> is not just a T* with some member functions bolted onto it. It's a unique_ptr<T>, and attempting to pretend that the one is the other is UB due to a violation of the strict-aliasing rule.
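
For contrast, here is a minimal sketch of the conversion that stays within defined behavior: it reads each pointer through `get()` and stores the results in a temporary array of real `MyObject*` objects. This costs exactly the allocation the question hopes to avoid. The `Span` below is a hypothetical two-member stand-in for the question's class, and `SumValues`, `ToRawPointers` and `RunCopyExample` are illustrative names, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the question's MyObject.
struct MyObject {
    int value = 0;
};

// Hypothetical minimal stand-in for the question's Span: a pointer and a size.
template <typename T>
struct Span {
    T* data;
    std::size_t size;
};

// Plays the role of DoSomething: reads through immutable raw pointers.
int SumValues(Span<MyObject* const> objects) {
    int total = 0;
    for (std::size_t i = 0; i < objects.size; ++i) {
        assert(objects.data[i] != nullptr);
        total += objects.data[i]->value;
    }
    return total;
}

// Builds an actual array of MyObject* by asking each unique_ptr for its pointer.
std::vector<MyObject*> ToRawPointers(
    const std::vector<std::unique_ptr<MyObject>>& owned) {
    std::vector<MyObject*> raw;
    raw.reserve(owned.size());
    for (const auto& pointer : owned) {
        raw.push_back(pointer.get());
    }
    return raw;
}

int RunCopyExample() {
    std::vector<std::unique_ptr<MyObject>> owned;
    owned.push_back(std::unique_ptr<MyObject>(new MyObject()));
    owned.push_back(std::unique_ptr<MyObject>(new MyObject()));
    owned[0]->value = 3;
    owned[1]->value = 4;

    const std::vector<MyObject*> raw = ToRawPointers(owned);
    return SumValues(Span<MyObject* const>{raw.data(), raw.size()});
}
```

Here the span points at objects whose type really is `MyObject*`, so no aliasing question arises; the `unique_ptr`s remain the owners and are never reinterpreted.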

If it's not, could it be expected in the following standards (e.g., C++23)?

No. Even if a form of P0593 finds its way into the standard in a way that does allow for the bytes stored in an array of unique_ptr<T> to be transformed into an array of T*, this would be a transformation, not a cast. That is, the lifetime of those unique_ptr<T>s would end, and the lifetime of an array of T*s would begin using the data in the previously-ended object. So you couldn't use the vector<unique_ptr<T>> again after doing it.

Any such transformation, were it allowed, would be decidedly one-way. The ability of P0593 to implicitly create objects in bytes of storage is restricted to types which are essentially just bytes of data, and unique_ptr would not fit into that restriction.

Nicol Bolas
  • Thank you. But still, provided that I am 100% sure that on my platform a `std::unique_ptr` has the same binary representation as a raw pointer (I could write my own `UniquePtr` for that), what could the real manifestations of that horrible undefined behavior be? Can you (or someone else) provide any examples? – Taras Feb 03 '20 at 21:33
  • For instance, I can imagine UB if my resulting `Span` object had the type `Span<MyObject*>` (without `const`). That could easily lead to bad stuff. E.g., client code could call a function `void ModifyThosePointers(Span<MyObject*> object_pointers)`, which actually changes the values of the argument pointers (it is completely allowed to do so, now that its arguments are not const). There are no "real" raw pointers (from the point of view of C++ lifetimes), so writing into the memory which is occupied by "fake" raw pointers, and which actually belongs to a vector of `unique_ptr`s, could easily mess things up. – Taras Feb 03 '20 at 21:40
  • Wait, wait. The strict aliasing is there, yet the raw pointer is there too. `unique_ptr` is a standard layout object so is [pointer-interconvertible](https://en.cppreference.com/w/cpp/language/static_cast#pointer-interconvertible) with its only member. That is, using `reinterpret_cast` on a pointer to `unique_ptr` one can get a pointer to the underlying raw pointer. Voilà! Now double check the rules for pointer arithmetic... the raw pointers are tightly packed (due to standard layout) so they *are* there, so array access may be fine too, unsure. – numzero Nov 02 '20 at 18:11
  • According to cppreference, such a trick is actually forbidden for arrays: “if the pointed-to type is different from the array element type, <...>, the behavior of pointer arithmetic is undefined.” and subscripting is defined in terms of pointer arithmetic. It is unclear whether using `uintptr_t` instead would be legal. – numzero Nov 02 '20 at 18:31
  • @numzero: "*unique_ptr is a standard layout object*" Is it? Where does it say that in the standard? I just did a check, and nowhere in the definition of `unique_ptr` does it say that it is standard-layout. – Nicol Bolas Nov 02 '20 at 19:20
  • That’s implementation defined; can be checked with `std::is_standard_layout`. The only concern that remains is that whether it actually stores a `T*` or maybe `std::uintptr_t`, for example; still, implementation defined but not UB yet. – numzero Nov 02 '20 at 19:27
  • @numzero: It is not "implementation defined" in the standard sense; it is "not defined". You can check it of course, but you cannot be certain that the `T*` is the first member subobject. – Nicol Bolas Nov 02 '20 at 23:52
  • @NicolBolas 23.11.1: “unique pointer is an object u that stores a pointer to a second object p...” so there have to be that `T*` (`uintptr_t` is not a pointer). Now, if it has standard layout *and* its size is equal to that of the underlying pointer, the latter *has* to be the only member, thus the first member (modulo zero-sized things ofc). P.S. C++ was presumably invented by common law lawyers... – numzero Nov 03 '20 at 14:55
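
The properties debated in the comments above (standard layout, equal size, equal alignment) can at least be queried at compile time. The following is a sketch under the assumption that such a check is only informative: `HasPointerLikeLayout`, `TwoPointers` and `kUniquePtrLooksLikeRawPointer` are hypothetical names, the result for `std::unique_ptr` depends on the standard library implementation, and, as the answer and the cppreference quote on pointer arithmetic point out, a `true` result does not by itself make the cast well-defined.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical stand-in for the question's MyObject.
struct MyObject {
    int value = 0;
};

// True when SmartPtr is standard-layout and has the same size and
// alignment as RawPtr. This says nothing about aliasing rules.
template <typename SmartPtr, typename RawPtr>
constexpr bool HasPointerLikeLayout() {
    return std::is_standard_layout<SmartPtr>::value &&
           sizeof(SmartPtr) == sizeof(RawPtr) &&
           alignof(SmartPtr) == alignof(RawPtr);
}

// A type that is clearly larger than a single pointer.
struct TwoPointers {
    MyObject* first;
    MyObject* second;
};

// Sanity checks whose outcome does not depend on the library.
static_assert(HasPointerLikeLayout<MyObject*, MyObject*>(),
              "a raw pointer trivially matches itself");
static_assert(!HasPointerLikeLayout<TwoPointers, MyObject*>(),
              "two pointers are larger than one");

// Implementation-specific: the standard does not fix this value.
constexpr bool kUniquePtrLooksLikeRawPointer =
    HasPointerLikeLayout<std::unique_ptr<MyObject>, MyObject*>();
```

A check like this could gate the non-standard constructor so that it fails to compile on an implementation with a different layout, but it only narrows the practical risk; it does not turn the reinterpretation into conforming code.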