
I am working on a container whose subscript operator returns a wrapper reference, like this:

my_container::reference my_container::operator[](std::size_t i);

The internal element representation of the container cannot be directly referenced and thus needs the proxy object. The reference itself has deleted copy & move constructors, and all operations are only defined for rvalue references.

However, a problem arises when deducing the wrapper object via auto or decltype(auto). For example, in the following code:

my_container bar();
auto foo(std::size_t i)
{
    my_container c = bar();
    return c[i];
}

auto deduces to my_container::reference instead of my_container::value_type, since the reference proxy is an object type, and RVO (guaranteed copy elision since C++17) means that no copy or move constructor is ever called. This is a problem, since the code above returns a proxy that refers to a local container that has already been destroyed.
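A minimal sketch of the deduction described above, assuming C++17. Only the names my_container, reference, and value_type come from the question; the four-lane storage, the member names, and the two foo_* functions are hypothetical and exist only to make the example compile.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the container in the question.
class my_container {
public:
    using value_type = bool;

    class reference {
    public:
        reference(const reference&) = delete;
        reference(reference&&) = delete;

        // Operations are only available on rvalue proxies, as in the question.
        constexpr operator bool() && { return *lane_ != 0; }
        constexpr void operator=(bool b) && {
            *lane_ = b ? ~std::uint32_t{0} : std::uint32_t{0};
        }

    private:
        friend class my_container;
        constexpr explicit reference(std::uint32_t* lane) : lane_(lane) {}
        std::uint32_t* lane_;
    };

    constexpr reference operator[](std::size_t i) { return reference{&lanes_[i]}; }

private:
    std::array<std::uint32_t, 4> lanes_{};  // assumed layout, not from the question
};

// Deduced return type: `c[i]` is a prvalue of type `reference`, so the
// deleted constructors are never needed and this compiles. The returned
// proxy points into `c`, which is destroyed on return; do not use it.
inline auto foo_dangling(std::size_t i) {
    my_container c{};
    return c[i];
}

// Naming value_type forces the conversion to bool while `c` is still alive.
constexpr my_container::value_type foo_by_value(std::size_t i) {
    my_container c{};
    c[i] = true;
    return c[i];
}

static_assert(std::is_same_v<decltype(foo_dangling(0)), my_container::reference>);
static_assert(!std::is_move_constructible_v<my_container::reference>);
```

The second function shows the workaround suggested in the comments: spelling out the return type is the only thing that changes what the caller receives.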

Is there any way to prevent such behavior? To my knowledge there is no way to bypass RVO, but perhaps I am missing something.

Edit: To give a bit more context, the container is a bit-mask over a vector of integers. Any assignments to the elements must also mask the assigned-to value, which is why the proxy reference is needed.
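To illustrate why a plain bool& cannot stand in for the proxy, here is a rough sketch of the masking step. It assumes each element is stored as a 32-bit lane that is either all ones or all zeros; the function names widen, store, and load are made up for this example.

```cpp
#include <cstdint>

// Assumption: one element == one 32-bit lane, all ones (true) or all zeros (false).
constexpr std::uint32_t widen(bool b) {
    // Unsigned arithmetic wraps: 0 - 1 == 0xFFFFFFFF, 0 - 0 == 0.
    return std::uint32_t{0} - static_cast<std::uint32_t>(b);
}

// What an assignment through the proxy has to do.
constexpr void store(std::uint32_t& lane, bool b) { lane = widen(b); }

// What a read through the proxy has to do.
constexpr bool load(const std::uint32_t& lane) { return lane != 0; }
```

Writing through an ordinary bool& would set at most the low byte of the lane, which is why the write has to go through a function or proxy that calls something like store.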

JustClaire
  • RCO? That's not a common abbreviation. Even with guessing I don't know what you mean – 463035818_is_not_an_ai Jan 21 '23 at 15:19
  • I see no way to prevent this at compile time. It's only possible to do so at runtime, with a huge pile of internal code. – Sam Varshavchik Jan 21 '23 at 15:19
  • The usual deduction, for example with `std::vector::operator[]` would be to deduce `my_container::value_type &` which still leaves a dangling reference. You need to explicitly return `my_container::value_type` – Richard Critten Jan 21 '23 at 15:20
  • @463035818_is_not_a_number, I think it's a misspelled RVO. A typo due to V and C being adjacent on the keyboard. – Enlico Jan 21 '23 at 15:21
  • @RichardCritten yes, i have misspelled RVO multiple times there :) probably need to go to sleep tbh – JustClaire Jan 21 '23 at 15:22
  • Your question boils down to "how to block RVO" right? The premise is that a user can and will write code like your example and you want it to not compile, right? I think "bypass" is the wrong word – 463035818_is_not_an_ai Jan 21 '23 at 15:24
  • or are you looking for a way to make the code "ok" ? I think in that case more code ([mcve]) would be needed – 463035818_is_not_an_ai Jan 21 '23 at 15:25
  • @463035818_is_not_a_number yes, you're correct. RVO bypasses the deleted constructors and can thus cause a dangling reference. – JustClaire Jan 21 '23 at 15:26
  • I think I agree with Richard, return by value. Your caller might hang on to the reference for a period beyond the control of your container. If the value is removed from the container then you still have a dangling reference. Are you trying not to return copies because you have a known performance issue? – Pepijn Kramer Jan 21 '23 at 15:33
  • @PepijnKramer No, I am returning by reference to allow modification of contained elements. One other idea I have is to get into UB and `reinterpret_cast` the internal reference into a reference of a same-size proxy object. As in `reinterpret_cast(internal)` instead of `ref_proxy{internal}`. But, that is technically UB, which I would like to avoid. – JustClaire Jan 21 '23 at 15:39
  • @JustClaire In the code the `my_container` is local to `foo` so any reference or reference wrapper returned by `foo` would be dangling as `my_container` is destroyed when `foo` returns. It's just not possible to return a reference if `foo` controls the lifetime of the container. – Richard Critten Jan 21 '23 at 15:49
  • @RichardCritten yes, it should not be able to return the reference because constructors of the reference wrapper are deleted, but RVO bypasses the constructors and returns a dangling reference anyway. – JustClaire Jan 21 '23 at 15:56
  • When you say _"No, I am returning by reference to allow modification of contained element..."_ - this is not possible with `my_container` being destroyed when `foo` returns. – Richard Critten Jan 21 '23 at 16:08
  • @RichardCritten Yes, sorry, what I meant is that the operator is supposed to return by reference, `foo` isn't though, but because RVO doesn't respect deleted constructors it still returns a reference instead of being a compile-time error. – JustClaire Jan 21 '23 at 16:11
  • As a workaround, you can keep the internal data in a `std::shared_ptr` and also keep a `std::shared_ptr` in the reference, so if the main container is destroyed your data is kept alive for as long as a live reference exists. – sklott Jan 21 '23 at 16:17
  • @sklott or keep a weak_ptr in the reference wrapper. If the remote object is deleted then it can no longer be referenced without error. – Pepijn Kramer Jan 21 '23 at 16:32
  • @PepijnKramer Yes, but in this case you will need to either throw an exception on invalid access or return some non-value type, i.e. `std::optional` or `std::expected` or something similar. So, it depends on intended usage... – sklott Jan 21 '23 at 16:36
  • @JustClaire Are you trying to tell us your container lives on the stack (or the elements in the container do) and could go out of scope? I don't see any real issue with that as long as the caller uses the reference locally (and they are not used on other threads, and other threads cannot modify the container). In all other cases I think I agree with sklott. Let your container and wrapper both share ownership of the data. – Pepijn Kramer Jan 21 '23 at 16:49
  • @PepijnKramer Yes, the container lives on the stack. It is essentially a bitmask over `__m128i` integer vector so using dynamic memory allocation is not really an option. – JustClaire Jan 21 '23 at 16:51
  • Ok so the client of the container must use it within the scope of the function. Which means taking references should not be a problem as long as they are only used within the same function. Which is I think fine and documentable (C++ itself doesn't prevent people from creating dangling references) you can't prevent all bugs. And trying to do so might cost you a lot of performance, which I think is a thing if you are using __m128i. Just my thoughts without really knowing what you are doing ;) – Pepijn Kramer Jan 21 '23 at 17:01
  • @PepijnKramer Yes, you are correct ;) It's just that generally, `auto` deduces to value types, and so people might `return c[i]`, and unintentionally return a dangling reference expecting that `auto` will make a copy of the element instead. – JustClaire Jan 21 '23 at 17:09
  • Did you know `std::vector<bool>` does something like this too? https://en.cppreference.com/w/cpp/container/vector_bool/reference. Maybe there is some wisdom in that source code. However I still think that dangling references cannot be prevented (I would never return a std::reference_wrapper), and that if you document this in your API it should be fine. (C++ APIs are full of cases like this) – Pepijn Kramer Jan 21 '23 at 17:27
  • @PepijnKramer Technically *yes*. But I keep forgetting that `vector<bool>` even exists so thank you for reminding me :p – JustClaire Jan 21 '23 at 17:30

1 Answer


C++ is not a safe language.

While clever and consistent use of certain language mechanisms can make the language safer, there are a number of constructs in the language itself that are just not going to be safe without substantial language changes. The dangling reference problem is one of the biggest.

You cannot fix the dangling reference problem in its entirety by using in-language mechanisms. Any attempts to do so will run into two intractable problems:

  1. The solution will not be complete. There will be ways to get around it, typically without being actively malicious (i.e. someone can do so without explicitly trying to).
  2. The solution will make some legitimate uses of the type impossible.

The latter one is important. By making "all operations are only defined for rvalue references," you prevent something as simple as this from working:

for(auto &&ref : bool_vector)
  ref == true;

The user would have to type std::move(ref) == true;. And while your type would probably satisfy indirectly_writable by the letter of the definition, users are going to expect to be able to get a language reference to your proxy-reference and use it like an actual reference to a value. That is, they will expect *it = val; and auto &&ref = *it; ref = val; to be equally valid.

And they will be displeased when their legitimate code doesn't compile. So in attempting to provide some safety, you have made your type less usable by your users.
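As a sketch of the usability point above, here is what a proxy looks like when its operations are const-qualified instead of rvalue-only, so that a named proxy behaves like a reference. The class name mask_ref, its members, and the 32-bit lane layout are assumptions for illustration, not the asker's actual type.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical proxy whose operations work on named (lvalue) proxies too.
class mask_ref {
public:
    constexpr explicit mask_ref(std::uint32_t& lane) : lane_(&lane) {}

    // Reading: any non-zero lane counts as true.
    constexpr operator bool() const { return *lane_ != 0; }

    // Writing: const-qualified, because assigning through the proxy
    // modifies the referenced lane, not the proxy itself.
    constexpr const mask_ref& operator=(bool b) const {
        *lane_ = b ? ~std::uint32_t{0} : std::uint32_t{0};
        return *this;
    }

private:
    std::uint32_t* lane_;
};

constexpr bool demo() {
    std::array<std::uint32_t, 4> lanes{};
    auto&& ref = mask_ref{lanes[2]};  // plays the role of `auto &&ref = *it;`
    ref = true;                       // no std::move(ref) required
    return ref == true && lanes[2] == 0xFFFFFFFFu && lanes[0] == 0u;
}
```

This version offers no protection at all against dangling, which is the trade-off being described: the type is easier to use precisely because it stops trying to police its own lifetime.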

In any case, your particular problem is #1: you cannot provide this form of safety in the language. You cannot make a proxy iterator that provides this form of safety. Guaranteed elision is guaranteed, and any auto-returning function that does return prvalue_expression; will always work, regardless of any property of the type of prvalue_expression.
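The elision claim can be checked in isolation with a type that has nothing to do with containers. This is a small sketch assuming C++17; pinned and make_pinned are hypothetical names.

```cpp
#include <type_traits>

// A type that can be neither copied nor moved.
struct pinned {
    int value;
    constexpr explicit pinned(int v) : value(v) {}
    pinned(const pinned&) = delete;
    pinned(pinned&&) = delete;
};

// Well-formed since C++17: the operand of `return` is a prvalue, so the
// result object is initialized directly and no constructor is selected.
constexpr auto make_pinned(int v) { return pinned{v}; }

// By contrast, returning a *named* local still requires a usable copy or
// move constructor, even if the copy would be elided in practice:
//   auto bad(int v) { pinned p{v}; return p; }  // ill-formed: deleted constructor

static_assert(!std::is_copy_constructible_v<pinned>);
static_assert(!std::is_move_constructible_v<pinned>);
```

So deleting the constructors only rejects the named-local form; the `return c[i];` form from the question falls under the prvalue case and is always accepted.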

Nicol Bolas
  • Guess I'll have to break the strict aliasing rule and `reinterpret_cast` to a mask type instead of using a reference wrapper after all :') Then again, I already `reinterpret_cast` from `__m128i *` – JustClaire Jan 21 '23 at 16:49
  • @JustClaire: Why? Why not just use the reference type and expect your users not to do stupid things? Like, returning the result of `vector::operator[]` is just as broken. Use your reference type, but take out the attempts to protect users from themselves. – Nicol Bolas Jan 21 '23 at 16:53
  • `value_type` of the container is `bool`, the actual elements stored though, are 32-bit masks for the `__m128i` vector, and when the `bool` is assigned it must be extended to the full integer width (i.e. all `1`s instead of just the bottom bit). The wrapper essentially does `mask_ref = -static_cast(bool_value)` for assignment operators. – JustClaire Jan 21 '23 at 16:58