4

Given the following code, where x is a dangling const reference to a vanished object, and is therefore undefined behavior.

auto get_vec() { return std::vector<int>{1,2,3,4,5}; }
const auto& x = get_vec().back();

It seems like neither GCC 7.3, Clang 6.0 and MSVC is able to emit a warning, even with all warnings enabled. Does anyone know if it is any way to emit a warning in these cases? Is there any difference between const auto& and auto&& in these cases?

Note, if back() would return by value, it wouldn't be undefined behavior as the lifetime temporary object x is extended to function scoop.

Long story: I have a code base where const auto& is used as the default way of initializing variables, and for some odd reason these cases executes correctly using MSVC, but when compiled with Clang for android, every occurance results in a wrongly assigned value. For now the solution seems to investigate every const auto& in the whole code base. Also, in many cases the const auto& refers to a heavy object returned by reference so simply removing the & is not a solution.

One more thing, I'm responsible for the miss use of const auto& :)

Viktor Sehr
  • 12,825
  • 5
  • 58
  • 90
  • " For now the solution seems to investigate every const auto& in the whole code base" - or use values, which in this case seems like the obvious, and possibly more efficient, thing to do. –  Mar 21 '18 at 00:34
  • 1
    How would you realize by yourself if this line of code has dangling reference (only knowing function signature)? – Slava Mar 21 '18 at 00:36
  • @NeilButterworth: In many cases the const auto& returns a heavy object, and therefore it is not as simple as just remove the & – Viktor Sehr Mar 21 '18 at 00:50
  • @Victor But not in the case you exampled. Choosing the correct type for a particular situation is a basic programming skill - one size does not fit all. –  Mar 21 '18 at 00:51
  • Ouch, that kind of code is going to leave a mark. I've made some code that also does not execute correctly using MSVC (VS2017 update 6). MSVC also does not emit a warning or error. +1 for finding some pathological code! – Eljay Mar 21 '18 at 00:54
  • A compiler that could do that for any program could solve the halting problem. And that ain gonna happen. – Jive Dadson Mar 21 '18 at 01:01
  • 1
    @ViktorSehr since C++17 it is guaranteed that the "heavy object" returned by value will not have any extraneous copy/moves . Your solution was viable prior to C++17 if you had to deal with compilers that didn't perform copy elision, but going forward you should be able to go back to that. – M.M Mar 21 '18 at 01:07
  • @M.M Clarified. In these cases the objects are real references. – Viktor Sehr Mar 21 '18 at 01:32
  • @NeilButterworth This is a simplified example. The const auto& fits both references and values and is "almost" a one size fits all, as long as it is not a reference into an object returned by value. And this is where I'm looking for help by the compiler. – Viktor Sehr Mar 21 '18 at 01:34
  • Maybe you could adopt a naming convention for functions that return by reference – M.M Mar 21 '18 at 01:36
  • @Eljay In our code base, I think I also have some UB that cause our program to works incorrectly after upgrading the compiler. In our case, the problem was related to modifying the same object more than once in an expression. – Phil1970 Mar 21 '18 at 01:45
  • @ViktorSehr As explained in my answer, you can mostly avoid the problem by splitting the expression in multiple statements (or having a function that returns the final value/reference as appropriate). – Phil1970 Mar 21 '18 at 01:52
  • @M.M I would avoid such naming convention. In fact, I would not encourage the usage of `const auto &` when in reality an object is returned by value as it make it less clear in the code that we got a copy. – Phil1970 Mar 21 '18 at 01:59
  • @Phil1970 I am suggesting using `const auto&` only when the function returns by reference, and to help out with this it would be useful if you could tell from the function name that it returns by reference – M.M Mar 21 '18 at 02:02
  • @Victor I still don't see why you have to have a " default way of initializing variables" - your posts have not made this clear, to me at least. It smacks of architecture astronaut syndrome. –  Mar 21 '18 at 02:03
  • @NeilButterworth As const auto& binds to both references and temporaries (extending the lifetime), it is a way of indicating "give me the result, and only copy if necessary", and this should be default (see RustLang for example). However, as the question indicates, there are rare cases where an object returned by value in itself returns a reference. – Viktor Sehr Mar 21 '18 at 03:52
  • @Phil1970 • ooops, I should have been more clear. I was using the OP's code with MSVC but using an `Int` class which wrapped an int instead of `int`, and since `back()` returns a reference, the const& binds to something that is owned by the temporary and is about to go away. UB or not regardless, it is just plain bad code and bad coding practice -- and the compilers do not emit a warning. I wonder if Coverity would detect the problem? – Eljay Mar 21 '18 at 11:39

4 Answers4

4

Almost certainly there is no way of warning on this. The compiler has no idea whether the referenced object returned by back() will outlive the line or not, and if it does, there's no problem (though I'd be hard pressed to think of a realistic situation where a non-static member function called on a temporary object returns a reference to an object which outlives the temporary object).

It sounds like whoever wrote that code read about the most important const, and took away entirely the wrong lesson from it.

Sneftel
  • 40,271
  • 12
  • 71
  • 104
  • Just to be honest, I'm the responsible one :) – Viktor Sehr Mar 21 '18 at 00:50
  • Could you please elaborate? Most of all why does the `const &` does not work as the OP expected, i.e. that the lifetime of the returned temporary is extended to the lifetime of the const reference to it? – Daniel K. Jan 04 '22 at 15:33
  • @DanielK. `back()` returns a reference to an element, not a temporary value. For this to be safe, the vector's lifetime would need to be extended; but there's no `vector const&` in the code to do that. – Sneftel Jan 04 '22 at 15:36
  • Thanks for your prompt reply. Indeed, [clang V10 issues a warning.](https://godbolt.org/z/c3nT3Mvc5): "object backing the pointer will be destroyed ...". gcc doesn't. I understand that `back()` returns a reference to an element, however, isn't this element a member of a temporary (vector) and as such the element could be considered temporary itself? – Daniel K. Jan 04 '22 at 16:07
  • 1
    @DanielK. "Temporary" is a [specific concept in the C++ standard](https://en.cppreference.com/w/cpp/language/lifetime#Temporary_object_lifetime), one that does not arise when binding a reference to a returned reference. The compiler doesn't have the ability to round up all the objects which need to stay alive to prop up a particular reference, and extending the lifetime of every temporary mentioned in a line of code where a `const&` is initialized would be massive overkill (yet still be potentially unsafe). (1/) – Sneftel Jan 04 '22 at 16:18
  • @DanielK. Consider: suppose `get_vec` returned a `vector`, and you as the programmer knew that all the elements were pointers to global objects. Would you want `const auto& i = *get_vec().back()` to automatically keep the vector alive, even though it's no longer needed? And how is the compiler to know the difference? – Sneftel Jan 04 '22 at 16:19
  • Understood. C++ is a complex and elusive language. Just when you thought that there's a C++ feature that provides some ease of mind (e.g., lifetime extension of a temporary) it turns into a trap when used improperly. Thanks. – Daniel K. Jan 04 '22 at 17:02
4

Only thing I can come up with right now is to use CLANG with -fsanitize=address. But of course this will only help at runtime, but then you get something nice like this:

==102554==ERROR: AddressSanitizer: heap-use-after-free on address 0x603000000020 at pc 0x00000050db71 bp 0x7ffdd3a5b770 sp 0x7ffdd3a5b768
READ of size 4 at 0x603000000020 thread T0
    #0 0x50db70 in main (/home/user/testDang+0x50db70)
    #1 0x1470fb404889 in __libc_start_main (/lib64/libc.so.6+0x20889)
    #2 0x41a019 in _start (/home/user/testDang+0x41a019)

0x603000000020 is located 16 bytes inside of 20-byte region [0x603000000010,0x603000000024)
freed by thread T0 here:
    #0 0x50a290 in operator delete(void*) (/home/user/testDang+0x50a290)
    #1 0x50eccf in __gnu_cxx::new_allocator<int>::deallocate(int*, unsigned long) (/home/user/testDang+0x50eccf)
    #2 0x50ec9f in std::allocator_traits<std::allocator<int> >::deallocate(std::allocator<int>&, int*, unsigned long) (/home/user/testDang+0x50ec9f)
    #3 0x50ec2a in std::_Vector_base<int, std::allocator<int> >::_M_deallocate(int*, unsigned long) (/home/user/testDang+0x50ec2a)
    #4 0x50e577 in std::_Vector_base<int, std::allocator<int> >::~_Vector_base() (/home/user/testDang+0x50e577)
    #5 0x50e210 in std::vector<int, std::allocator<int> >::~vector() (/home/user/testDang+0x50e210)
    #6 0x50db16 in main (/home/user/testDang+0x50db16)
    #7 0x1470fb404889 in __libc_start_main (/lib64/libc.so.6+0x20889)

previously allocated by thread T0 here:
    #0 0x509590 in operator new(unsigned long) (/home/user/testDang+0x509590)
    #1 0x50e9ab in __gnu_cxx::new_allocator<int>::allocate(unsigned long, void const*) (/home/user/testDang+0x50e9ab)
    #2 0x50e94b in std::allocator_traits<std::allocator<int> >::allocate(std::allocator<int>&, unsigned long) (/home/user/testDang+0x50e94b)
    #3 0x50e872 in std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long) (/home/user/testDang+0x50e872)
    #4 0x50e2ff in void std::vector<int, std::allocator<int> >::_M_range_initialize<int const*>(int const*, int const*, std::forward_iterator_tag) (/home/user/testDang+0x50e2ff)
    #5 0x50deb7 in std::vector<int, std::allocator<int> >::vector(std::initializer_list<int>, std::allocator<int> const&) (/home/user/testDang+0x50deb7)
    #6 0x50dafb in main (/home/user/testDang+0x50dafb)
    #7 0x1470fb404889 in __libc_start_main (/lib64/libc.so.6+0x20889)

SUMMARY: AddressSanitizer: heap-use-after-free (/home/user/testDang+0x50db70) in main
Shadow bytes around the buggy address:
  0x0c067fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c067fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c067fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c067fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c067fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c067fff8000: fa fa fd fd[fd]fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb

Maybe you have automated unit tests you can easily run as "sanizizer" builds.

SaschaZorn
  • 142
  • 1
  • 2
2

I have a code base where const auto& is used as the default way of initializing variables

Ouch. :(

for some odd reason these cases executes correctly using MSVC, but when compiled with Clang for android, every occurance results in a wrongly assigned value

UB is UB innit.

For now the solution seems to investigate every const auto& in the whole code base

Yes.

Just as you cannot tell at a glance whether a particular case is "safe"/correct, the compiler cannot tell simply from a function signature.

If it always had access to the full definition of every function, it would be able to warn you in some cases (and analysis tools like -fsanitize=address will do their best with this), but there is no general-case solution for the compiler to detect dangling references at runtime.

Also congratulations on the payrise you can receive now that the guilty employees (the author and the reviewer) have been fired, right? :)

Lightness Races in Orbit
  • 378,754
  • 76
  • 643
  • 1,055
0

Obviously, for the above example, one would write something like:

std::vector<int> xx{1,2,3,4,5};
const auto& x = xx.back();

It does not make much sense to create a whole vector to keep only its last element. And if you have an expression like the above one and want to use a single expression, then you should almost never uses auto & to start with.

It the object is large, then you should either use move semantic or reference counting. So maybe you would have a function like GetLastValue that would returns by value a copy of the last vector value and then move that into the target destination.

You really need to understand what you are doing. Otherwise, you should use a language like C# where you need less knowledge about the internal working of the compiler or the exact language specifications.

As a general rule, I would say that you should not use auto & unless you are sure that you want a reference to the returned item. The most common case when I would use auto & or const auto & would be for a range based loop. For example, with the above vector named xx, I would generally write:

for (auto & item : xx) …

except if I know that it returns trivial types.

Phil1970
  • 2,605
  • 2
  • 14
  • 15