47

When the function involves reallocation, I found some compilers may save the address before the function call. It leads the return value stored in the invalid address.

There is an example to explain behavior in above description.

#include <stdio.h>
#include <vector> 
using namespace std;

vector<int> A; 
int func() { 
    A.push_back(3);
    A.push_back(4);
    return 5; 
} 
int main() { 
    A.reserve(2);
    A.push_back(0);
    A.push_back(1);
    A[1] = func();
    printf("%d\n", A[1]);
    return 0;
}

There are some common C++ compiler, and the test result as follows.

  • GCC(GNU Compiler Collection): Runtime Error or output 1
  • Clang: output 5
  • VC++: output 5

Is it undefined behavior?

Alan Birtles
  • 32,622
  • 4
  • 31
  • 60
Morris Yang
  • 482
  • 3
  • 10
  • See the description of the assignment operator at least in the C++ 2014 Standard. – Vlad from Moscow Mar 02 '18 at 15:27
  • 4
    _"When the function involves relocation, I found some compilers may save the address before the function call"_ - none of those things mean what you seem to think they mean. This isn't anything to do with dynamic linking. "When the function involves _reallocation_" would be accurate. – Useless Mar 02 '18 at 15:31
  • 3
    Good point. Reallocation somehow implies relocation (create new space, move stuff over), but using reallocation would be clearer here, if that is what the original author really meant. – Ulrich Eckhardt Mar 02 '18 at 15:34
  • 1
    @Useless what he meant was obvious enough, and nothing to do with linking. He means if the vector contents is relocated (reallocation could in theory leave the data unmoved, were a library to use `realloc` behind the scenes), and that the compiler may have kept the address of the data in a register around the calls to `push_back`. I've encountered the bug^Wissue he's talking about before now, so you could claim I'm biased, but ... – Will Crawford Mar 03 '18 at 03:17
  • I know what OP meant after re-reading, but there's no harm telling people the correct names for things (or what the words they used actually mean). It's all useful information. – Useless Mar 04 '18 at 12:19

2 Answers2

54

The behaviour is undefined in all C++ versions before C++17. The simple reason is that the two sides of the assignment operator can be evaluated in any order:

  • Assuming A[1] is evaluated first, you get an int& referring to the second element of A at that point.
  • Then, the func() is evaluated, which can reallocate the storage for the vector, leaving the previously retrieved int& a dangling reference.
  • Finally, the assignment is performed, writing to unallocated storage. Since the standard allocators cache memory, the OS often won't catch this error.

Only in C++17, the special rule 20 for the assignment was made:

In every simple assignment expression E1=E2 and every compound assignment expression E1@=E2, every value computation and side-effect of E2 is sequenced before every value computation and side effect of E1

With C++17, A[1] must be evaluated after the call to func(), which then provides defined, reliable behaviour.

Ulrich Eckhardt
  • 16,572
  • 3
  • 28
  • 55
  • 5
    Binaries made with GCC do not crash if `-std=c++17` is used. Maybe the wording in C++17 has changed? – HolyBlackCat Mar 02 '18 at 15:30
  • 6
    @HolyBlackCat It did. operators have to respect associativity – NathanOliver Mar 02 '18 at 15:32
  • 1
    I think it should say "which can reallocate the storage for the vector" instead – Slava Mar 02 '18 at 15:34
  • @Slava, I wrote the above under the assumption that `reserve()` guarantees the exact size, but it seems to guarantee a minimal size instead. I remembered that part wrongly. – Ulrich Eckhardt Mar 02 '18 at 15:38
  • 4
    Maybe C++17 changed some rules about evaluation order, which can be found in http://en.cppreference.com/w/cpp/language/eval_order – Morris Yang Mar 02 '18 at 15:46
  • 4
    As per item 20 on the cppreference page @MorrisYang linked, the code's behavior is not undefined in C++17 so if you could edit your answer to show that it would be complete. – patatahooligan Mar 02 '18 at 16:28
-5

If you check the documentation, under "Iterator Invalidation", you'll see that push_back() may invalidate every iterator if it changes capacity, since it would have to reallocate memory. Remember that, for an std::vector, a pointer is a valid iterator as well. Because push_back() may or may not reallocate, and you have no way of knowing if it will, the behavior is undefined.

  • 4
    Pointers (or references) and iterators are separate, with separate rules. If you look at other containers you'll see cases where one will invalidate but not the other. – Mark Ransom Mar 02 '18 at 16:05
  • Sure, but this is not one of those cases. A pointer to T satisfies all the requirements of vector::iterator. – Mikaela Szekely Mar 02 '18 at 16:13
  • Just because a pointer *can be* an iterator, doesn't make it an iterator. In the case of `std::vector` iterators and references are invalidated at the same time, but that doesn't mean they're the same. This is the sort of answer that can confuse people. – Mark Ransom Mar 02 '18 at 16:35
  • If I edited my answer to say specifically that "for an `std::vector` a pointer is a valid iterator", would that be satisfactory? – Mikaela Szekely Mar 02 '18 at 17:24
  • http://www.cplusplus.com/reference/vector/vector/push_back/ is more explicit: *If a reallocation happens, all iterators, pointers and references related to the container are invalidated.* – Tristan Mar 02 '18 at 17:39
  • No, because *it's not an iterator*. End of discussion. – Mark Ransom Mar 02 '18 at 18:23
  • Practical workaround: `reserve()` space in your `vector`. – Davislor Mar 02 '18 at 19:48
  • @MarkRansom from n4284, "Note: For example, the type "pointer to int" is a contiguous iterator" ... – Will Crawford Mar 03 '18 at 04:32
  • 1
    @WillCrawford I'm not denying that a pointer is a perfectly valid iterator implementation, and some early versions of `vector` worked that way. But try comparing a pointer to `v.end()` and tell me how that works out. – Mark Ransom Mar 03 '18 at 04:40
  • Ha! No, fair enough … I’d still argue that some parts of the standard give the impression that T* "ought" to work as an iterator (e.g. `template constexpr T* begin(T (&array)[N]) noexcept;` … Returns: `array`) – Will Crawford Mar 03 '18 at 04:49