Why is it unsafe to call push_back in a for loop even when there is no memory reallocation?

Question

When I read this post: https://stackoverflow.com/a/42448319/3336423

I understand, that calling push_back in a for loop is unsafe because:

If the new size() is greater than capacity() then all iterators and references (including the past-the-end iterator) are invalidated. Otherwise only the past-the-end iterator is invalidated.

Then, I assume that if I guarantee the capacity won't be exceeded, it would be safe....

But apparently not, all implementations below will crash after first push_back is called (using Visual Studio), even if I can see that capacity remains unchanged within the loop (so I assume the vector does not reallocate its memory):

Version1:

std::vector<int> v1{ 3, 4, 5 };
v1.reserve( v1.size()*2 );
size_t c1 = v1.capacity();
for ( auto val : v1 )
{
    v1.push_back( val );
    c1 = v1.capacity();
}

Version2:

std::vector<int> v2{ 3, 4, 5 };
v2.reserve( v2.size()*2 );
size_t c2 = v2.capacity();
auto curEnd = v2.end();
for ( auto iter = v2.begin(); iter != curEnd; ++iter )
{
    v2.push_back( *iter );
    c2 = v2.capacity();
}

Version3:

std::vector<int> v3{ 3, 4, 5 };
v3.reserve( v3.size()*2 );
size_t c3 = v3.capacity();
for ( auto iter = v3.begin(); iter != v3.end(); ++iter )
{
    v3.push_back( *iter );
    c3 = v3.capacity();
}

What makes those code crash?

"Otherwise only the past-the-end iterator is invalidated" - your quote. — Rakete1111, Jan 30 '18 at 10:21
You answered yourself. Past-the-end iterator is invalidated in all cases and you use it when checking loop-end condition. — Daniel Langr, Jan 30 '18 at 10:22
@Rakete1111: Agree for "Version1", but in "Vesrion2", I saved it outside the loop, so `curEnd` should still be comparible to `iter`...no? — jpo38, Jan 30 '18 at 10:27
@jpo38 - When they say all iterators *to* something, they mean all copies. `curEnd` is no longer valid as well. — StoryTeller - Unslander Monica, Jan 30 '18 at 10:29
@jpo38 Your two examples are conceptually equivalent. No, as the quote says, `curEnd` is invalidated. — Rakete1111, Jan 30 '18 at 10:32
If however you were calling end() every time, then invalidating the end iterator would be fine. — UKMonkey, Jan 30 '18 at 10:44
v2.end() represents some special iterator marking an element out of valid range. Once that range is extended the old end iterator will be seen as garbage. — jszpilewski, Jan 30 '18 at 10:44
@UKMonkey: calling v2.end() withing the for statement crashs too, that's why I made a copy hoping it would work better, but it does not. — jpo38, Jan 30 '18 at 10:46
@UKMonkey - Except then it will go over the freshly inserted elements too, and will reallocate, and you'll get UB as well. — StoryTeller - Unslander Monica, Jan 30 '18 at 10:47
@StoryTeller "Except then it will go over the freshly inserted elements too" that's fine, and it's defined behavior. "and will reallocate" only if the reserved size wasn't sufficient. It's not something I'd ever suggest doing, to anyone (because it is a little risky with the test on the reserved size); but that doesn't mean it's not legal and well defined. — UKMonkey, Jan 30 '18 at 10:56
_ Then, I assume that if I guarantee the capacity won't be exceeded, it would be safe...._ You'll merely need to allocate infinite capacity, then it will be safe. — Eljay, Jan 30 '18 at 12:54

StoryTeller - Unslander Monica · Accepted Answer · 2018-01-30T11:08:25.717

6

Both your first two versions have the same problem. The cached past-the-end iterator is invalidated after the first insertion, making any subsequent use of it UB. Yes, even just comparing against it.

Your third sample crashes because it ends up trying to re-insert the freshly inserted elements. So it ends up needing a reallocation, which ends up causing more UB.

The correct iterator-based approach is to not go until one past the end. But until the end, assuming the vector is not empty.

std::vector<int> v{ 3, 4, 5 };
v.reserve( v.size()*2 );

auto it = v.begin(), end = v.end();
--end;
do {
  v.push_back(*it);
} while (it++ != end); // Check against current position, but increment still

edited Jan 30 '18 at 11:08

answered Jan 30 '18 at 10:51

StoryTeller - Unslander Monica

165,132
21
377
458

Would be nice to tell what's the good approach (use an index based loop!!?) – jpo38 Jan 30 '18 at 10:55
@jpo38 - I was just writing it. Sorry for the delay. – StoryTeller - Unslander Monica Jan 30 '18 at 10:57
Accepted because I liked the "until the end" suggestion. But other answers are good too! – jpo38 Jan 30 '18 at 11:03
@jpo38 - I was trying to demonstrate how an approach that still uses iterators will look like. Personally I'd go with one of the other answers. – StoryTeller - Unslander Monica Jan 30 '18 at 11:05
This is a nice alternative to index based loop, that was the easiest alternative I considered whet I hit this problem. – jpo38 Jan 30 '18 at 11:07

eerorika · Answer 2 · 2018-01-30T11:08:39.230

Same quote with my emphasis:

If the new size() is greater than capacity() then all iterators and references (including the past-the-end iterator) are invalidated. Otherwise only the past-the-end iterator is invalidated.

curEnd of Version2 is past-the-end iterator and is therefore invalidated. Version1 does something similar under the syntactic sugar.

Version3 will continue iterating the newly inserted elements until reallocation happens. Even if there was no reallocation, it would loop indefinitely until all memory is exhausted.

A simple solution to modify Version3 by storing the old size (count), and loop until begin + count

std::vector<int> v3{ 3, 4, 5 };
auto count = v3.size();
v3.reserve( v3.size()*2 );
size_t c3 = v3.capacity();
for ( auto iter = v3.begin(); iter != v3.begin() + count; ++iter )
{
    v3.push_back( *iter );
    c3 = v3.capacity();
}

Another is to use the old fashioned indexed loop.

Would be nice to tell what's the good approach (use an index based loop!!?) — jpo38, Jan 30 '18 at 10:55

Rakete1111 · Answer 3 · 2018-01-30T10:58:32.543

If the new size() is greater than capacity() then all iterators and references (including the past-the-end iterator) are invalidated. Otherwise only the past-the-end iterator is invalidated.

In both cases, the end iterator is stored in a variable only once, and so as soon as you add an element, that iterator is invalid and using it is undefined behavior.

You need to get the new end iterator in each loop iteration, as you did in Version 3. But that one crashes too!

For the exact same reason. Your loop is actually infinite and will never stop allocating memory for objects. Because the capacity is not infinite, you will get a reallocation and then you have undefined behavior.

You are always adding one element to the vector, and since the loop stops when it went through each element, it never stops. It's a bit like

int end = 1;
for (int i = 0; i < end; ++i) // infinite!
  ++end;

You can always use an index based loop instead if you want to add the three elements again. Or more even, given that you reserve enough storage. :)

Would be nice to tell what's the good approach (use an index based loop!!?) — jpo38, Jan 30 '18 at 10:55

Why is it unsafe to call push_back in a for loop even when there is no memory reallocation?

3 Answers3

Linked

Related