8

Although there are dozens of questions about remove_if + erase for vector, I couldn't find what the performance of this operation is. When I write:

    myVector.erase(remove_if(myVector.begin(),
                         myVector.end(),
                         some_predicate), myVector.end());

remove_if will return an iterator one past the last relevant item (let's call it X). I believe this happens in O(n).

But how will the erase work?

  • If erase deletes from X to myVector.end(), it will be O(n^2), because it will copy the vector to a new location and there will be O(n) new allocations from the heap.
  • But if it deletes from myVector.end() back to X, it can do it in O(n) and, more importantly, will not allocate new memory (something I'm trying to avoid).

Thanks.

OopsUser
  • 3
    The complexity of `std::remove_if()` is *Exactly std::distance(first, last) applications of the predicate* as described [in the documentation](http://en.cppreference.com/w/cpp/algorithm/remove). –  Feb 08 '17 at 13:58
  • 1
    `vector::erase` doesn't cause any reallocation in this case. – PaulMcKenzie Feb 08 '17 at 14:00
  • 3
    *"If the erase will try to delete from the X to myVector.end() it will be O(n^2)"*.... How exactly? – Nawaz Feb 08 '17 at 14:01
  • @Nawaz Because when it deletes X, it deletes an item in the middle of the vector, causing reallocation. Or it moves all the items after X backwards. Both options take O(n) time, and the deletion will happen O(n) times, so O(n)*O(n) = O(n^2) – OopsUser Feb 08 '17 at 14:42
  • 3
    @OopsUser removing from the middle of a vector does not cause reallocation! Making a vector smaller does not require more memory! You don't move the elements being deleted backwards, because you're going to delete them. You move the ones that come _after_ those elements backwards, but that still doesn't reallocate, and when you do `v.erase(X, v.end())` there _aren't any_ elements afterwards, so you just destroy them. – Jonathan Wakely Feb 08 '17 at 14:56
  • 2
    @OopsUser: That is *not* how `vector.erase(x, v.end())` works. @Jonathan's answer explains beautifully how it works! – Nawaz Feb 08 '17 at 16:10

3 Answers

26

Consider this vector:

|0|1|2|3|4|5|6|7|8|9|

We use remove_if to remove all elements that are multiples of 4:

std::remove_if(v.begin(), v.end(), [](auto i){ return i != 0 && !(i%4); });

This starts iterating through the vector with an iterator X until it finds an element where the predicate returns true:

|0|1|2|3|4|5|6|7|8|9|
         X

This is the first element we want to remove.

Next it creates another iterator pointing to the next element, Y = X+1, and checks the predicate for *Y:

|0|1|2|3|4|5|6|7|8|9|
         X Y

The predicate is false, so we want to keep that element, so it assigns the next element to the element we want to remove, by doing *X = std::move(*Y):

|0|1|2|3|5|5|6|7|8|9|
         X Y            *X = std::move(*Y)

So we have two iterators, X and Y, where X points to the next element in the "output" (i.e. the elements we're not removing) and Y is the next element to consider removing.

We move both iterators to the next position, check the predicate for Y (which is false again), and do another assignment:

|0|1|2|3|5|6|6|7|8|9|
           X Y          *X = std::move(*Y)

Then it does the same again at the next position:

|0|1|2|3|5|6|7|7|8|9|
             X Y       *X = std::move(*Y)

And then it moves on, but finds that the predicate is true for Y

|0|1|2|3|5|6|7|7|8|9|
               X Y

So it just increments Y, which skips that element and so doesn't copy it into the "output" position at X:

|0|1|2|3|5|6|7|7|8|9|
               X   Y 

The predicate is not true for Y, so it assigns it to X:

|0|1|2|3|5|6|7|9|8|9|
               X   Y     *X = std::move(*Y)

Then it increments X and Y again

|0|1|2|3|5|6|7|9|8|9|
                 X   Y

Now Y is past-the-end so we return X (which points past-the-end of the output sequence, i.e. the elements we want to keep).
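The two-iterator walk above can be sketched roughly as follows. This is only an illustration: `remove_if_sketch` and `walkthrough_result` are hypothetical names, and real standard library implementations may differ in detail.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper for illustration only; not the standard library's code.
template <typename It, typename Pred>
It remove_if_sketch(It first, It last, Pred pred)
{
    // X: advance to the first element for which the predicate is true.
    It x = first;
    while (x != last && !pred(*x))
        ++x;
    if (x == last)
        return x;                 // nothing to remove

    // Y: scan the remaining elements; each kept one is move-assigned to *X.
    It y = x;
    for (++y; y != last; ++y) {
        if (!pred(*y)) {
            *x = std::move(*y);
            ++x;
        }
    }
    return x;                     // one past the last kept element
}

// Runs the sketch on the vector from the walkthrough and returns the result.
inline std::vector<int> walkthrough_result()
{
    std::vector<int> v{0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    auto x = remove_if_sketch(v.begin(), v.end(),
                              [](int i) { return i != 0 && i % 4 == 0; });
    assert(x - v.begin() == 8);   // eight elements are kept
    assert(v.size() == 10);       // the size is untouched at this point
    return v;
}
```

Each element is visited once and assigned at most once, so the pass is linear, and nothing is allocated or destroyed along the way.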

After the remove_if returns X we call v.erase(X, v.end()), so the vector invokes the destructors for each element from X to the end:

|0|1|2|3|5|6|7|9|~|~|
                 X   end

And then sets the size so the vector ends at X:

|0|1|2|3|5|6|7|9|
                 end

After this v.capacity() >= v.size()+2 because the memory that was used by the two final elements is still present, but is not in use.
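One way to check that no reallocation happens is to compare the buffer address and capacity before and after the call. The following is a small sketch, with `erase_remove_keeps_buffer` being a made-up name for this example; it relies on the fact that `erase` only invalidates iterators and references at or after the erased position.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns true if erase-remove left the buffer address and capacity untouched.
inline bool erase_remove_keeps_buffer()
{
    std::vector<int> v{0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    const int* buffer_before = v.data();
    const auto capacity_before = v.capacity();

    // Same predicate as in the walkthrough: drop non-zero multiples of 4.
    v.erase(std::remove_if(v.begin(), v.end(),
                           [](int i) { return i != 0 && i % 4 == 0; }),
            v.end());

    assert(v.size() == 8);        // two elements were removed
    return v.data() == buffer_before && v.capacity() == capacity_before;
}
```

If the spare capacity is unwanted, `v.shrink_to_fit()` is a non-binding request to release it, and that call may reallocate.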

Jonathan Wakely
  • 3
    @RustyX No! You don't destroy old elements and construct new ones, you assign new values to existing elements. – Jonathan Wakely Feb 08 '17 at 15:00
  • 3
    @Caleth, no it won't! It's an assignment, it doesn't destroy anything. – Jonathan Wakely Feb 08 '17 at 15:01
  • Maybe you also want to show that the moved values become invalid, e.g. putting ~ in place of the moved value. I mean, running `*X = std::move(*Y)` on `|4|5|` would result in `|5|~|`, not `|5|5|`, right? – Agostino Feb 08 '17 at 16:28
  • How will `erase` behave if I do `v.erase(X, X+2)`? It can't just change the size. `erase` can't assume it needs to delete up to the end. – OopsUser Feb 08 '17 at 16:36
  • 2
    @OopsUser erase will move the elements from `X+2` to `end()` forward 2 places and then decrease the size. Essentially `for(;it2 != end(); ++it1, ++it2) {*it1 = std::move(*it2); } /*destruct elements from it1 to end */ size-=2;` – ratchet freak Feb 08 '17 at 17:08
  • 1
    @OopsUser see the output of this program: http://coliru.stacked-crooked.com/a/552e181a61d48033 (you move the elements you want to keep forward, then destroy the ones at the end). You never need to reallocate unless you are growing the vector, because you always have enough space to work with already. – Jonathan Wakely Feb 08 '17 at 17:27
  • @Agostino, no. Moving a value doesn't destroy it. Moving from an integer doesn't alter it at all. – Jonathan Wakely Feb 08 '17 at 17:31
  • Hot damn, illustrations ftw. – Barry Feb 08 '17 at 17:32
  • 1
    @Barry, check out the coliru link in the comment ^^^ then ;) I wrote that "visualgo" stuff for visualising sort algos. – Jonathan Wakely Feb 08 '17 at 17:34
  • "_Moving a value doesn't destroy it._" Sure (unless the class author deliberately programmed that, but it seems very wrong to do so). But @Agostino didn't mention destruction. And they were basically right in the statement that "_the moved values become invalid_". The removed elements left past the new `end` iterator have unspecified values. `cppreference` in fact unites the two by saying those objects "_have unspecified values (as per MoveAssignable post-condition)._" So, there is formally nothing that can be done with those elements, which effectively means they're invalid (if not *quite* UB) – underscore_d Oct 25 '18 at 18:14
8

The complexity of vector::erase is well-defined:

Linear: the number of calls to the destructor of T is the same as the number of elements erased, the assignment operator of T is called the number of times equal to the number of elements in the vector after the erased elements

The way it is going to work internally (i.e. how exactly is it going to remove your elements) is kind of irrelevant.

The complexity of remove_if is also defined and is

Exactly std::distance(first, last) applications of the predicate.

So your code has linear complexity.
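As a rough check of the predicate count, the predicate itself can count how often it is called. This is a sketch under the assumption that counting through a reference capture is acceptable for a demo; `predicate_calls` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how often the predicate runs when erase-remove is applied to n ints.
inline std::size_t predicate_calls(std::size_t n)
{
    std::vector<int> v(n);
    for (std::size_t i = 0; i < n; ++i)
        v[i] = static_cast<int>(i);

    std::size_t calls = 0;
    // Remove the even values; the lambda bumps the shared counter on each call.
    v.erase(std::remove_if(v.begin(), v.end(),
                           [&calls](int i) { ++calls; return i % 2 == 0; }),
            v.end());

    assert(v.size() == n / 2);    // only the odd values remain
    return calls;
}
```

Calling `predicate_calls(10)` and `predicate_calls(1000)` should return 10 and 1000 respectively, matching the "exactly std::distance(first, last)" guarantee.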

SingerOfTheFall
  • Does it allocate memory ? – OopsUser Feb 08 '17 at 14:11
  • 2
    @OopsUser, no, neither of them allocates memory, why should they? – SingerOfTheFall Feb 08 '17 at 14:14
  • @RustyX, `remove_if` will swap every element for which the predicate returns `true` with some other element in the tail of the vector, so the number of moves should equal the number of objects that will be removed. As for `erase`, the quote in my answer says that. – SingerOfTheFall Feb 08 '17 at 14:18
  • @SingerOfTheFall how can he delete without allocating memory ? Let's say I have 10 items in the list and he needs to delete item number 5. How can he do it ? By moving all items backwards ? – OopsUser Feb 08 '17 at 14:29
  • 1
    @OopsUser, yes, remove_if will swap the elements in such a way that all the elements that need to be deleted end up in the tail of the vector. Then you just need chop-chop that tail off. This is why it returns an iterator: this iterator points to the last element that does _not_ need to be deleted. Everything after that iterator are the elements that need to be deleted. – SingerOfTheFall Feb 08 '17 at 14:30
  • @SingerOfTheFall you are talking about the remove_if, but now the erase got the iterator. how can he delete from the middle of the vector without allocating new one. A vector is an array. – OopsUser Feb 08 '17 at 14:40
  • 2
    @OopsUser you only need to allocate when the vector gets bigger. To erase elements at the end of a vector you just destroy them, and change the vector's `size()`. The `capacity()` stays the same. So `remove_if` moves unwanted elements to the end, then `erase()` destroys them and reduces the `size()` – Jonathan Wakely Feb 08 '17 at 14:42
  • 1
    @OopsUser, in your case erase doesn't delete from the middle of the array. When it gets its iterators, all the elements it needs to remove are already at the end. Let's say you have an array `123456` and you need to remove all even numbers. After you run `remove_if`, your array looks like `135246`, and you have the iterator pointing to `2`. Then you pass that to `erase`, and it chops the end of the array (all elements from `2` to the end) – SingerOfTheFall Feb 08 '17 at 14:43
  • @Jonathan Wakely so you are saying that erase is very smart and runs backwards from 'end' to 'X'? – OopsUser Feb 08 '17 at 14:43
  • 1
    @OopsUser, it doesn't need to (some implementations do that, some don't). If it's removing everything from X to the end it can just destroy them and reduce the size. – Jonathan Wakely Feb 08 '17 at 14:55
0

Why not use a swap 'n pop approach? We fooled around a lot with optimizing erase in vectors and found this to be the fastest for erasing a single element, as it has O(1) complexity. The only downside is that it doesn't preserve order, which is fine in a lot of cases. Here is the template method for such an operation:

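#include <utility>
#include <vector>

// Erases *p_it in O(1) by swapping it with the last element; does not preserve order.
// Precondition: p_it is a valid, dereferenceable iterator into p_container.
template<typename T>
inline typename std::vector<T>::iterator unorderedErase(std::vector<T>& p_container,
                                                        typename std::vector<T>::iterator p_it)
{
    if (p_it != p_container.end() - 1)
    {
        // Swap the unwanted element to the back, then drop it.
        std::swap(*p_it, p_container.back());
        p_container.pop_back();
        return p_it;  // now refers to the element that used to be last
    }
    // p_it was already the last element.
    p_container.pop_back();
    return p_container.end();
}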
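If several elements have to go, the same idea can be applied in a single pass. The following is a minimal sketch, assuming order does not matter; `unordered_erase_if` is a hypothetical name and not part of the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: removes every element matching pred without preserving order.
template <typename T, typename Pred>
void unordered_erase_if(std::vector<T>& v, Pred pred)
{
    std::size_t i = 0;
    while (i < v.size()) {
        if (pred(v[i])) {
            if (i != v.size() - 1)
                v[i] = std::move(v.back());  // overwrite with the last element
            v.pop_back();                    // O(1); index i is re-checked next
        } else {
            ++i;                             // keep this element, move on
        }
    }
}
```

Like remove_if + erase, this is still linear in the size of the vector, since every element is tested once; what changes is that each removal costs one move instead of shifting the kept elements to preserve order.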
Mar Tijn
  • 1) this removes 1 element, 2) remove_if works exactly like that. – SingerOfTheFall Feb 08 '17 at 14:12
  • `remove_if` has complexity `O(n)` (it traverses the container and applies the predicate `n` times), whereas this `unorderedErase` example has complexity `O(1)`, as there is no traversing. – Mar Tijn Feb 08 '17 at 14:21
  • 1
    awesome, and how are you going to obtain the iterator to the element that needs to be deleted without traversing the container? – SingerOfTheFall Feb 08 '17 at 14:23
  • That is a different question :) But if you obtain that, probably through iteration, it **still** is quicker, since the total complexity then is `O(n)` vs worst case `O(n^2)`. We had a case with a big update for-loop iterating over 10k particles using a normal erase. It actually caused noticeable frame drops on the PS4, since in the worst case the erase caused an extra ~10k iterations for each element that needed to be erased. As soon as we replaced it with this `unorderedErase`, all was fine and dandy – Mar Tijn Feb 08 '17 at 14:31
  • 1
    The total complexity of `remove_if`+`erase` is `O(N)` (up to the constant), _not_ `O(n^2)`. Of course, if you can obtain an iterator faster than by linear search (e.g. if your vector is pre-sorted) you will get better performance. – SingerOfTheFall Feb 08 '17 at 14:36
  • I was referring to my own example in which, in some outer loop I was already iterating through the container. In that case it is better to use the `unorderedErase` as opposed to the `remove_if+erase`. So it all depends on the use-case I guess – Mar Tijn Feb 08 '17 at 14:38
  • 2
    @SingerOfTheFall `remove_if` doesn't work like this because it has to preserve order of the elements that aren't removed, so it doesn't just do one swap. – Jonathan Wakely Feb 08 '17 at 14:39
  • 1
    and moving elements to preserve order cost some serious CPU cycles in large containers. We got tons of traces ending up in memmove calls, just for the erase. So if order doesn't need to be preserved, swapping like this really is the fastest way. – Mar Tijn Feb 08 '17 at 14:45
  • 1
    @JonathanWakely, yes indeed, but that (not caring about the order) will only decrease the constant, the overall complexity will still stay the same, provided you are searching the elements by linear search, or am I wrong? Of course that alone _might_ be enough to increase performance noticeably. – SingerOfTheFall Feb 08 '17 at 14:50
  • 2
    `std::move`, or something equivalent, is called. This will move all elements after the erase, in order to preserve order. Worst case, erase the first element in a huge container and you'll see what happens :) – Mar Tijn Feb 08 '17 at 14:58
  • The question was specifically about `std::remove_if()` and `std::vector::erase()`, not any alternative. This does not answer that. – underscore_d Oct 25 '18 at 18:19