faster erase-remove idiom when I don't care about order and don't have duplicates?

Question

I have a vector of objects and want to delete by value. However the value only occurs once if at all, and I don't care about sorting.

Obviously, if such delete-by-values were extremely common, and/or the data set quite big, a vector wouldn't be the best data structure. But let's say I've determined that not to be the case.

To be clear, if my code were C, I'd be happy with the following:

void delete_by_value( int* const piArray, int& n, int iValue ) {
    for ( int i = 0; i < n; i++ ) {
        if ( piArray[ i ] == iValue ) {
            piArray[ i ] = piArray[ --n ];
            return;
        }
    }
}

It seems that the "modern idiom" approach using std::algos and container methods would be:

v.erase(std::remove(v.begin(), v.end(), iValue), v.end());

But that should be far slower since for a random existent element, it's n/2 moves and n compares. My version is 1 move and n/2 compares.

Surely there's a better way to do this in "the modern idiom" than erase-remove-idiom? And if not why not?

`std::find` finds that value for you, if it exists, then it's pretty much as it would happen in C. And it all comes down to, pretty much, identical to C. — Sam Varshavchik, Dec 30 '19 at 03:37
The two code samples are not equivalent. One uses an array, which can't be resized, while the `vector` version resizes. — PaulMcKenzie, Dec 30 '19 at 03:54
Many thanks; I deranted a bit; pass n by reference so the caller's view of the array IS in effect resized, and found better wording I think than "enlightened" — Swiss Frank, Dec 30 '19 at 03:54
Have you considered using a `std::list` instead of `std::vector`? Because `std::list` is a linked list you can remove any item within the list without having to do any copying. The disadvantage is access to elements is linear time. — jignatius, Dec 30 '19 at 04:16
@SwissFrank Where both answers use `erase()` or `resize()` to remove the last element it can be done more simply with a `pop_back()` — Blastfurnace, Dec 30 '19 at 04:37
`std::list` would make sense for big lists, but many lists maintained by software are small enough that `std::vector` is far more efficient. Let's assume my case is one of those. — Swiss Frank, Dec 30 '19 at 04:40
I am shocked still not seeing a good implementation for this here, after 2 years :/. — Carlo Wood, Mar 25 '22 at 11:51

score 2 · Answer 1 · answered Dec 30 '19 at 03:51

2

Use std::find to replace the loop. Take the replacement value from the predecessor of the end iterator, and also use that iterator to erase that element. As this iterator is to the last element, erase is cheap. Bonus: bool return for success checking and templateing over int.

template<typename T>
bool delete_by_value(std::vector<T> &v, T const &del) {
    auto final = v.end();
    auto found = std::find(v.begin(), final, del);
    if(found == final) return false;
    *found = *--final;
    v.erase(final);
    return true;
}

answered Dec 30 '19 at 03:51

HTNW

27,182
1
32
60

Thanks, looks like I could drop that into production code. Even caching final! :-D Still though, for such a common need, is there no more modern idiom than (basically) spelling out the C version (except your version's actually longer, albeit type-generic)? – Swiss Frank Dec 30 '19 at 04:05
@SwissFrank, the standard library gives you building blocks, and you use them to build what you need. If it were to provide all common algorithms, it would become an inconceivable monster. – Evg Dec 30 '19 at 07:00
1

You can use move assignment as you use `T`. – Jarod42 Dec 30 '19 at 07:27

score 2 · Answer 2 · answered Dec 30 '19 at 06:59

Surely there's a better way to do this in "the modern idiom" than erase-remove-idiom?

There aren't a ready-made function for every niche use case in the standard library. Unstable remove is one of the functions that is not provided. It has been proposed (p0041r0) a while back though. Likewise, there are also no special versions of algorithms for the special case of vectors that do not contain duplicates.

So, you'll need to implement the algorithm yourself if you wish to use an optimal algorithm. There is std::find for linear search. After that, you only need to assign from last element and finally pop it off.

parktomatomi · Answer 3 · 2019-12-30T03:56:50.227

0

Most implementations of std::vector::resize will not reallocate if you make the size of the vector smaller. So, the following will probably have similar performance to the C example.

void find_and_delete(std::vector<int>& v, int value) {
    auto it = std::find(v.begin(), v.end(), value);
    if (it != v.end()) {
        *it = v.back();
        v.resize(v.size() - 1);
    }
}

edited Dec 30 '19 at 03:56

answered Dec 30 '19 at 03:36

parktomatomi

3,851
1
14
18

Thanks but still, is there no modern idiom to do this extremely common operation? – Swiss Frank Dec 30 '19 at 03:59
normally, if you had a collection of unique values and didn't care about order, you'd use an `std::unordered_set` and not really care about the implementation. – parktomatomi Dec 30 '19 at 04:02
Sure, but I write software where performance is of interest, so I DO care about the implementation. There are a LOT of sets of data being tracked by software small enough that linear searches in a vector outperform set, unordered set, etc. – Swiss Frank Dec 30 '19 at 04:14
`` rarely provides super-performant solutions. It's there provide very generic, abstract solutions that work on multiple containers and are good enough for most situations. It's pretty normal to need to break out and write your own loops in bottleneck situations. – parktomatomi Dec 30 '19 at 04:43
Many, many many many MANY articles and books say DOES provide super-performant solutions. I'm starting to agree with you (as an old C++ programmer being dragged into the 21st century) that it is very case-by-case... – Swiss Frank Dec 30 '19 at 04:45

score 0 · Answer 4 · answered Dec 30 '19 at 07:30

C++ way would be mostly identical with std::vector:

template <typename T>
void delete_by_value(std::vector<T>& v, const T& value) {
    auto it = std::find(v.begin(), v.end(), value);

    if (it != v.end()) {
        *it = std::move(v.back());
        v.pop_back();
    }
}

faster erase-remove idiom when I don't care about order and don't have duplicates?

4 Answers4