
I'm reading the book "The C++ Standard Library: A Tutorial and Reference" by Nicolai M. Josuttis, and in one of the chapters devoted to STL algorithms the author states the following:

If you call remove() for elements of a list, the algorithm doesn't know that it is operating on a list and thus does what it does for any container: reorder the elements by changing their values. If, for example, the algorithm removes the first element, all the following elements are assigned to their previous elements. This behavior contradicts the main advantage of lists: the ability to insert, move, and remove elements by modifying the links instead of the values. To avoid bad performance, lists provide special member functions for all manipulating algorithms. You should always prefer them. Furthermore, these member functions really remove "removed" elements, as this example shows:

#include <list>
#include <algorithm>
using namespace std;

int main()
{
    list<int> coll;

    // insert elements from 6 to 1 and 1 to 6
    for (int i = 1; i <= 6; ++i) {
        coll.push_front(i);
        coll.push_back(i);
    }

    // remove all elements with value 3 (poor performance)
    coll.erase(remove(coll.begin(), coll.end(), 3),
               coll.end());

    // remove all elements with value 4 (good performance)
    coll.remove(4);
}

Of course this seemed convincing enough, but I decided to see the result for myself by running similar code on my PC, specifically in the MSVC 2013 environment. Here is my improvised code:

#include <list>
#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <ctime>
#include <iostream>

using namespace std;

int main()
{
    srand(time(nullptr));
    list<int> my_list1;
    int x = 2000 * 2000;

    for (auto i = 0; i < x; ++i)
    {
        auto random = rand() % 10;
        my_list1.push_back(random);
        my_list1.push_front(random);
    }

    list<int> my_list2(my_list1);

    auto started1 = std::chrono::high_resolution_clock::now();
    my_list1.remove(5);
    auto done1 = std::chrono::high_resolution_clock::now();
    cout << "Execution time while using member function remove: " << chrono::duration_cast<chrono::milliseconds>(done1 - started1).count();

    cout << endl << endl;

    auto started2 = std::chrono::high_resolution_clock::now();
    my_list2.erase(remove(my_list2.begin(), my_list2.end(), 5), my_list2.end());
    auto done2 = std::chrono::high_resolution_clock::now();
    cout << "Execution time while using generic algorithm remove: " << chrono::duration_cast<chrono::milliseconds>(done2 - started2).count();

    cout << endl << endl;
}

I was surprised when I saw the following output:

Execution time while using member function remove: 10773

Execution time while using generic algorithm remove: 7459 

Could you please explain what the reason for such contradictory behavior could be?

    Performance is probably not the most appropriate reason in this scenario; a) because it's probably wrong and b) because the performance of `std::list` is notoriously poor anyway (depending on the use case, of course). What sounds more relevant is that one approach invalidates pointers and references to the elements you don't remove, while the other doesn't. And after all, such guarantees are usually the reason for using a linked list to begin with. – Baum mit Augen Oct 31 '18 at 13:12
  • `int` are very small and easy to copy. The difference between the two methods might be different if copying the values were more costly than just delinking nodes. – Blastfurnace Oct 31 '18 at 13:21
  • 1
    @Blastfurnace The values usually don't get copied though, only swapped. That's commonly very cheap too. Especially compared to the list traversal. – Baum mit Augen Oct 31 '18 at 13:23
  • Did you measure the performance in a debug build or an optimized one? – Max Langhof Oct 31 '18 at 13:25
  • @BaummitAugen I don't think `std::remove` swaps the values (they might be moved). – Blastfurnace Oct 31 '18 at 13:25
  • @Blastfurnace True, it only move assigns. Still, the point stands. – Baum mit Augen Oct 31 '18 at 13:27
  • @BaummitAugen The member function is faster when the type is expensive to copy: http://coliru.stacked-crooked.com/a/efcf94fde4887777 – Blastfurnace Oct 31 '18 at 13:44
  • @Blastfurnace Sure. As I said, that's *commonly* cheap. If, for whatever reason, you have a list of those big objects, different factors can certainly become relevant or even dominant in the individual case. Note how the original example does use `int` though, and still applied its performance claim. – Baum mit Augen Oct 31 '18 at 13:48

1 Answer


This is a caching issue. Most performance issues are caching issues. We always want to think that the algorithm is the first thing to look at, but if one run operates on nodes scattered across memory while the other operates on nodes allocated one after another, you are measuring a caching effect, not the algorithm.

By commenting out either the push_back or the push_front when building the original list, I forced the program to build my_list1 from nodes that are allocated (largely) contiguously.

my_list2 is always in (largely) contiguous memory because it is allocated in a single copy.

Run output:

Execution time while using member function remove: 121

Execution time while using generic algorithm remove: 125


Process finished with exit code 0

Here is my code with one of the push's commented out.

#include <list>
#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <ctime>
#include <iostream>

using namespace std;

int main()
{
    srand(time(nullptr));
    list<int>my_list1;
//    list<int>my_list2;
    int x = 2000 * 2000;

    for (auto i = 0; i < x; ++i)
    {
        auto random = rand() % 10;
//        my_list1.push_back(random);  // avoid pushing to front and back to avoid cache misses.
        my_list1.push_front(random);
    }

    list<int>my_list2(my_list1);

    auto started1 = std::chrono::high_resolution_clock::now();
    my_list1.remove(5);
    auto done1 = std::chrono::high_resolution_clock::now();
    cout << "Execution time while using member function remove: " << chrono::duration_cast<chrono::milliseconds>(done1 - started1).count();

    cout << endl << endl;

    auto started2 = std::chrono::high_resolution_clock::now();
    my_list2.erase(remove(my_list2.begin(), my_list2.end(),5), my_list2.end());
    auto done2 = std::chrono::high_resolution_clock::now();
    cout << "Execution time while using generic algorithm remove: " << chrono::duration_cast<chrono::milliseconds>(done2 - started2).count();

    cout << endl << endl;
}

By increasing the number of elements and reversing the order of the calls, so that the erase happens first and the remove happens second, the removes take longer. Again, this is more about caching than about the algorithm or the amount of work being done. If anything else dirties the cache during a run — another program, checking the internet, even moving your mouse — your L1 cache (typically around 32 KB) gets dirtied, and performance drops for that run.

  • I simply commented out `my_list1.push_back(random);` . This prevented the code from allocating memory from different locations for different parts of the list and increased caching efficiency. – Gardener Oct 31 '18 at 13:37
  • As I increase x, the efficiency of the erase increases and surpasses the remove, but the caching still has a large effect. – Gardener Oct 31 '18 at 13:40
  • By multiplying x by 8 and then switching the order so the erases run first and the removes second, I get an execution time of 617 for the erases and 689 for the removes. Remember, the removes were "faster" in the original test. However, no attempt was made to run the removes second. By running the removes second and by removing the front-and-back inserts, the removes are slower. – Gardener Oct 31 '18 at 13:43
  • If you don't believe that this could be a caching issue, then you need to watch this: [Scott Myers on Caching](https://www.youtube.com/watch?v=WDIkqP4JbkE&t=1898s) – Gardener Oct 31 '18 at 13:45