
I'm testing std::forward_list sort performance: I generate 1,000,000 random numbers, insert them into a std::forward_list, and sort the list, repeating this five times.

#include <forward_list>
#include <chrono>
#include <random>
#include <iostream>
#include <vector>

int main()
{
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<int> dis(-100000000, 100000000);
    std::vector<int> v;
    for (int i = 0; i < 1000000; ++i) {
        v.push_back(dis(gen));
    }
    std::forward_list<int> *list;
    for (int j = 0; j < 5; ++j) {
        auto list = new std::forward_list<int>();
        for (auto it = v.begin(); it != v.end(); ++it) {
            list->insert_after(list->cbefore_begin(), *it);
        }
        auto start = std::chrono::steady_clock::now();
        list->sort();
        auto end = std::chrono::steady_clock::now();
        std::chrono::duration<double> diff = end - start;
        std::cout << diff.count() << " s\n";
        delete list;
    }
}

Compiling this code with `g++ test2.cpp -o test2 -O2` and running it, I get this output:

0.994629 s
3.01309 s
2.98853 s
2.99701 s
3.01637 s

If I use tcmalloc, compiling with `g++ test2.cpp /usr/local/lib/libtcmalloc_minimal.a -o test2 -O2` and running it, I get this output:

0.551351 s
0.550282 s
0.590626 s
0.613431 s
0.559123 s

If I remove the `delete list;` line and compile with `g++ test2.cpp -o test2 -O2` and run, I get this output:

0.893076 s
0.952251 s
0.95971 s
0.931195 s
0.922877 s

So I suspect the memory allocator may be causing this performance problem. Is that right?

szh
  • `g++ test2.cpp -o test2` -- Timing unoptimized builds is meaningless. Turn on optimizations, rebuild, and time an optimized build. – PaulMcKenzie Mar 15 '23 at 03:18
  • Performance comparisons with optimizations disabled are all but meaningless. What happens when you enable optimizations (e.g. `-O2`?) – Brian61354270 Mar 15 '23 at 03:18
  • Something smells off about how you're using `list` here. Why declare `list` in `main` only to shadow it with a different variable in the for loop? And why dynamically allocate `std::forward_list`s with new/delete instead of just creating instances with automatic storage in the for loop's block? – Brian61354270 Mar 15 '23 at 03:21
  • What is your claim, that deleting the list increases the timing? Does it do it for normal malloc as well? I also have experience that pool allocators can defer some cost to destruction (deallocation really), but that is somewhat expected since there is non-trivial bookkeeping to do. – alfC Mar 15 '23 at 03:21
  • You may need your program to have a side-effect to ensure meaningful results. Otherwise, the entire thing could potentially be optimized away... e.g.: `std::cout << std::is_sorted(list.begin(), list.end());`. Another consideration is whether one of the allocators produces a more cache-efficient memory layout than the other. – paddy Mar 15 '23 at 03:26
  • I have tested with -O2 and updated the question. – szh Mar 15 '23 at 03:29
  • Consider trying `v.reserve(1000000);` before your push_back loop. Otherwise you're essentially causing memory fragmentation before the test even begins. – paddy Mar 15 '23 at 03:32
  • @Brian61354270 I created instances with automatic storage in the for loop's block at first, but found the performance problem, so I switched to new/delete to test. – szh Mar 15 '23 at 03:34
  • To test whether the allocator is causing problems, just time code that builds and destroys the list. Since you're only timing the _sort_, then I expect the reason is poor cache locality. You can test _that_ by creating a static array of `int*`, which you point at each value in your list. Then create a comparator to dereference and compare the data those pointers refer to, and sort the array with it. This removes several other considerations. One more to consider is to always use the same random seed, since sorting performance can be heavily affected by data. – paddy Mar 15 '23 at 03:44
  • @paddy thanks, I will try to test it later. – szh Mar 15 '23 at 03:48
  • My money's on the random seed. You need to run a single test multiple times. If the results vary wildly, then it's more likely to be the data itself rather than the allocator. – paddy Mar 15 '23 at 03:55
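
Along the lines of alfC's and paddy's comments, here is a minimal sketch (an illustration, not code from the question) that times construction, sorting, and destruction separately, using a fixed seed and a reserved vector, to see where the time actually goes:

#include <chrono>
#include <forward_list>
#include <iostream>
#include <random>
#include <vector>

int main()
{
    // Fixed seed so every iteration sorts exactly the same data.
    std::mt19937 gen(12345);
    std::uniform_int_distribution<int> dis(-100000000, 100000000);
    std::vector<int> v;
    v.reserve(1000000); // avoid reallocations before the test begins
    for (int i = 0; i < 1000000; ++i)
        v.push_back(dis(gen));

    for (int j = 0; j < 5; ++j) {
        auto t0 = std::chrono::steady_clock::now();
        auto *list = new std::forward_list<int>();
        for (int x : v)
            list->insert_after(list->cbefore_begin(), x);
        auto t1 = std::chrono::steady_clock::now();
        list->sort();
        auto t2 = std::chrono::steady_clock::now();
        delete list;
        auto t3 = std::chrono::steady_clock::now();
        std::cout << "build "   << std::chrono::duration<double>(t1 - t0).count() << " s, "
                  << "sort "    << std::chrono::duration<double>(t2 - t1).count() << " s, "
                  << "destroy " << std::chrono::duration<double>(t3 - t2).count() << " s\n";
    }
}

If the destroy time is large under the default allocator, or the build/sort times grow after the first delete, that points at allocation/deallocation behavior rather than the sort itself.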
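
And a sketch of the cache-locality test paddy describes: point an array of int* (a std::vector here) at the values stored in the list and sort the pointers with a dereferencing comparator, so the timed sort allocates nothing. The `is_sorted` print is the side effect suggested in the comments; the names and structure are assumptions for illustration only:

#include <algorithm>
#include <chrono>
#include <forward_list>
#include <iostream>
#include <random>
#include <vector>

int main()
{
    std::mt19937 gen(12345); // fixed seed, same data every run
    std::uniform_int_distribution<int> dis(-100000000, 100000000);

    std::forward_list<int> list;
    for (int i = 0; i < 1000000; ++i)
        list.push_front(dis(gen));

    // Point at every value stored in the list's nodes.
    std::vector<const int*> ptrs;
    ptrs.reserve(1000000);
    for (const int &x : list)
        ptrs.push_back(&x);

    auto cmp = [](const int *a, const int *b) { return *a < *b; };

    auto start = std::chrono::steady_clock::now();
    std::sort(ptrs.begin(), ptrs.end(), cmp);
    auto end = std::chrono::steady_clock::now();

    std::chrono::duration<double> diff = end - start;
    // Side effect so the optimizer cannot discard the work.
    std::cout << std::is_sorted(ptrs.begin(), ptrs.end(), cmp)
              << " " << diff.count() << " s\n";
}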

0 Answers