
I'm testing std::forward_list sort performance: I generate 1,000,000 random numbers, insert them into a std::forward_list, and sort the list, repeating this five times.

#include <forward_list>
#include <chrono>
#include <random>
#include <iostream>
#include <vector>

int main()
{
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<int> dis(-100000000, 100000000);
    std::vector<int> v;
    for (int i = 0; i < 1000000; ++i) {
        v.push_back(dis(gen));
    }
    std::forward_list<int> *list;
    for (int j = 0; j < 5; ++j) {
        auto list = new std::forward_list<int>();
        for (auto it = v.begin(); it != v.end(); ++it) {
            list->insert_after(list->cbefore_begin(), *it);
        }
        auto start = std::chrono::steady_clock::now();
        list->sort();
        auto end = std::chrono::steady_clock::now();
        std::chrono::duration<double> diff = end - start;
        std::cout << diff.count() << " s\n";
        delete list;
    }
}

Compiling this code with `g++ test2.cpp -o test2 -O2` and running it, I get this output:

0.994629 s
3.01309 s
2.98853 s
2.99701 s
3.01637 s

If I use tcmalloc, compiling with `g++ test2.cpp /usr/local/lib/libtcmalloc_minimal.a -o test2 -O2` and running it, I get this output:

0.551351 s
0.550282 s
0.590626 s
0.613431 s
0.559123 s

If I remove the `delete list;` line and compile with `g++ test2.cpp -o test2 -O2` and run, I get this output:

0.893076 s
0.952251 s
0.95971 s
0.931195 s
0.922877 s

So I suspect the memory allocator may be causing this performance problem. Is that right?

szh
  • `g++ test2.cpp -o test2` -- Timing unoptimized builds is meaningless. Turn on optimizations, rebuild, and time an optimized build. – PaulMcKenzie Mar 15 '23 at 03:18
  • Performance comparisons with optimizations disabled are all but meaningless. What happens when you enable optimizations (e.g. `-O2`?) – Brian61354270 Mar 15 '23 at 03:18
  • Something smells off about how you're using `list` here. Why declare `list` in `main` only to shadow it with a different variable in the for loop? And why dynamically allocate `std::forward_list`s with new/delete instead of just creating instances with automatic storage in the for loop's block? – Brian61354270 Mar 15 '23 at 03:21
  • What is your claim, that deleting the list increases the timing? Does it do it for normal malloc as well? I also have experience that pool allocators can defer some cost to destruction (deallocation really), but that is somewhat expected since there is non-trivial bookkeeping to do. – alfC Mar 15 '23 at 03:21
  • You may need your program to have a side-effect to ensure meaningful results. Otherwise, the entire thing could potentially be optimized away... e.g.: `std::cout << std::is_sorted(list.begin(), list.end());`. Another consideration is whether one of the allocators produces a more cache-efficient memory layout than the other. – paddy Mar 15 '23 at 03:26
  • I have tested with -O2 and updated the question. – szh Mar 15 '23 at 03:29
  • Consider trying `v.reserve(1000000);` before your push_back loop. Otherwise you're essentially causing memory fragmentation before the test even begins. – paddy Mar 15 '23 at 03:32
  • @Brian61354270 I created instances with automatic storage in the for loop's block at first, but found the performance problem, so I switched to new/delete to test. – szh Mar 15 '23 at 03:34
  • To test whether the allocator is causing problems, just time code that builds and destroys the list. Since you're only timing the _sort_, then I expect the reason is poor cache locality. You can test _that_ by creating a static array of `int*`, which you point at each value in your list. Then create a comparator to dereference and compare the data those pointers refer to, and sort the array with it. This removes several other considerations. One more to consider is to always use the same random seed, since sorting performance can be heavily affected by data. – paddy Mar 15 '23 at 03:44
  • @paddy thanks, I will try to test it later. – szh Mar 15 '23 at 03:48
  • My money's on the random seed. You need to run a single test multiple times. If the results vary wildly, then it's more likely to be the data itself rather than the allocator. – paddy Mar 15 '23 at 03:55
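
Along the lines of alfC's and paddy's comments, here is a minimal sketch (an illustration, not code from the question) that times construction, sorting, and destruction separately, using a fixed seed and a reserved vector, to see where the time actually goes:

#include <chrono>
#include <forward_list>
#include <iostream>
#include <random>
#include <vector>

int main()
{
    // Fixed seed so every iteration sorts exactly the same data.
    std::mt19937 gen(12345);
    std::uniform_int_distribution<int> dis(-100000000, 100000000);
    std::vector<int> v;
    v.reserve(1000000); // avoid reallocations before the test begins
    for (int i = 0; i < 1000000; ++i)
        v.push_back(dis(gen));

    for (int j = 0; j < 5; ++j) {
        auto t0 = std::chrono::steady_clock::now();
        auto *list = new std::forward_list<int>();
        for (int x : v)
            list->insert_after(list->cbefore_begin(), x);
        auto t1 = std::chrono::steady_clock::now();
        list->sort();
        auto t2 = std::chrono::steady_clock::now();
        delete list;
        auto t3 = std::chrono::steady_clock::now();
        std::cout << "build "   << std::chrono::duration<double>(t1 - t0).count() << " s, "
                  << "sort "    << std::chrono::duration<double>(t2 - t1).count() << " s, "
                  << "destroy " << std::chrono::duration<double>(t3 - t2).count() << " s\n";
    }
}

If the destroy time is large under the default allocator, or the build/sort times grow after the first delete, that points at allocation/deallocation behavior rather than the sort itself.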
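
And a sketch of the cache-locality test paddy describes: point an array of int* (a std::vector here) at the values stored in the list and sort the pointers with a dereferencing comparator, so the timed sort allocates nothing. The `is_sorted` print is the side effect suggested in the comments; the names and structure are assumptions for illustration only:

#include <algorithm>
#include <chrono>
#include <forward_list>
#include <iostream>
#include <random>
#include <vector>

int main()
{
    std::mt19937 gen(12345); // fixed seed, same data every run
    std::uniform_int_distribution<int> dis(-100000000, 100000000);

    std::forward_list<int> list;
    for (int i = 0; i < 1000000; ++i)
        list.push_front(dis(gen));

    // Point at every value stored in the list's nodes.
    std::vector<const int*> ptrs;
    ptrs.reserve(1000000);
    for (const int &x : list)
        ptrs.push_back(&x);

    auto cmp = [](const int *a, const int *b) { return *a < *b; };

    auto start = std::chrono::steady_clock::now();
    std::sort(ptrs.begin(), ptrs.end(), cmp);
    auto end = std::chrono::steady_clock::now();

    std::chrono::duration<double> diff = end - start;
    // Side effect so the optimizer cannot discard the work.
    std::cout << std::is_sorted(ptrs.begin(), ptrs.end(), cmp)
              << " " << diff.count() << " s\n";
}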

0 Answers