Why is move assignment of unordered_map slow?

Question

I am trying to understand how the move/rvalue assignment operator works. I know that it is largely implementation-specific, but assuming that moving an unordered_map only swaps the underlying data pointer and size attributes, I would expect it to be extremely fast.

This is the code that I tried to run:

#include <chrono>
#include <functional>
#include <iostream>
#include <memory>
#include <string>
#include <unordered_map>
using namespace std;
 
void time_it(function<void()> f)
{
    auto start = chrono::steady_clock::now();
    f();
    auto end = chrono::steady_clock::now();
    auto diff = end - start;
    cout << chrono::duration<double, milli>(diff).count() << " ms" << endl;
}
 
using umap = unordered_map<string, string>;
static const size_t MAP_SIZE = 1000000;
 
int main()
{
    umap m;
    for (size_t i = 0; i < MAP_SIZE; i++)
    {
        auto s = to_string(i);
        m[s] = s;
    }
 
    time_it([&]() {
        cout << "copy\n";
        auto c = m;
    });
    time_it([&]() {
        cout << "move\n";
        auto c = move(m);
    });
}

It returns:

copy
204.4 ms
move
98.568 ms

Why does the move assignment operator take so long (~100 ms)?

I compiled using g++ test.cpp -O3. This is what my g++ -v returns:

Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=c:/mingw/bin/../libexec/gcc/mingw32/6.3.0/lto-wrapper.exe
Target: mingw32
Configured with: ../src/gcc-6.3.0/configure --build=x86_64-pc-linux-gnu --host=mingw32 --with-gmp=/mingw --with-mpfr=/mingw --with-mpc=/mingw --with-isl=/mingw --prefix=/mingw --disable-win32-registry --target=mingw32 --with-arch=i586 --enable-languages=c,c++,objc,obj-c++,fortran,ada --with-pkgversion='MinGW.org GCC-6.3.0-1' --enable-static --enable-shared --enable-threads --with-dwarf2 --disable-sjlj-exceptions --enable-version-specific-runtime-libs --with-libiconv-prefix=/mingw --with-libintl-prefix=/mingw --enable-libstdcxx-debug --with-tune=generic --enable-libgomp --disable-libvtv --enable-nls
Thread model: win32
gcc version 6.3.0 (MinGW.org GCC-6.3.0-1)

What reason do we have to believe that Ideone is a good place to do benchmarking? If you're going to benchmark something, it really needs to be in a controlled and well-understood environment. — Nicol Bolas, Aug 17 '22 at 15:53
Any question that asks about performance requires that you specify the compiler, compiler version, and optimization/build settings used to build the application. — PaulMcKenzie, Aug 17 '22 at 15:55
My computer gives wildly different results. Are you asking why IDEOne produces strange timing results? — Drew Dormann, Aug 17 '22 at 15:56
How do I check what the **optimization setting** is for Ideone's gcc 8.3 C++14? — Eljay, Aug 17 '22 at 16:00
You measured unique_ptr differently than the rest. You include cout in the timed code. — Sebastian, Aug 17 '22 at 16:01
You aren't actually measuring anything meaningful with that code. Your `shared_ptr` copies will be optimized away. What you're actually measuring is the time it takes to call an empty function via pointer and stop the timer. — Miles Budnek, Aug 17 '22 at 16:05
Okay guys, I removed my `shared_ptr` vs `unique_ptr` thingie. I think I will ask that in another question if possible. For this question I will focus on the `unordered_map`. — Shadow Lurker, Aug 17 '22 at 16:07
You should remove output from time measuring. Otherwise you're mainly measuring the output and the actual work is negligible. — jabaa, Aug 17 '22 at 16:08
Still, you're not just measuring copy/move initialization, but also destruction (in the move case, that's actually _all_ you're measuring, since the compiler can see that the moved-to object is never used; it just calls `clear` on the source `unordered_map`). — Miles Budnek, Aug 17 '22 at 16:15
@MilesBudnek actually... you are correct! I did not realize that I destructed the `c` inside `time_it`. Changed it to `c = move(m); m = c;` and it became 0 ms. — Shadow Lurker, Aug 17 '22 at 16:23

Shadow Lurker · Accepted Answer · 2022-08-17T16:33:56.123


As Miles Budnek explained in his comment, my second time_it call was only measuring the unordered_map destructor (for the object c), which runs at the end of the lambda.

I changed it to:

    time_it([&]() mutable {
        cout << "copy\n";
        auto c = m;
        m = c;
    });
    time_it([&]() mutable {
        cout << "move\n";
        auto c = move(m);
        m = move(c);
    });

so that the contents of c are handed back to m instead of being deallocated when c goes out of scope. The move now reports ~0.6 ms, compiled with -O0 to keep the compiler from doing undesirable stuff.
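
For reference, here is a minimal sketch of a measurement that keeps the move and the destruction apart and leaves `cout` out of the timed regions. The names `MoveTimings`, `to_ms` and `time_move_and_destroy` are made up for this illustration and are not part of the original code. It assumes, as on the common standard library implementations, that move-constructing an `unordered_map` just takes over the source's storage.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

using umap = std::unordered_map<std::string, std::string>;
using bench_clock = std::chrono::steady_clock;

// Hypothetical result type for this sketch (not from the original post).
struct MoveTimings
{
    double move_ms;         // time spent in the move construction alone
    double destroy_ms;      // time spent destroying the moved-to map
    std::size_t moved_size; // element count observed in the moved-to map
};

// Hypothetical helper: convert a clock duration to milliseconds.
inline double to_ms(bench_clock::duration d)
{
    return std::chrono::duration<double, std::milli>(d).count();
}

// Builds a map with n entries, then times the move and the destruction
// of the moved-to object as two separate intervals.
MoveTimings time_move_and_destroy(std::size_t n)
{
    umap m;
    for (std::size_t i = 0; i < n; ++i)
    {
        auto s = std::to_string(i);
        m[s] = s;
    }

    std::size_t moved_size = 0;
    auto t0 = bench_clock::now();
    auto t1 = t0;
    {
        umap c = std::move(m);   // the operation under test
        t1 = bench_clock::now(); // stop the first interval before c dies
        moved_size = c.size();   // use c so the move is not dead code
    }                            // c is destroyed here, freeing every node
    auto t2 = bench_clock::now();

    return MoveTimings{to_ms(t1 - t0), to_ms(t2 - t1), moved_size};
}
```

Calling `time_move_and_destroy(1000000)` and printing `move_ms` and `destroy_ms` after it returns should show where the time goes, and because the clock is read before `c` goes out of scope, the destructor cost cannot leak into the move interval even in an optimized build.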

Thanks everyone, really sorry for my mistakes in the post!

edited Aug 17 '22 at 16:33

answered Aug 17 '22 at 16:26

Shadow Lurker


Please remove the cout from the lambda. That is not what you want to measure. – Sebastian Aug 17 '22 at 20:52

Also, measuring at `-O0` is meaningless. It stops the compiler from doing desirable stuff, stuff that's basically required for all the zero-cost abstractions in C++. I'm not sure what MSVC does, but both gcc and clang generate code as if for a '60s-era stack machine before the optimizer is let loose on it. It's the most horrible code. – Goswin von Brederlow Aug 18 '22 at 15:22
