15

When I want to make sure that the entry I want to use exists, I usually do this:

#include <unordered_map>

struct type { int member; };
std::unordered_map<int, type> map;
int key = 0;

if (map.find(key) != map.end())
    map[key].member = 42;

However, I think this performs two lookups for `key` in the hash map. The following version caches the lookup instead:

#include <unordered_map>

struct type { int member; };
std::unordered_map<int, type> map;
int key = 0;

auto find = map.find(key);
if (find != map.end())
    find->second.member = 42;

The first option feels way more expressive. Is it really slower?

danijar
  • 32,406
  • 45
  • 166
  • 297

5 Answers

22

It may be slower, it may not (you're now doing an extra write in your "speed up"), but one really shouldn't worry about such minor optimisations when writing code. Write clear, expressive code. Then, if your program really is too slow, run profiling tools on it and find your bottleneck(s). If this code is in fact a real problem, then and only then try your "speed up" and see if it matters.

Paul Evans
  • 27,315
  • 3
  • 37
  • 54
15

Yes, because you search for the key twice: map[key] searches for the key exactly like map.find, whose result you threw away.

It's like opening a drawer to see whether a given object is there, saying "ah yes!" and closing the drawer, then opening it up again and searching for the object a second time in order to change it.

The second version opens the drawer, searches for the object, and changes it in place.

There may be compiler optimizations that avoid the double search, or that reduce the second search to constant time, and there may be optimizations that keep the auto find variable out of memory entirely (it can live in a CPU register, since its usage is very local).

The entire problem in effect reduces to comparing the time to compute the hash twice (and walk the map's bucket, in case of hash collisions) against the time to compute it once plus access the extra variable:

2*H < H + M

That means H < M. If M is a register access and H is not trivial, it's hard for H to be less than M.

Emilio Garavaglia
  • 20,229
  • 2
  • 46
  • 63
  • 1
    You don't know this will be faster 'till it's measured with profiling. What about the extra write to memory that's traded for the `find`? And hash look-ups are fast. – Paul Evans Sep 20 '14 at 15:27
  • 1
    The local auto variable (the map iterator) is most likely optimized into a CPU register. The walk inside the map is certainly made twice (and if the map has 2^10 nodes, it requires 10 reads from sparse memory locations). And since the locations are sparse, caching and hash lookup don't necessarily help. With vectors (instead of a map) you'd almost surely be right. – Emilio Garavaglia Sep 20 '14 at 15:33
  • 3
    Is there a chance that the compiler optimizes this away? – danijar Sep 20 '14 at 16:06
  • I like to try to code consistently for all containers as far as possible. With **some** containers it could be much slower to throw away the results of `find()`. I would always retain the value (like the second example) if I intend to use it. Otherwise I would probably use `count()`. – Galik Sep 20 '14 at 16:17
  • 4
    @EmilioGaravaglia what "walking inside the map tree"? It's an *unordered_map*; it does an O(1) hash look-up that could also be optimised into a CPU register. It *will* be cached at the very least. And if it is in fact faster, an optimising compiler will do this for you anyway. – Paul Evans Sep 20 '14 at 17:35
  • 3
    @EmilioGaravaglia a) the question asks about `unordered_map`, so probably no tree walking is involved b) 10 repeated reads should be nothing, as you have perfect _temporal_ locality, unless the elements are very large. 2^10 nodes is 1 Ki of nodes, while L1$ can be as large as 32 KiB on mobile chips - sufficient to keep a 2^10-node tree (assuming nodes are small) entirely in the L1$ of a tablet. 10 nodes would take a fraction of it - the bigger problem with walking the tree twice, I would expect, is branch mispredictions. The second version will probably be faster, but I'm not sure if it would be measurable. – Maciej Piechotka Sep 20 '14 at 17:35
  • @PaulEvans: it is O(1) if the hash is unique for all keys and the hash table is wide enough; otherwise it is O(1+N/W) and can even be worse than log N. In any case that's just a sample. The point is to demonstrate that 2A > A+B, where A is the find algorithm and B is a register operation. – Emilio Garavaglia Sep 20 '14 at 17:58
  • @MaciejPiechotka: 2^10 is too few? Try with 2^20, then 2^30, then 2^40. And the longest pissing contest will in any case sooner or later be lost. That's valid for all pissing contests. That said, your answer is much more realistic than your comment here: +1 – Emilio Garavaglia Sep 20 '14 at 18:01
  • 1
    @EmilioGaravaglia Still assuming trees: the worst case in an RB-tree is 81 nodes in a 2^40 tree. To really fill a 32 KiB cache by a single descent and get the misses (assuming something like LRU and infinite associativity) you would need ~2^512 nodes (assuming 4 words per node and 8 bytes per word), which would occupy more space than the observable universe (2^305). Ok - it's true that infinite associativity is unreasonable (though 4-8 way are close to it according to simulations) and the word size is too small, but it's a very unlikely situation to run out of cache for a _single_ lookup. – Maciej Piechotka Sep 20 '14 at 18:32
  • My point was that you're unlikely to run out of cache just by descending, as the $ is huge compared to the utilization in such a case (as opposed to "And since the locations are sparse, caching and hash lookup don't necessarily help." - you want to exploit temporal locality here, not spatial). 2^10 nodes fitting into the L1$ of a tablet was meant to illustrate the point (sure - at some point the trees will be larger than L1$, but I'm not so sure about the nodes of a single path). – Maciej Piechotka Sep 20 '14 at 18:35
8

Yes, it may be slower but probably not measurably slower. There is some extra work involved:

  1. The hash will likely be computed twice, unless you have a sufficiently smart compiler, use vendor extensions such as `pure` or `const`, or use a similar method. Note that if the hash is trivial and the compiler knows its code, most compilers are probably smart enough nowadays.
  2. The position of the bucket needs to be found a second time (unless the compiler notices that the hash is the same and doesn't need to recompute it).
  3. The traversal of the collision list (or a similar step, depending on the collision resolution) needs to be performed again. A sufficiently smart compiler might notice that we are doing it twice and don't actually modify anything in between; we might have such compilers nowadays, but I'm not 100% sure we're there. Even if we aren't, these are cached reads and will probably not impose any significant cost (compared with, for example, computing the hash or a missed read). Without going into the details of processor architecture, an L1$ read hit takes ~4 cycles of latency on an i7 (data from memory, might be wrong), and the processor may do other work while waiting on it.

So to sum up if:

  • Your hash function is expensive (for example, it hashes a string).
  • The compiler is insufficiently smart to deduce that the hash function does not modify the object and always returns the same value.
  • The code is in an inner loop.

then you might see a difference.


As a final word - it probably won't matter; it's not a big architectural difference but a 5-second optimization. So write whichever you find easier to maintain, and revisit the question when the profiler shows that this function causes a slowdown.

Maciej Piechotka
  • 7,028
  • 6
  • 39
  • 61
2

Unless you have a specific reason to preserve the value already stored in an existing entry, you can skip the search completely and just set the new value:

#include <unordered_map>

struct type { int member; };
std::unordered_map<int, type> map;
int key = 0;

map[key].member = 42;

This will modify the existing entry (if there is one) and insert a new entry if none exists.

Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111
  • I suspect the OP does NOT WANT to CREATE an entry if it is NOT already there. Otherwise the "if not found then create" logic in `map[x]` is the best expressive choice. – Emilio Garavaglia Sep 20 '14 at 16:31
  • @EmilioGaravaglia: ...and yet what he says is: "When I want to make sure that the entry I want to use exists [...]" – Jerry Coffin Sep 20 '14 at 18:32
2

Yes, it could be slower. If you are looking for something more expressive, maybe you should encapsulate the std::unordered_map (which might be a good idea anyway) and expose a pointer. Then you could write something like:

auto thing = getThing(key);
if (thing) 
  thing->member = 42;
Chris Drew
  • 14,926
  • 3
  • 34
  • 54