
I want to know which data structures are more efficient for iterating through their elements: std::set and std::map, or std::unordered_set and std::unordered_map.

I searched through SO and I found this question. The answers either propose to copy the elements in a std::vector or to use Boost.Container, which IMHO don't answer my question.

My purpose is to keep a large number of unique elements in a container that, most of the time, I want to iterate through. Insertions and removals are rarer. I want to avoid std::vector in combination with std::unique.

101010
  • If iteration is frequent, you really, really, really want a vector. – T.C. Jul 21 '15 at 08:40
  • @T.C. I know, but let's pretend that I can't use a `std::vector`. Between the ordered and unordered containers, which is the better choice and why? :) – 101010 Jul 21 '15 at 08:42
  • @inf would be nice but I can't use `boost`. – 101010 Jul 21 '15 at 08:42
  • @101010: In general, these questions cannot be answered by thinking and can only be settled by empirical testing -- and the answer can change depending on the computer itself, what other things are running on the computer, the library implementation, how the container is being used, and so forth. –  Jul 21 '15 at 08:46
  • @101010: I cannot count how often I have read that. "I cannot use Boost." Unless Boost is not actually supported on your target platform, I cannot really understand who comes up with such limitations. Boost is the next best thing to the standard library, and excluding it from a project on principle is.... well... before I overstep boundaries here. ;-) – DevSolar Jul 21 '15 at 08:51
  • @DevSolar In my company we don't use boost. Reason: Unknown... – 101010 Jul 21 '15 at 08:53
  • @101010: Let's just say that I hope you have enough standing to bring that subject up for discussion. At least get them to tell you the *reason*, because not using Boost for C++ is a bit like not using design patterns in Java, because... unknown. ;-) – DevSolar Jul 21 '15 at 09:05
  • @Hurkyl _'cannot be answered by thinking'_ well, it would help to think, for sure – Nikos Athanasiou Jul 21 '15 at 09:06
  • @Nikos: Ah, I meant "cannot be answered by thinking alone". –  Jul 21 '15 at 09:08

3 Answers


The difference does not lie in the ordering or lack of it, but in the backing container. If it's contiguous memory, it should be fast to iterate over, thanks to a simple iterator implementation and cache friendliness.

Unordered containers are usually stored as a vector of vectors (or something similar), while ordered containers are implemented using trees. This would suggest that iterating over the unordered version should be faster. However, all of this is ultimately left to the implementation, and I have seen implementations (which bent the rules a little, to be fair) with different behaviour.

Generally speaking, container performance is quite a complex topic and usually has to be tested in the actual application to get a reliable answer. There is plenty of implementation-defined behaviour that might affect performance. I'd go with hash_set if I had to go in blind. Copying into a vector might also turn out to be a good option.

EDIT: As @TonyD said in his comment, there is a rule that disallows invalidating iterators during insertion of an element when max_load_factor() is not exceeded; this practically rules out backing containers that are contiguous in memory.

Thus, copying everything into a vector seems like an even more reasonable option. If you need to remove duplicates, a feasible option is to use http://en.cppreference.com/w/cpp/algorithm/sort and then skip the duplicates easily. I have heard that using a vector plus sort to get a sorted array (or vector) is quite a common choice when you need a container that has to be sorted and is iterated over more often than it is modified.

luk32
  • *"Unordered containers are usually stored as a vector of vectors (or a similar thing)"* only if you consider a vector of linked lists to be similar (I don't): not hanging contiguous vectors of elements off buckets is practically guaranteed given the Standard's requirement that existing objects aren't moved during insertions that don't increase load factor beyond `max_load_factor()` thereby triggering a whole-table rehash. Not as much is left to implementation choice as most people think, though you mention "`hash_set`" which was a common name for pre C++11 implementations, and they varied.... – Tony Delroy Jul 21 '15 at 09:43
  • @TonyD I also don't consider them similar; as per my 1st paragraph, memory contiguity is very important here. I know that there is less room to move than one might think; I think I've had such a (great, btw) discussion once (I even think it was with you), about subtle rules that basically exclude some implementations. Still, there is enough leeway to affect performance in certain cases. IMO it is very fragile and really needs to be measured. I will update the answer nevertheless. Copying into a vector might become the best option then. – luk32 Jul 21 '15 at 13:22

Let's consider std::set vs std::unordered_set.

The main difference here is the 'nature' of the iteration: traversing a std::set gives you the elements in order, while traversing a range in an unordered set gives you the values in no particular order.

Suppose you want to traverse a range [it1, it2]. If we exclude the lookup time needed to find the elements it1 and it2, there can be no direct mapping from one case to the other, since the elements in between are not guaranteed to be the same even if you used the same elements to construct the container.

There are cases, however, where something like this has meaning: e.g. when you want to traverse a fixed number of elements (regardless of what they are) or when you need to traverse the whole container. In such cases you need to consider implementation mechanics:

Sets are usually implemented as red-black trees (a form of binary search tree). Like all binary search trees, they allow efficient in-order traversal (LNR: left, node, right) of their elements. That is, to traverse you pay the cost of pointer chasing (just like traversing a list).

[Image: typical red-black tree layout]

Unordered sets, on the other hand, are hash tables, and to my knowledge the STL implementations use hashing with chaining. That means (at a very high level) that the underlying structure is a (contiguous) buffer where each element is the head of a chain (list) that contains the elements. The way the elements are laid out across those chains (buckets) and across the buffer will affect the traversal time; however, you'll be chasing pointers once again, this time jumping through different lists as well. I don't think it'll vary significantly from the tree case, but it won't be any better, for sure.

[Image: schematic layout of hashing with chaining]

In any case, micro-tuning and benchmarking will give you the answer for your particular application.

Nikos Athanasiou
  • Adding [link](http://coliru.stacked-crooked.com/a/732aa1187a99f862) to the benchmark you wrote earlier.... Cheers. – Tony Delroy Jul 23 '15 at 22:38

Iteration from fastest to slowest should be: set > map > unordered_set > unordered_map. A set is a little lighter than a map, and both are ordered according to binary-tree rules, so they should be faster than the unordered_ containers.

hero