14

I have already tried searching for this but haven't found anything.

I am learning about STL containers, and understand the pros and cons of sequential and associative containers, however am not sure why anyone would prefer an unordered container over an associative one, as surely it would not affect element insertion, lookup and removal.

Is it purely a performance thing, i.e it would take more processing to insert / remove to an associative container as it has to go through sorting? I don't know too much about the system side of things but in my head I feel like an unordered container would require more 'upkeep' than one that is automatically organised.

If anyone could shed some light it would be really appreciated.

Stephane Rolland
  • 38,876
  • 35
  • 121
  • 169
aspirant_sensei
  • 1,568
  • 1
  • 16
  • 36
  • 5
    Obviously, the one that is automatically organised takes more upkeep: it needs to be reorganised any time it changes. The other one doesn't. – R. Martinho Fernandes Jan 25 '13 at 20:20
  • 1
    Read books or try to code something by yourself. With experience understanding will come. – Sergey P. aka azure Jan 25 '13 at 20:24
  • 5
    You're confusing me with your terminology. You contrast "unordered container" from "associative container", but those two things are not mutually exclusive. `std::map` and `std::unordered_map` are *both* associative arrays. Whereas `std::set` and `std::unordered_set` are not. – Benjamin Lindley Jan 25 '13 at 20:28
  • 1
    "Associative container" means "lookup by value", and is in contrast to "sequence container", which is "lookup by insertion position". – Kerrek SB Jan 25 '13 at 20:29
  • 1
    Well, I was mistaken. I guess "Associative Container" means something different in C++. It includes `std::set` and variants. But it also includes the unordered variants, so your terminology is still confusing. – Benjamin Lindley Jan 25 '13 at 20:36
  • Associative Containers + Unordered Associaitve Containers are different types of containers according to Josuttis' STD Library book. But thanks for clarification on lookup by value and lookup by insertion position! – aspirant_sensei Jan 25 '13 at 21:03

3 Answers3

10

Purely abstractly, consider the fact that an ordering of the elements is an extra "feature" that you have to pay for, so if you don't need it (like in lookup-only dictionary), then you shouldn't have to pay for it.

Technically this means that an unordered container can be implemented with expected lookup and in­ser­tion complexity O(1), rather than the O(log n) of ordered containers, by using hash tables.

On a tangentially related note, though, there is a massive practical advantage when using strings as keys: An ordered container has to perform full string comparison everywhere along the tree walk, while a hash container only performs a single hashing operation (which can even be "optimized" to only sam­ple a fixed number of characters from very long strings), and often turns out to be a lot faster in practice.

If ordering is not a requirement, then the best thing to do is to try out both container types (whose inter­face is almost identical) and compare the performance in your usage profile.

Kerrek SB
  • 464,522
  • 92
  • 875
  • 1,084
  • 5
    Your statement about the string comparison seems unintuitive to me. I'd expect `std::less` to be an *eager* comparison that returns as soon as it finds the first non-matching character. On the contrary, `std::hash` would probably compute the hash over the entire string. So, in cases where you have long strings as keys, but the strings all differ in the first few characters (I know that's a contrived example) a `map` might fare better than an `unordered_map`. Also, +1 for the suggestion to actually try it out. – Praetorian Jan 25 '13 at 20:33
  • 1
    In the common case, the cost of linked containers _dramatically_ exceeds the cost of the comparisons/hash computations themselves. See [Bjarne's talk on C++11 style](http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Keynote-Bjarne-Stroustrup-Cpp11-Style?format=html5) around minute 45 for a shocking discussion about the magnitude of this effect. – Andres Jaan Tack Jan 25 '13 at 20:56
  • 3
    @Praetorian the string comparison would be done probably more times than the hash though. – Seth Carnegie Jan 25 '13 at 21:01
  • Sorry for my ignorance, but surely O(1) means that the time/space required is independent of the data set size, so does this mean that there is no iteration through the 'order' (albeit unordered) to find a 'match', because otherwise surely it would take longer as more data is inputted. – aspirant_sensei Jan 25 '13 at 21:08
  • 3
    @Praetorian Think again. Hash is computed once, and cost of this is basically equal to comparing same string with equal string once. And string has to be compared in its entirety at least once, at the final matching element, if key is in the map at all. So, for nearly all practical applications, hash *is* faster. – hyde Jan 25 '13 at 21:10
  • thank you so much, this has helped me greatly! EDIT: one more question, again sorry for my ignorance but is a map essentially the same as a binary tree? For example is element lookup and insertion basically the same or is it different? i.e does a map still have logarithmic complexity due to binary search? – aspirant_sensei Jan 25 '13 at 21:18
  • @aspiring_programmer: Well, as I said, you have to *profile* the differences for your usecase. You can imagine that a map with lots of really long strings that only differ in a suffix may suffer more than one with extremely short, completely different strings. By comparison, the hash table performs *one* hash computation and then jumps straight into the bucket. – Kerrek SB Jan 25 '13 at 21:21
  • 3
    @aspiring_programmer No, hash really is O(1) no matter the hash size. Each item in hash at any given moment of time has been rehashed once on average, so insertion cost per item on average stays at O(1), it does not grow, millionth item has 99.9999% chanse of having insertion cost 1, and 0.0001% chance of requiring rehashing with cost of million, average is 2, ie. constant time. Hash is basically used as table index, and table access is of course O(1) as well. – hyde Jan 25 '13 at 21:21
3

We use the Unordered Container When the ordering of the Objects is not necessary and you care most about performance of objects lookup because the Unordered Container have a fastest search /insert at any place is O(1) rather than the Ordered Container(Associative Container take O(log n)) and Sequence containers take O(n)).

Muhammad Attia
  • 490
  • 3
  • 13
2

not sure why anyone would prefer an unordered container over an associative one

Those features are not exclusive. A container may be associative, and if it is, separately may also be unordered.

If you are familiar with hash maps, that is the technology being leveraged by unordered containers. The standard library uses the term "unordered" instead of "hash" so as not to impose a specific technology when what is desired is just specific performance promises. (see comment)

Drew Dormann
  • 59,987
  • 13
  • 123
  • 180
  • 3
    The reason `unordered` was chosen instead of `hash` is because many implementations already had non-standard versions that used the word `hash`. – Jesse Good Jan 25 '13 at 20:28
  • 2
    @JesseGood, It really is a shame, seeing as how the `std::` part goes to waste. – chris Jan 25 '13 at 20:31