I've read somewhere that std::map is, with current compilers, still the most efficient associative container we have in the STL, even compared with std::unordered_map, which (from what I read somewhere, I'm not sure where) becomes more efficient on find() only when there are a lot of entries, say more than 40k.

So now I'm not really sure anymore, because I always assumed that a hash map is more efficient, at least in the case of string keys.

So, in short:

If I have to choose an associative container with unknown entry count and with std::string as keys, what would be (at least in theory) the more efficient (on speed) choice for finding?

Klaim
  • What do you mean by efficiency - space, speed on insert, speed on finding? – mmmmmm Jan 09 '12 at 14:30
  • The STL does not have unordered_map, you are probably talking about the C++11 standard library, and for that there are many different implementations, so what is faster depends on your workload and the implementation (and probably also compiler settings) you use. You should profile. – PlasmaHH Jan 09 '12 at 14:30
  • Any answer depends on the usage pattern and implementation. They supply both primarily because neither is dependably "better". – Jerry Coffin Jan 09 '12 at 14:32
  • It's weird that you've read that `map` is still the most efficient - everyone I've read says they get large speed increases from switching to using `unordered_map` from plain `map` – Seth Carnegie Jan 09 '12 at 14:32
  • @PlasmaHH: If used correctly, STL often refers to the standard library subpart that contains functors, algorithms, iterators and containers. – Xeo Jan 09 '12 at 14:33
  • 1. I added "efficient (on speed)" to be more clear. 2. I will profile once I have the code written, but that's not the point: the point is, if I have a container of unknown size, big or not, what's the best bet for speed of lookup? – Klaim Jan 09 '12 at 14:45
  • @Klaim: there isn't a best bet unless you add further requirements. For starters, you haven't even said whether you want best average for random input, best average for a certain kind of user (in which case, what user), best worst case, etc. If you want to know which container you should provide as a library, Python seems to get along fine with the hash-based `dict` as its built-in map type. – Steve Jessop Jan 09 '12 at 14:46
  • @Xeo: Correctly according to whom? Nowhere in the standard can I find the term STL defined. – PlasmaHH Jan 09 '12 at 14:56
  • @SteveJessop Would it be more clear if I say "best speed performance on element search, not considering insertion and remove"? – Klaim Jan 09 '12 at 14:56
  • @Klaim: no, that doesn't distinguish between the different meanings of "best" I described so it's no clearer. The difference in general between best expected case and best worst case is often a completely different algorithm. On the plus side, if you don't know which one you need then it probably doesn't matter which container you use of the two you've asked about. – Steve Jessop Jan 09 '12 at 14:57
  • @SteveJessop Ok, so I'm not sure how to say that, it's about predictability and short time for lookup... – Klaim Jan 09 '12 at 15:02

2 Answers

Profile, profile, profile...

The problem with strings as keys is that comparing them can be very slow (think of two 1000-character strings that differ only in the last character). The advantage of an unordered_map with a string key comes at least in part from the fact that the key is hashed once per lookup and mostly fixed-width hash values have to be compared, with a full string comparison needed only against candidates in the matching bucket, so in practice the unordered map may well be a lot faster.

The hash implementation may choose, for example, to use only a fixed number of spread-out characters to compute the hash value, and thus end up putting some near-identical strings in the same bucket, so it's a trade-off. You can probably concoct a set of key values for which both containers would perform very poorly, but for a "random" or "typical" collection of strings, my bet is on the hash container.
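To make the "profile" advice concrete, here is a rough micro-benchmark sketch using only the standard library. The helper names (`make_keys`, `build_map`, `time_lookups`) and the `LookupResult` struct are made up for illustration, and the fixed-length random lowercase keys are an assumption that may not resemble your real data, so treat any timings as a starting point only.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: n pseudo-random lowercase keys of length len.
// The seed is fixed so repeated runs use the same keys.
std::vector<std::string> make_keys(std::size_t n, std::size_t len) {
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> letter(0, 25);
    std::vector<std::string> keys;
    keys.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        std::string s(len, 'a');
        for (char& c : s) c = static_cast<char>('a' + letter(rng));
        keys.push_back(s);
    }
    return keys;
}

// Hypothetical helper: fill any map-like container with the given keys.
template <class Map>
Map build_map(const std::vector<std::string>& keys) {
    Map m;
    for (std::size_t i = 0; i < keys.size(); ++i)
        m.emplace(keys[i], static_cast<int>(i));
    return m;
}

struct LookupResult {
    std::size_t hits;     // number of keys that were found
    double microseconds;  // wall-clock time for all lookups
};

// Look up every key once and report the hit count and elapsed time.
template <class Map>
LookupResult time_lookups(const Map& m, const std::vector<std::string>& keys) {
    const auto start = std::chrono::steady_clock::now();
    std::size_t hits = 0;
    for (const auto& k : keys) hits += m.count(k);
    const auto stop = std::chrono::steady_clock::now();
    return {hits, std::chrono::duration<double, std::micro>(stop - start).count()};
}
```

A possible use: call `make_keys(100000, 16)`, build one `std::map<std::string, int>` and one `std::unordered_map<std::string, int>` with `build_map`, then compare the `microseconds` fields returned by `time_lookups`. Build with optimizations enabled and repeat the measurement several times, since a single run is noisy.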

Kerrek SB

When you have 40k entries or more, strings (or lists of elements, etc.) should not be used as associative keys in the standard containers. In fact, there comes a point much earlier where a trie or a ternary search tree becomes the better option. Both of those build associative structures that compare each character of your string (or element of your list, etc.) only once per lookup. Ordered maps compare at every node visited (and so are O(m log n), where m is the length of the string and n is the number of elements), and unordered maps can suffer from far more collisions at those sizes.
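The "compare at every node" cost of an ordered map can be observed with a counting comparator. This is only an illustrative sketch: `CountingLess` and `comparisons_for_find` are hypothetical names, and the keys share a long common prefix on purpose so that each comparison has to scan most of the string.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Behaves like std::less<std::string>, but counts how often it is called.
struct CountingLess {
    std::size_t* calls;  // counter owned by the caller
    bool operator()(const std::string& a, const std::string& b) const {
        ++*calls;
        return a < b;
    }
};

// Returns the number of string comparisons made by a single find()
// in a std::map holding n keys that share a prefix of prefix_len chars.
std::size_t comparisons_for_find(std::size_t n, std::size_t prefix_len) {
    std::size_t calls = 0;
    CountingLess cmp{&calls};
    std::map<std::string, int, CountingLess> m(cmp);
    const std::string prefix(prefix_len, 'x');
    for (std::size_t i = 0; i < n; ++i)
        m.emplace(prefix + std::to_string(i), static_cast<int>(i));
    calls = 0;  // ignore the comparisons made during insertion
    (void)m.find(prefix + std::to_string(n / 2));
    return calls;
}
```

The count grows roughly with log n, and each of those comparisons may walk the whole shared prefix, which is where the O(m log n) figure comes from.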

A ternary search tree (each node branches to a child for characters less than, equal to, or greater than its own, on a single char compare) takes the least memory of the better implementations, but tries are by far the fastest. Both of these can be built with Boost.Graph or some other generic graph library.
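For illustration, a minimal trie can also be sketched with the standard library alone, without a graph library. The `Trie` class below is a hypothetical sketch, not a tuned implementation: it stores `int` values, and it keeps each node's children in a `std::map<char, ...>` for brevity, so each step is a small lookup rather than a direct array index as in a speed-oriented trie.

```cpp
#include <map>
#include <memory>
#include <string>

// Minimal trie mapping std::string keys to int values (illustrative only).
class Trie {
public:
    // Insert the key, overwriting any value already stored for it.
    void insert(const std::string& key, int value) {
        Node* node = &root_;
        for (char c : key) {
            std::unique_ptr<Node>& child = node->children[c];
            if (!child) child.reset(new Node());
            node = child.get();
        }
        node->has_value = true;
        node->value = value;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    // Each character of the key is visited once.
    const int* find(const std::string& key) const {
        const Node* node = &root_;
        for (char c : key) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return nullptr;
            node = it->second.get();
        }
        return node->has_value ? &node->value : nullptr;
    }

private:
    struct Node {
        std::map<char, std::unique_ptr<Node>> children;
        bool has_value = false;  // true only if a key ends at this node
        int value = 0;
    };
    Node root_;
};
```

Lookup cost here depends on the key length rather than on the number of stored keys, at the price of one heap allocation per node. Whether that beats a hash container for a given workload is something to measure.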

ex0du5