I think you've misunderstood the complexity guarantees of std::unordered_map's insert, removal and find operations. The worst-case O(size()) complexity only occurs if you implement a terrible hash function for the Key type, one that generates lots of collisions between keys that do not compare equal.
Say you have:
    struct terrible_hash
    {
        // Every key hashes to the same value, so every key collides.
        std::size_t operator()(int) const
        { return 42; }
    };
    std::unordered_map<int, foo, terrible_hash> m;
All insertions of new keys into the map above will be O(m.size()), because every key hashes to the same value and the lookup that precedes each insertion is forced to search linearly through all the existing elements.
Given a decent hash function, those operations should be constant time on average (amortized, in the case of insertion, because of the occasional rehash).
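To make the collision problem visible, here is a minimal sketch that counts how many elements end up in the fullest bucket. The helper names longest_bucket and longest_bucket_after_inserts are made up for this illustration; bucket_count() and bucket_size() are the standard bucket interface of std::unordered_map.

```cpp
#include <cstddef>
#include <unordered_map>

// Same deliberately bad hash as above: every key hashes to 42.
struct terrible_hash
{
    std::size_t operator()(int) const { return 42; }
};

// Hypothetical helper: size of the fullest bucket, i.e. the longest
// chain a single lookup may have to walk.
template <class Map>
std::size_t longest_bucket(const Map& m)
{
    std::size_t longest = 0;
    for (std::size_t b = 0; b < m.bucket_count(); ++b)
        if (m.bucket_size(b) > longest)
            longest = m.bucket_size(b);
    return longest;
}

// Hypothetical helper: insert n distinct int keys using Hash and
// report the fullest bucket afterwards.
template <class Hash>
std::size_t longest_bucket_after_inserts(int n)
{
    std::unordered_map<int, int, Hash> m;
    for (int i = 0; i < n; ++i)
        m.emplace(i, i);
    return longest_bucket(m);
}
```

With terrible_hash, longest_bucket_after_inserts<terrible_hash>(n) comes back as n, since all keys share one bucket. With a reasonable hash the fullest bucket should stay small, although the exact number depends on the implementation.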
Going back to your question of string vs a 128-bit number (UUID) as the key type: it depends on your implementation, but typically the latter should be quicker. I say this based on the following assumptions:
Typical hash<string> specializations will iterate over the entire string, perform bitwise math on each byte and combine it with the existing result. For instance, a partial/simplified implementation taken from VS2013:
    size_t _Val = 14695981039346656037ULL;
    for (size_t _Next = 0; _Next < _Count; ++_Next)
    {
        _Val ^= (size_t)_First[_Next];
        _Val *= 1099511628211ULL;
    }
    return _Val;
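As a rough, self-contained version of the loop above, the sketch below hashes a std::string byte by byte. The function name fnv1a_64 is mine, not MSVC's; the two constants are the ones from the snippet, which match the 64-bit FNV offset basis and prime. I use std::uint64_t rather than size_t so the arithmetic is the same on 32-bit targets.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for the VS2013 loop: xor in each byte, then
// multiply. The work done is proportional to s.size().
inline std::uint64_t fnv1a_64(const std::string& s)
{
    std::uint64_t val = 14695981039346656037ULL; // offset basis
    for (unsigned char c : s)
    {
        val ^= c;
        val *= 1099511628211ULL;                 // prime
    }
    return val;
}
```

The point to take away is the loop itself: a 36-character UUID string costs 36 xor/multiply steps per hash, before the map even starts comparing keys.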
With your 128-bit key type, you should be able to combine the two 64-bit words to generate a hash with fewer operations. For example, you could define a helper function template and use it to combine the hashes of the two 64-bit words.
    template <class T>
    inline void hash_combine(std::size_t& seed, const T& v)
    {
        std::hash<T> hasher;
        seed ^= hasher(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);
    }
The magic numbers are stolen from boost::hash_combine. Again, looking at the MSVC implementation for std::hash<uint64_t>, they alias into the 64-bit integer via an unsigned char * and call the algorithm I pasted above, but in this case the number of iterations is known and the compiler will be able to optimize better.
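Putting the helper to use might look something like the sketch below. The uuid128 struct, its hi/lo members, uuid128_hash and the uuid_map alias are all assumptions made for the example, since I don't know how your 128-bit type is actually laid out.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

template <class T>
inline void hash_combine(std::size_t& seed, const T& v)
{
    std::hash<T> hasher;
    seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hypothetical 128-bit key stored as two 64-bit words.
struct uuid128
{
    std::uint64_t hi;
    std::uint64_t lo;

    // unordered_map needs equality to tell colliding keys apart.
    bool operator==(const uuid128& other) const
    { return hi == other.hi && lo == other.lo; }
};

// Hash functor: two fixed-size hashes combined, no per-character loop.
struct uuid128_hash
{
    std::size_t operator()(const uuid128& id) const
    {
        std::size_t seed = 0;
        hash_combine(seed, id.hi);
        hash_combine(seed, id.lo);
        return seed;
    }
};

using uuid_map = std::unordered_map<uuid128, std::string, uuid128_hash>;
```

You would then use it like any other map, e.g. uuid_map m; m[uuid128{1, 2}] = "value";. Because hash_combine feeds the running seed back into the mix, swapping the two words generally produces a different hash.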
Having said all that, if performance is very important, you need to measure both choices for keys, and then make a decision.
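For the measuring part, a very rough starting point could be a timing helper along these lines. The name time_lookups_ns and its signature are invented for this sketch; std::chrono::steady_clock is the standard monotonic clock. A real benchmark would also need warm-up runs, repeated trials and realistic key data.

```cpp
#include <chrono>
#include <cstddef>
#include <unordered_map>

// Hypothetical helper: looks up every key in `keys` once and returns
// the elapsed time in nanoseconds. `hits` is incremented for each key
// found, so the compiler cannot discard the lookups as dead code.
template <class Map, class Keys>
long long time_lookups_ns(const Map& m, const Keys& keys, std::size_t& hits)
{
    const auto start = std::chrono::steady_clock::now();
    for (const auto& k : keys)
        if (m.find(k) != m.end())
            ++hits;
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Call it once with the string-keyed map and once with the 128-bit-keyed map, using the same logical set of keys for both, and compare the results in an optimized build.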