0

While looking for some deterministic (multiple-runs, multiple-machines) hasher for integers, I stumbled upon boost::hash_combine(size_t & seed, T const& v). Unfortunately, in the documentation it is stated that

This hash function is not intended for general use, and isn't guaranteed to be equal during separate runs of a program - so please don't use it for any persistent storage or communication.

However, when looking through the implementation I did not see any suspicious code that may cause divergent behaviour in separate runs - just some multiplying and adding (with possible overflows), bit-shifting and xor operations, everything using constants. What is more, hasher behaved consistently when executed several times.

So where is the actual problem that forbids to guarantee determinism across the runs?

Pasting below the most interesting pieces:

template <class T>
inline void hash_combine(std::size_t& seed, T const& v)
{
    boost::hash<T> hasher;
    return boost::hash_detail::hash_combine_impl(seed, hasher(v));
}

template <typename T>
typename boost::hash_detail::basic_numbers<T>::type hash_value(T v)
{
    return static_cast<std::size_t>(v);
}

inline void hash_combine_impl(boost::uint64_t& h,
        boost::uint64_t k)
{
    const boost::uint64_t m = UINT64_C(0xc6a4a7935bd1e995);
    const int r = 47;

    k *= m;
    k ^= k >> r;
    k *= m;

    h ^= k;
    h *= m;

    // Completely arbitrary number, to prevent 0's
    // from hashing to 0.
    h += 0xe6546b64;
}
knopers8
  • 438
  • 5
  • 11
  • 1
    maybe there is no problem. Just because it is not intended to be used like that, it doesnt necesarrily imply that you cannot use it that way, but it does mean that if you rely on certain behaviour that isnt guaranteed, you may be out of luck when you go to the next version of the library, just saying... – 463035818_is_not_an_ai Aug 20 '18 at 11:23

1 Answers1

1

The reason is that hashes are frequently used in hash tables. A malicious user trying to attack a service (that uses C++ code involving hash tables) may massively degrade its performance by forcing hash collisions for items inserted into a hash table (with performance for common operations going from O(1) to O(N)). By having every run use a different hash function, this becomes a lot harder to do.

std::hash is standardized like that too. To cite https://en.cppreference.com/w/cpp/utility/hash:

Hash functions are only required to produce the same result for the same input within a single execution of a program; this allows salted hashes that prevent collision DoS attacks. (since C++ 14)

Max Langhof
  • 23,383
  • 5
  • 39
  • 72
  • That definitely makes sense, but in this concrete case I cannot see any mechanism which changes the behaviour among several runs. Should I interpret this 'disclaimer' as a safety measure for the future, when such a mechanism will be implemented? – knopers8 Aug 20 '18 at 12:18