I've run into an unexpected situation when trying to hash pointers using the default implementation of robin_hood::unordered_flat_set
from https://github.com/martinus/robin-hood-hashing.
My test case looks like the following:
#include <algorithm>
#include <vector>
#include <robin_hood.h>

void test()
{
    std::vector<int*> v{ /* ~4k entries extracted from a real run */ };
    robin_hood::unordered_flat_set<int*> fs;
    std::ranges::for_each(v, [&](int* p) { fs.insert(p); }); // boom!
}
I assume the default hash function is "good" (the comments indicate it is taken from MurmurHash3). The diagnostic output shows that the robin_hood implementation throws an overflow error after calling try_increase_info unsuccessfully 5 times.
I did a quick analysis of the sorted data. All of the values lie between 0x7fc768000000 and 0x7fc788000000. The most common difference between adjacent entries is n*128 bytes (0x80), where n is a small number; there are larger gaps in the data as well. Of course, I can easily sidestep the issue by using std::unordered_set<int*> instead; its maximum bucket size for this data set is 6, which is pretty reasonable.
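Since I can't share the real addresses, here is a rough way to fabricate a sequence with the same shape (the function name, the seed, and the distribution of n are my own invention; because the real gaps came from an actual run, this may or may not reproduce the overflow):

#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Synthetic approximation of the pattern above: addresses starting near
// 0x7fc768000000 with adjacent gaps of n*0x80 for small n. The pointers
// are never dereferenced; they only exercise the hash.
std::vector<int*> make_synthetic_pointers(std::size_t count = 4000)
{
    std::vector<int*> v;
    v.reserve(count);
    std::mt19937_64 rng{42};
    std::uniform_int_distribution<std::uintptr_t> gap{1, 8}; // "small n"
    std::uintptr_t addr = 0x7fc768000000ull;
    for (std::size_t i = 0; i < count; ++i) {
        v.push_back(reinterpret_cast<int*>(addr));
        addr += gap(rng) * 0x80; // n * 128-byte stride
    }
    return v;
}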
The whole point of using a non-standard hash table is performance, but I can't use it if there are correctness issues. I'd accept switching my code to (yet) another hash implementation, provided it comes with a stronger guarantee that a fairly ordinary sequence of pointer values won't trigger an internal error.
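For what it's worth, the workaround I'm currently experimenting with is to keep robin_hood but supply my own hash functor that runs the pointer value through a splitmix64-style finalizer, so the constant high bits and zeroed alignment bits get mixed across the whole word (PtrHash is my own name, not something the library provides):

#include <cstdint>
#include <robin_hood.h>

// Splitmix64-style finalizer applied to the raw pointer bits.
struct PtrHash {
    std::size_t operator()(int* p) const noexcept {
        std::uint64_t x = reinterpret_cast<std::uintptr_t>(p);
        x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ull;
        x ^= x >> 27; x *= 0x94d049bb133111ebull;
        x ^= x >> 31;
        return static_cast<std::size_t>(x);
    }
};

// The second template parameter replaces the default robin_hood::hash.
robin_hood::unordered_flat_set<int*, PtrHash> fs;

Whether that (or any custom mixer) actually comes with the kind of guarantee I'm after is exactly what I don't know.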
Hashing of pointer values has some useful information, but the accepted answer basically amounts to "here are some hash functions that might be of use", which hardly gives me warm fuzzy feelings.
Any advice?