How to hash a buffer with std::hash?

Question

I've been reading the reference for std::hash and found that it can't hash serialized data, such as a char* buffer. Is that correct, and is it expected? How can I hash a serialized buffer?

You could use [`boost::hash_combine`](http://www.boost.org/doc/libs/1_57_0/doc/html/hash/combine.html) to iterate over the buffer and create a hash value. — Praetorian, Dec 30 '14 at 03:58
What kind of data are you trying to hash? Telling us it's a `char*` doesn't really tell us anything because anything can be represented that way. — David Schwartz, Dec 30 '14 at 04:35
@preat note that hash combine is simple to write your own: the key part is making sure the combine value is white noise. — Yakk - Adam Nevraumont, Dec 30 '14 at 05:05
@Praetorian: I'm developing a library with the Android NDK, and I can't use Boost. @Bot, @David Schwartz: It's just as simple as you think. Just treat the buffer as `char[]`. @Yakk, @Nemo: Your advice and links are great, thanks for your responses. — naive231, Dec 30 '14 at 09:29

score 0 · Accepted Answer · answered Dec 30 '14 at 03:52

The idea with std::hash is to provide a general hashing algorithm for fixed-size data that is good enough for most uses, so users don't need to roll their own every time. The problem with variable-length inputs is that hashing them is a much more complex problem, and the right choice often depends on characteristics of the data itself. That makes it too hard to require the standard library to include such an algorithm, so the implementation is punted to the developer. For example, a hash algorithm that works great for ASCII strings might not work as well for data containing mostly zeros, and a good algorithm for the latter might give too many collisions for strings. (There are also speed tradeoffs; some hashing algorithms might work great for everything but be too slow.)

IIRC, an old, old hashing algorithm for ASCII strings is to simply multiply every character's ASCII value together. Needless to say, this is really fast and only works because there are no zeros.
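To make the weakness of that scheme concrete, here is a minimal sketch of it. The function name `multiply_hash` and the choice of a 64-bit accumulator are assumptions for illustration, not a reconstruction of any particular historical implementation.

```cpp
#include <cstdint>
#include <string>

// Hypothetical reconstruction of "multiply every character together".
// Assumes a 64-bit accumulator; unsigned arithmetic wraps modulo 2^64.
inline std::uint64_t multiply_hash(const std::string& s) {
    std::uint64_t h = 1;
    for (unsigned char c : s) {
        h *= c;  // a single zero byte would make the result 0 from here on
    }
    return h;
}
```

Two problems show up right away. Multiplication is commutative, so `multiply_hash("ab")` and `multiply_hash("ba")` are equal. And every even byte value contributes at least one factor of 2, so a string of 64 `'b'` characters (ASCII 98) already wraps around to 0 in a 64-bit accumulator.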

So instead of using std::hash, you're supposed to write your own hashing class with the same API (i.e. it must define size_t operator()(Key)) and pass that class as the Hash template parameter to hash-using templates like std::unordered_set.
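As a sketch of what such a class could look like, the example below hashes a byte buffer with FNV-1a, a simple non-cryptographic hash that is swapped in here purely for illustration; whether it suits your data is something you would need to measure. The names `fnv1a`, `Buffer`, `BufferHash`, and `insert_once` are hypothetical, and the buffer is assumed to be held in a `std::vector<unsigned char>`.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical helper: 64-bit FNV-1a over a raw byte range.
inline std::uint64_t fnv1a(const unsigned char* data, std::size_t len) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= data[i];                           // mix in one byte
        h *= 1099511628211ULL;                  // FNV 64-bit prime
    }
    return h;
}

// Assumed buffer representation for this sketch.
using Buffer = std::vector<unsigned char>;

// Hash class with the same call shape as std::hash.
struct BufferHash {
    std::size_t operator()(const Buffer& b) const {
        // Truncates to the low bits on platforms where size_t is 32 bits wide.
        return static_cast<std::size_t>(fnv1a(b.data(), b.size()));
    }
};

// Usage: BufferHash is passed as the Hash template parameter.
// Returns true if the buffer was not already in the set.
inline bool insert_once(std::unordered_set<Buffer, BufferHash>& seen,
                        const Buffer& b) {
    return seen.insert(b).second;
}
```

Equality still comes from `std::vector`'s own `operator==`, so only the hash needs to be supplied. If the data lives in a raw `char[]` with a separate length, the same `fnv1a` helper can be called directly after casting the pointer to `const unsigned char*`.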

(IIRC, an old, old hashing algorithm for ASCII strings is to simply multiply every character's ASCII value together) ... That's a terrible solution, as every character with an even value will shift another zero into the LSB, eventually resulting in a hash of 0. To make this work requires MSBs be shifted back into LSBs. — Peter Fletcher, May 06 '22 at 20:31
I didn’t say it was good, and you’re right. Maybe they shifted right a bit each time? (This was back when multiplies were in software half the time.) It’s as bad as literal checksums, where the bytes were just added together and two bit flips of the same bit in different bytes would cancel out. In any case it’s terrible and shouldn’t be used because there’re much better ways. — Mike DeSimone, May 07 '22 at 21:37
