I've been reading the reference for `std::hash` and found that it can't hash serialized data such as a `char*` buffer. Is that correct, or normal? How can I hash a serialized buffer?

- You could use [`boost::hash_combine`](http://www.boost.org/doc/libs/1_57_0/doc/html/hash/combine.html) to iterate over the buffer and create a hash value. – Praetorian Dec 30 '14 at 03:58
- Maybe you should add what you have already tried. – Irrational Person Dec 30 '14 at 04:21
- What kind of data are you trying to hash? Telling us it's a `char*` doesn't really tell us anything because anything can be represented that way. – David Schwartz Dec 30 '14 at 04:35
- @preat Note that hash combine is simple to write on your own: the key part is making sure the combine value is white noise. – Yakk - Adam Nevraumont Dec 30 '14 at 05:05
- Related: http://isocpp.org/files/papers/n3980.html – Nemo Dec 30 '14 at 05:13
- @Praetorian: I'm developing a library with the Android NDK, and I can't use boost. @Bot, @David Schwartz: It's just as simple as you think. Just treat the buffer as a `char[]`. @Yakk, @Nemo: Your advice and links are great, thanks for your responses. – naive231 Dec 30 '14 at 09:29
1 Answer
The idea with `std::hash` is to provide a general hashing algorithm for fixed-size data that is good enough for most uses, so users don't need to roll their own every time. Hashing variable-length input is a much more complex problem, and a good algorithm often depends on characteristics of the data itself, so the standard library doesn't try to provide one for raw buffers and the implementation is punted to the developer. (`std::hash` is specialized for `std::string`, but for a `char*` the pointer specialization hashes the address, not the bytes it points to.) For example, a hash algorithm that works great for ASCII strings might not work so well for data containing mostly zeros, and a good algorithm for the latter might give too many collisions for strings. (There are also speed tradeoffs; some hashing algorithms might work great for everything but be too slow.)
IIRC, an old, old hashing algorithm for ASCII strings is to simply multiply every character's ASCII value together. Needless to say, this is really fast and only works because there are no zeros.
So instead of using `std::hash`, you're supposed to write your own hashing class with the same API (i.e. it must define `size_t operator()(const Key&) const`) and pass that class as the `Hash` template parameter to hash-using templates like `std::unordered_set`.
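As a rough illustration of such a hashing class, here is a minimal sketch that needs no boost. It uses 64-bit FNV-1a, which is my choice for the example and not something the answer prescribes; any byte-wise hash with good mixing would do. The names `fnv1a_64`, `Buffer`, `BufferHash`, and `BufferSet` are hypothetical, and the buffer is assumed to be stored as a `std::vector<char>`.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// 64-bit FNV-1a over a raw byte range: XOR each byte into the state,
// then multiply by the FNV prime. (Algorithm chosen for illustration only.)
inline std::uint64_t fnv1a_64(const void* data, std::size_t len) {
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= bytes[i];
        h *= 1099511628211ULL;                  // FNV 64-bit prime
    }
    return h;
}

// Hypothetical key type: an owned copy of the serialized buffer.
using Buffer = std::vector<char>;

// Hash functor with the same call signature std::hash would have for Buffer.
struct BufferHash {
    std::size_t operator()(const Buffer& b) const {
        // On 32-bit targets this cast truncates the 64-bit hash.
        return static_cast<std::size_t>(fnv1a_64(b.data(), b.size()));
    }
};

// The functor is passed as the Hash template parameter; equality falls back
// to std::vector's operator==, which compares the contents byte by byte.
using BufferSet = std::unordered_set<Buffer, BufferHash>;
```

With this in place, something like `BufferSet seen; seen.insert(Buffer(ptr, ptr + len));` stores a copy of the bytes and hashes their contents instead of the pointer value. Whether FNV-1a distributes well enough depends on the data, as noted above.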

- (IIRC, an old, old hashing algorithm for ASCII strings is to simply multiply every character's ASCII value together) ... That's a terrible solution, as every character with an even value will shift another zero into the LSB, eventually resulting in a hash of 0. To make this work requires MSBs be shifted back into LSBs. – Peter Fletcher May 06 '22 at 20:31
- I didn’t say it was good, and you’re right. Maybe they shifted right a bit each time? (This was back when multiplies were in software half the time.) It’s as bad as literal checksums, where the bytes were just added together and two bit flips of the same bit in different bytes would cancel out. In any case it’s terrible and shouldn’t be used because there’re much better ways. – Mike DeSimone May 07 '22 at 21:37