Java 8 is providing alternative hashing for String keys to improve performance when a large number of key hash code collisions are encountered. Can anybody explain what it is and how it works?
+1 This could be used to avoid denial of service attacks. – Peter Lawrey Aug 14 '12 at 07:55
Thanks, I will keep this in mind. – Pramod Kumar Aug 14 '12 at 08:00
@PeterLawrey Are you referring to [this bug report](https://bugzilla.redhat.com/show_bug.cgi?id=750533) or something else when you mention DoS? – Andrzej Doyle Aug 14 '12 at 11:40
3 Answers
To keep this question up to date: alternative hashing has been removed from JDK 8. See:
http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html
http://openjdk.java.net/jeps/180
It is interesting to note that once the number of items in a hash bucket grows beyond a certain threshold, that bucket will switch from using a linked list of entries to a balanced tree.
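To make the collision scenario concrete, here is a minimal sketch that builds many distinct String keys sharing a single hash code, so they all land in one bucket. It relies on the fact that "Aa" and "BB" have the same length and the same `hashCode()` (2112), so any concatenation of the two collides as well. The class and method names are made up for this illustration, and the timing it prints is only a rough indication, not a benchmark.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CollidingKeys {
    // Returns 2^rounds distinct strings that all share one hashCode().
    // Works because "Aa" and "BB" have equal length and equal hash (2112).
    static List<String> collidingKeys(int rounds) {
        List<String> keys = new ArrayList<>();
        keys.add("");
        for (int i = 0; i < rounds; i++) {
            List<String> next = new ArrayList<>(keys.size() * 2);
            for (String k : keys) {
                next.add(k + "Aa");
                next.add(k + "BB");
            }
            keys = next;
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = collidingKeys(12); // 4096 keys, one hash code
        Map<String, Integer> map = new HashMap<>();
        long start = System.nanoTime();
        for (String k : keys) {
            map.put(k, k.length());
        }
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println("distinct keys: " + map.size());
        System.out.println("shared hash:   " + keys.get(0).hashCode());
        System.out.println("insert time:   " + micros + " us");
    }
}
```

With a plain linked list per bucket, each insert has to walk every earlier colliding key. With a tree bin, String keys can be ordered through `compareTo`, which keeps lookups in that bucket logarithmic.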
The hash(Object key) function in HashMap has been revised as follows, with no special treatment for String objects:
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
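As a rough illustration of what the `h ^ (h >>> 16)` step buys, the sketch below compares bucket indexes with and without it for keys whose hash codes differ only in the upper 16 bits. The `spread` method copies the expression quoted above. `bucketIndex` and the class name are helpers invented for this example, and it assumes a power-of-two table indexed by the low bits of the hash.

```java
class HashSpreadDemo {
    // Same expression as the JDK 8 HashMap.hash(Object) quoted above.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Hypothetical helper: bucket index for a power-of-two table length,
    // keeping only the low bits of the hash.
    static int bucketIndex(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    public static void main(String[] args) {
        int tableLength = 16;
        // Integer.hashCode() is the int value itself, so these differ only in high bits.
        Integer[] keys = {0x10000, 0x20000, 0x30000};
        for (Integer key : keys) {
            int raw = bucketIndex(key.hashCode(), tableLength);
            int mixed = bucketIndex(spread(key), tableLength);
            System.out.printf("key=0x%05X raw bucket=%d spread bucket=%d%n", key, raw, mixed);
        }
    }
}
```

Without the XOR all three keys fall into bucket 0. With it they land in buckets 1, 2 and 3, because the high bits are folded into the low bits that select the bucket.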

From this email on core-libs-dev@openjdk:
- A new interface Hashable32 is introduced.
- Hashable32 provides a method hash32().
- String implements Hashable32 and its hash32() method.
- HashMap et al. recognize String and invoke hash32() rather than hashCode().
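The shape of that proposal can be sketched roughly as follows. This is not the code from the webrevs: only the names `Hashable32` and `hash32()` come from the email. Since `String` cannot be modified here, a made-up `AltKey` wrapper stands in for it, the seeded FNV-1a-style loop is a simple stand-in for Murmur3, and `hashFor` is a hypothetical helper showing how a map might choose between the two hash methods.

```java
// Interface name and method name taken from the email; everything else is assumed.
interface Hashable32 {
    int hash32();
}

// Hypothetical stand-in for String, which cannot be changed in user code.
final class AltKey implements Hashable32 {
    private final String value;
    private final int seed;

    AltKey(String value, int seed) {
        this.value = value;
        this.seed = seed;
    }

    // Seeded FNV-1a-style mix over the chars; a placeholder, NOT Murmur3.
    @Override
    public int hash32() {
        int h = 0x811C9DC5 ^ seed;
        for (int i = 0; i < value.length(); i++) {
            h ^= value.charAt(i);
            h *= 0x01000193;
        }
        return h;
    }
}

class Hashable32Demo {
    // Hypothetical map-side dispatch: prefer hash32() when the key offers it.
    static int hashFor(Object key) {
        if (key == null) {
            return 0;
        }
        return (key instanceof Hashable32) ? ((Hashable32) key).hash32() : key.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(hashFor("plain string"));               // falls back to hashCode()
        System.out.println(hashFor(new AltKey("alt key", 12345))); // uses hash32()
    }
}
```

The point of such a dispatch is that the alternative hash can differ from the publicly specified `String.hashCode()`, so keys crafted to collide under `hashCode()` need not collide inside the map.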
The revisions of the code:
- Murmur3 : https://code.google.com/p/smhasher/wiki/MurmurHash3
- althashing "7" webrev : http://cr.openjdk.java.net/~mduigou/althashing7/8/webrev/
- althashing "8" webrev : http://cr.openjdk.java.net/~mduigou/althashing8/8/webrev/

From what I can tell, the biggest flaws with the old hashing algorithm were that it sometimes returned zero for long strings, and that specifying a particular implementation precluded the possibility of having the VM implement a "hash-string" function which was designed for optimal performance on that particular machine (e.g. a 64-bit machine might use a function which operates on groups of 8 bytes and then munges the result down to 32 bits). I wonder how much code really relies upon the exact values of the old string hash, and how hard it would be to allow a 'compatibility workaround'? – supercat Jan 26 '13 at 21:14
It should be noted that the shift to MurmurHash3 will not prevent DoS attacks: http://emboss.github.com/blog/2012/12/14/breaking-murmur-hash-flooding-dos-reloaded/
