2

The hash structures I am aware of - HashTable, HashSet & HashMap.

Do they all use the bucket structure - ie when two hashcodes are similar exactly the same one element does not overwrite the other, instead they are placed in the same bucket associated with that hashcode?

Peter Recore
  • 14,037
  • 4
  • 42
  • 62
ST.
  • 315
  • 1
  • 4
  • 6
  • I think maybe your question is unclear. Are you trying to ask about hash collisions when you say "hashcodes are similar?" What do you mean by one element overwriting the other? – Peter Recore Nov 24 '09 at 18:24
  • when two elements return the same int from Object.hashcode() By overwriting I mean if the hashcodes were equal would the object being stored in the Hash structure be overwritten with the new object (with which it had a hash collision) – ST. Nov 24 '09 at 19:04

7 Answers7

8

In Sun's current implementation of the Java library, IdentityHashMap and the internal implementation in ThreadLocal use probing structures.

The general problem with probing hash tables in Java is that hashCode and equals may be relatively expensive. Therefore you want to cache the hash value. You can't have an array that mixes references and primitives, so you'd need to do something relatively complicated. On the other hand, if you are using == to check matches, then you can check many references without a performance problem.

IIRC, Azul had a fast concurrent quadratic probing hash map.

Tom Hawtin - tackline
  • 145,806
  • 30
  • 211
  • 305
  • probably a great answer but over my head slightly - what is meant by probing structures? You mean when comparing hashes? – ST. Nov 24 '09 at 19:11
  • Proving is a synonym for open addressing give in Adam Rosenfield's answer. – Tom Hawtin - tackline Nov 25 '09 at 00:43
  • @TomHawtin-tackline I have a basic [question on probing](http://stackoverflow.com/questions/12005405/are-open-addressing-in-hash-tables-only-useful-for-searching-how-do-the-elemen) here . Ca you please try to answer that ? – Geek Aug 18 '12 at 14:07
6

A linked list is used at each bucket for dealing with hash collisions. Note that the java HashSet is actually implemented by a HashMap underneath (all keys being mapped to the same singleton value across all HashSets) and hence uses the same bucket structure.

If an element is added, its equality is checked against all items in the linked list (via .equals) before it is added at the end. Hence having hash collisions is particularly bad, as this could be an expensive check as the linked list becomes larger.

oxbow_lakes
  • 133,303
  • 56
  • 317
  • 449
5

I believe Java hash structures all use a form of chaining to deal with colisions when performing the hashing - which places the items that have the same hash into a list.

I do not believe that Java uses open addressing for it's hash based data structures (open addressing recomputes hashes based on retry sequences until it finds an open slit in the table)

James Conigliaro
  • 3,809
  • 19
  • 22
  • 1
    For those of you who say, "I believe" is not good enough... rest assured. The above statement is true. I confirmed with docjar.com. That is, HashMap uses chaining (a custom-built linked list, not java.util.LinkedList). It does not use open addressing. – BMiner Apr 04 '11 at 19:58
4

No -- open addressing is an alternate method of representing hash tables, where objects are stored directly in the table, instead of residing in a linked list. Only one object can be stored at a given index, so resolving collisions is more complicated.

When adding an object for which another object already resides at the same index, a probing sequence is used to determine the new index at which to store the new object. Removing objects is also more complicated, since you if you remove an object, you need to leave a marker that says "there used to be an object here"; for more details, see Wikipedia.

Open addressing is preferable when the objects being stored as small and will rarely be deleted. Open addressing has improved cache performance, since you don't need to go through an extra level of indirection walking a linked list.

The classes you mentioned -- HashTable, HashSet, and HashMap don't use open addressing, but you could easily create new classes that implemented open addressing and provided the same APIs as those classes.

Adam Rosenfield
  • 390,455
  • 97
  • 512
  • 589
2

The apis define the behaviour, the internals of how Hash collisions are managed doesn't affect the guarantees of the API ... the performance impact of bad hash value computation is another story. Let's just hash everything to 42 and see how it behaves.

djna
  • 54,992
  • 14
  • 74
  • 117
0

Maps and Sets are the interfaces that determine the behavior of a HashSet or HashMap. A HashSet is a Set, and so it behaves like a Set (ie duplicates are not allowed). A HashMap acts like a Map - it will not overwrite a key with a similar hashcode, but it will overwrite a key, if the same exact key is used again. This will be the same regardless of what data structure is backing the Map internally. See the javadoc for Sets and HashMaps for more.

Did you mean to ask something about the specific implementation of one of these structures?

Peter Recore
  • 14,037
  • 4
  • 42
  • 62
  • so if I have two objects that are not identical (equals() fails) but return the same int from hashcode(), when both objects are placed in the same HashSet will one overwrite the other? – ST. Nov 24 '09 at 19:00
  • yes, as stated in the javadoc for Set, (http://java.sun.com/javase/7/docs/api/java/util/Set.html#add(E)) the ultimate definition of sameness is equals(). Quote: "More formally, adds the specified element e to this set if the set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). " – Peter Recore Nov 24 '09 at 20:23
-3

Except the HashSet. Set is by definition unique elements.

This was a mistake. Please see the comments below.

coolest_head
  • 1,388
  • 1
  • 7
  • 6
  • -1: Different elements may have same Hash - they will both be added – djna Nov 24 '09 at 17:46
  • That doesn't preclude hash code collisions. A set still has to deal with them for non-equal objects. Anyway, a HashSet is just a wrapper to a HashMap in implementation. – Yishai Nov 24 '09 at 17:48
  • The question wasn't about duplicate elements. It was about elements that have the same hash elements. A hashset can of course contain multiple objects that have the same hash code. – sepp2k Nov 24 '09 at 17:48
  • I misread the question that it was about the equal elements rather than the elements with same hashcode. My apologies. I'll delete the post now. – coolest_head Nov 24 '09 at 19:39