Why did the language designers of Java prefer chaining over open addressing for most hash-based structures, except for some like ThreadLocal?

Question

I know the difference between open addressing and chaining for resolving hash collisions. Most of the basic hash-based data structures in Java, such as HashSet and HashMap, primarily use chaining. I read that ThreadLocal actually uses a probing scheme. So I want to understand why open addressing is not used more in Java. I know that deleting records is harder with that scheme, because removed cells have to be marked and handled specially. However, it seems that the memory requirement would be lower for open addressing.

Edit: I just want to understand the major reason or reasons for this design decision; I do not want the finer details. I would also like to know why ThreadLocal uses the less common technique of open addressing. I suspect the two answers are related, so I am asking both in the same question.

This question would be best asked to the designers of `HashMap`: Doug Lea, Josh Bloch, Arthur van Hoff and Neal Gafter. I doubt anyone here will be able to tell you what their exact reasoning behind the decision was. — Jeffrey, Aug 18 '12 at 14:48
@Jeffrey I am just looking for the intuition behind the design decision; I don't want the finer details. It is the same kind of question as why Java supports implementing multiple interfaces but only single inheritance. — Geek, Aug 18 '12 at 14:57
Is there anything in the specification of `java.util.HashMap` that requires implementations to chain instead of double-hashing? — Mike Samuel, Aug 18 '12 at 15:07
@MikeSamuel It seems to me that chaining is more convenient at the cost of more memory, but I am not sure, which is why I asked this question. I do not know of anything in the HashMap specification that would prevent it from using double hashing. — Geek, Aug 18 '12 at 15:12

score 20 · Accepted Answer · answered Aug 18 '12 at 16:00

I am currently discussing memory-compact reimplementations of HashMap and HashSet with, among others, Doug Lea. This particular question hasn't come up, but here are my first thoughts on it:

Chained hash tables degrade reasonably gracefully. Whether it's higher load factors or lots of hash collisions, chaining doesn't degrade nearly as quickly as open addressing can.
As you've said, remove is...not a pleasant operation on open-addressed tables. As a general rule, remove is the least common operation on hash tables, but there are applications for which it's more common, and bad performance would be noticed.
I also suspect -- though I don't have much data -- that implementing a "linked" open-addressed hash table would be noticeably more difficult. LinkedHashMap is written as a subclass of HashMap, and borrows most of the implementation details; it's somewhat easier to implement the linked list of entries when the entries are discrete objects -- and at that point, you're already most of the way to a chained implementation.
Nothing in the spec ties them to this implementation -- they're always free to mess around with it later.
The JDK collections libraries...don't make memory consumption an especially high priority. Memory is cheap. (You may or may not agree with this, but it's definitely a noticeable trend.)
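
To make the point about remove concrete, here is a minimal sketch of a linear-probing set. The class `ProbingSet` and everything in it are hypothetical names for illustration, not JDK code, and the sketch assumes a fixed capacity with no resizing. A removed slot has to be marked with a tombstone; simply clearing it would cut the probe sequence and hide keys stored further along.

```java
import java.util.Objects;

/** Hypothetical fixed-capacity linear-probing set; not how any JDK class is written. */
final class ProbingSet {
    private static final Object TOMBSTONE = new Object(); // marks a removed slot
    private final Object[] slots;

    ProbingSet(int capacity) {
        slots = new Object[capacity];
    }

    private int indexFor(Object key) {
        return Math.floorMod(key.hashCode(), slots.length);
    }

    /** Adds key; returns false if it is already present or the table is full. */
    boolean add(Object key) {
        Objects.requireNonNull(key);
        if (contains(key)) return false;
        int i = indexFor(key);
        for (int n = 0; n < slots.length; n++, i = (i + 1) % slots.length) {
            if (slots[i] == null || slots[i] == TOMBSTONE) { // removed slots can be reused
                slots[i] = key;
                return true;
            }
        }
        return false; // full; a real table would resize
    }

    boolean contains(Object key) {
        int i = indexFor(key);
        for (int n = 0; n < slots.length; n++, i = (i + 1) % slots.length) {
            if (slots[i] == null) return false; // only a never-used slot ends the probe
            if (slots[i] != TOMBSTONE && slots[i].equals(key)) return true;
        }
        return false;
    }

    boolean remove(Object key) {
        int i = indexFor(key);
        for (int n = 0; n < slots.length; n++, i = (i + 1) % slots.length) {
            if (slots[i] == null) return false;
            if (slots[i] != TOMBSTONE && slots[i].equals(key)) {
                slots[i] = TOMBSTONE; // writing null here would hide keys stored past this slot
                return true;
            }
        }
        return false;
    }
}
```

In a chained table the equivalent operation just unlinks one node from its bucket. In this sketch, tombstones keep lengthening probe sequences until they are reused or the table is rebuilt.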

Louis Wasserman

Great to hear about your current work. Can you explain this a little more: "Whether it's higher load factors or lots of hash collisions, chaining doesn't degrade nearly as quickly as open addressing can."? I know what load factor is, but I am unable to see how chaining degrades gracefully while open addressing doesn't. – Geek Aug 18 '12 at 16:30
Also, do you know, or have any intuition about, why ThreadLocal uses open addressing rather than good old chaining? – Geek Aug 18 '12 at 16:32
@Geek: if the load factor is, say, 0.95, then when you search for an absent key in an open-addressed hash table, on average you have to traverse 20 hash table positions before you can be sure that there's no entry associated with that key; for a chained hash table you need to traverse ~1 entry on average. With regards to `ThreadLocal`, I suspect the advantage is that it's easier to "notice," and expunge, GC'd entries -- visiting more or less arbitrary entries is more common in open-addressed implementations, so you'll notice that GC happened. Eh. – Louis Wasserman Aug 18 '12 at 16:57
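
The figures in this comment match the usual textbook estimates for an unsuccessful lookup, sketched below. The class `ProbeEstimates` and its method names are hypothetical, and every formula assumes a hash function that spreads keys uniformly. The value 20 corresponds to idealized uniform probing, 1 / (1 - a) at load factor a = 0.95; the classical approximation for linear probing is considerably worse.

```java
/** Hypothetical helper with textbook estimates for the cost of an unsuccessful lookup. */
final class ProbeEstimates {
    /** Open addressing under idealized uniform probing: about 1 / (1 - a) slots. */
    static double uniformProbingMiss(double loadFactor) {
        return 1.0 / (1.0 - loadFactor);
    }

    /** Linear probing, classical approximation: about (1 + 1 / (1 - a)^2) / 2 slots. */
    static double linearProbingMiss(double loadFactor) {
        double free = 1.0 - loadFactor;
        return 0.5 * (1.0 + 1.0 / (free * free));
    }

    /** Chaining: the expected chain length equals the load factor. */
    static double chainingMiss(double loadFactor) {
        return loadFactor;
    }

    public static void main(String[] args) {
        for (double a : new double[] {0.5, 0.75, 0.95}) {
            System.out.printf("load %.2f: uniform %.1f, linear %.1f, chained %.2f%n",
                    a, uniformProbingMiss(a), linearProbingMiss(a), chainingMiss(a));
        }
    }
}
```

The open-addressing estimates grow without bound as the load factor approaches 1, while the chained estimate grows only linearly, which is one way to read "degrades gracefully".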

@LouisWasserman: Things are much worse than that. Consider a 1000-slot table, using linear probing, in which 500 items map without collisions to the odd numbers, and 100 items map to zero. The load factor is only 0.6, but any not-found item whose hash value is in the range 0 to 199 will have to scan every item from that value up to 199. There will be a one-in-five chance of having a hash value in such a range, and hitting such a hash value will require scanning an average of 100 items. Thus, one ends up having to scan an average of 20 items even with a load factor of only 60%. – supercat Feb 18 '14 at 22:24
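
The example in this comment can be checked with a small simulation. This is a sketch under stated assumptions: the table uses linear probing with wrap-around, the odd-slot items are inserted before the colliding ones, and the class `ClusterDemo` and its methods are hypothetical names. Only slot occupancy is tracked, since that is all a miss depends on.

```java
/** Hypothetical simulation of the clustering example; linear probing is an assumption. */
final class ClusterDemo {
    static final int SIZE = 1000;

    /** 500 keys hashing to the odd slots, then 100 keys that all hash to slot 0. */
    static boolean[] buildTable() {
        boolean[] used = new boolean[SIZE];
        for (int h = 1; h < SIZE; h += 2) insert(used, h);
        for (int k = 0; k < 100; k++) insert(used, 0);
        return used;
    }

    /** Linear probing: take the first free slot at or after the hash position. */
    static void insert(boolean[] used, int hash) {
        int i = hash;
        while (used[i]) i = (i + 1) % SIZE;
        used[i] = true;
    }

    /** Occupied slots scanned before a miss starting at hash reaches an empty slot. */
    static int missCost(boolean[] used, int hash) {
        int scanned = 0;
        for (int i = hash; used[i]; i = (i + 1) % SIZE) scanned++;
        return scanned;
    }

    /** Average miss cost over all possible hash positions. */
    static double averageMissCost(boolean[] used) {
        long total = 0;
        for (int h = 0; h < SIZE; h++) total += missCost(used, h);
        return (double) total / SIZE;
    }

    public static void main(String[] args) {
        boolean[] used = buildTable();
        System.out.println("occupied slots scanned per miss, on average: "
                + averageMissCost(used));
    }
}
```

Under these assumptions the 100 colliding keys land in the even slots 0 through 198, so slots 0 to 199 form one solid run, and the average works out to 20.7 occupied slots per miss. A chained table with the same contents would examine 0.6 entries per miss on average, because only lookups that land in bucket 0 pay for the long chain.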
