PS: Many people on SO dislike discussing the motivation and trade-offs behind JDK implementation details; the view seems to be that JDK engineers are entitled to make such choices without explaining them to anybody (a previous post about JDK motivation was closed). So, to be clear: this question is purely about the HashMap algorithm and data structure, and about the trade-off analysis and engineering considerations between two separate-chaining implementations.
As we all know, we can use separate chaining to handle hash collisions when implementing a HashMap (every chain is a separate linked list). In principle, when inserting a new element whose hash collides with an existing one, we can insert it at either the head or the tail of the linked list.
Both approaches have the same worst-case time complexity: in both cases we have to scan the whole linked list to check whether the key is already present, and we insert only if it is not. Once we have scanned the whole list, we hold both its head and its tail. However, when I took an algorithms course, my teacher told us that we prefer to insert at the head because, in general, more recently inserted elements are more likely to be looked up. Consistent with this, every algorithms or data structures textbook I have seen, whether it gives pseudo-code or a concrete implementation in some programming language, chooses to insert at the head (e.g., Sedgewick's code in Algorithms, Introduction to Algorithms (CLRS, page 258), etc.).
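To make the two options concrete, here is a minimal sketch of a single bucket chain written for illustration only. It is not the JDK code: the names ChainInsertDemo, Node, putAtHead, putAtTail and keys are hypothetical, and hashing and resizing are left out. It only shows that both variants do the same key scan and differ in where the new node is linked.

```java
import java.util.ArrayList;
import java.util.List;

public class ChainInsertDemo {
    /** One node of a singly linked bucket chain (hypothetical, not the JDK's Node). */
    static final class Node {
        final String key;
        int value;
        Node next;

        Node(String key, int value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    /** Head insertion: scan for the key; if absent, link the new node in front. */
    static Node putAtHead(Node head, String key, int value) {
        for (Node n = head; n != null; n = n.next) {
            if (n.key.equals(key)) {
                n.value = value;   // key already present: overwrite in place
                return head;
            }
        }
        return new Node(key, value, head);
    }

    /** Tail insertion: the same scan stops at the last node, so we append there. */
    static Node putAtTail(Node head, String key, int value) {
        if (head == null) {
            return new Node(key, value, null);
        }
        Node n = head;
        while (true) {
            if (n.key.equals(key)) {
                n.value = value;   // key already present: overwrite in place
                return head;
            }
            if (n.next == null) {
                break;             // n is the current tail
            }
            n = n.next;
        }
        n.next = new Node(key, value, null);
        return head;
    }

    /** Collects the keys of a chain in traversal order. */
    static List<String> keys(Node head) {
        List<String> out = new ArrayList<>();
        for (Node n = head; n != null; n = n.next) {
            out.add(n.key);
        }
        return out;
    }

    public static void main(String[] args) {
        Node h = null;
        Node t = null;
        for (String k : new String[] {"a", "b", "c"}) {
            h = putAtHead(h, k, 1);
            t = putAtTail(t, k, 1);
        }
        System.out.println(keys(h)); // [c, b, a]
        System.out.println(keys(t)); // [a, b, c]
    }
}
```

With head insertion the most recently added key is found first on lookup; with tail insertion the chain keeps insertion order.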
However, a few days ago, I read the source code of HashMap in JDK 8. JDK 8 chooses to insert at the tail, which is not what I expected based on what I had learned (see lines 611 and 641 and the putVal() method in the JDK 8 source code). Then I checked JDK 7 and found that it inserts at the head, as we are usually taught (see lines 402 and 766 and the addEntry() method in the JDK 7 source code).
My question:
In general, what is the trade-off between inserting at the head and inserting at the tail when implementing a separate-chaining HashMap? Are there any practical engineering considerations (e.g., multi-threading)? (I have seen several blogs saying that insertion at the head may cause an infinite loop if access is not synchronized properly.)