1

I was going through the source code of HashMap, but the binary operators confuse me a lot.

I do understand the general purpose of the code below: fair distribution, and bringing the hashCode within the bucket limit.

Can someone explain the comments here and what is the benefit of doing the way it is done right now?

    /**
     * Computes key.hashCode() and spreads (XORs) higher bits of hash
     * to lower.  Because the table uses power-of-two masking, sets of
     * hashes that vary only in bits above the current mask will
     * always collide. (Among known examples are sets of Float keys
     * holding consecutive whole numbers in small tables.)  So we
     * apply a transform that spreads the impact of higher bits
     * downward. There is a tradeoff between speed, utility, and
     * quality of bit-spreading. Because many common sets of hashes
     * are already reasonably distributed (so don't benefit from
     * spreading), and because we use trees to handle large sets of
     * collisions in bins, we just XOR some shifted bits in the
     * cheapest possible way to reduce systematic lossage, as well as
     * to incorporate impact of the highest bits that would otherwise
     * never be used in index calculations because of table bounds.
     */
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

It would be a really big help if someone could explain it.

This is not a duplicate because the other questions are related to hash implementations before Java 8.

Thanks in advance

Deepak Kumar
  • Which part specifically are you having trouble understanding? – NPE Oct 12 '19 at 20:10
  • I'm not that comfortable with bitwise operators. I know what they do individually — what a left shift is, what a right shift is, what XOR is — but not how, overall, they help achieve the desired purpose of the hash, and the comment does not help at all: "higher bits of hash to lower", "sets of hashes that vary only in bits above the current mask will always collide", "So we apply a transform that spreads the impact of higher bits downward". – Deepak Kumar Oct 12 '19 at 20:16

1 Answer

5

hashCode() returns an int, which is 32 bits wide.

Internally, HashMap keeps the objects in pow(2, n) buckets or bins. The value of n can vary — the details are not important here; what is important is that n is typically much smaller than 32 (the number of bits in the hash).

Each object is placed into one of the buckets. To achieve good performance, it is desirable to spread the objects evenly across the buckets. This is where object hashes come in: The simplest way to choose a bucket would be by taking the lowest n bits of the object's hash code (using a simple bitwise AND). However, this would only use the lowest n bits and ignore the rest of the hash.
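As a minimal sketch (not the actual HashMap source), the "lowest n bits" bucket selection looks like this — for a power-of-two table length, masking with `length - 1` is equivalent to taking the hash modulo the length:

```java
// Illustrative sketch of power-of-two bucket selection.
// The table length and hash value here are made-up examples.
public class BucketMaskDemo {
    public static void main(String[] args) {
        int tableLength = 16;                       // pow(2, 4) buckets, so n = 4
        int hash = 0b1010_1101_0110_1011;           // some 16-bit hash for illustration

        // Masking keeps only the lowest 4 bits; all higher bits are ignored.
        int index = hash & (tableLength - 1);       // same as hash % 16 for this length

        System.out.println(index);                  // prints 11 (0b1011, the low 4 bits)
    }
}
```

Note that any two hashes agreeing in their lowest 4 bits land in the same bucket here, no matter how different their upper 28 bits are — which is exactly the problem the comment describes.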

In the comments, the authors make an argument that this is undesirable. They cite examples of known use cases in which object hashes would systematically differ in bits other than the lowest n. This would lead to systematic collisions and systematic collisions are bad news.

To partially address this, they've implemented the current heuristic that:

  • keeps the top 16 bits of the hash as they are;
  • replaces the bottom 16 bits with an XOR of the top 16 and the bottom 16 bits.
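To see the effect, here is a small sketch (the table size and hash values are made-up examples) showing two hashes that differ only above bit 15. Without the spreading step both collide in bucket 0; with `h ^ (h >>> 16)` the high-bit difference reaches the bucket index:

```java
// Demonstrates the spreading step h ^ (h >>> 16) used by HashMap.hash.
public class SpreadDemo {
    static int spread(int h) {
        // XOR the top 16 bits of the hash into the bottom 16 bits.
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int mask = 15;                 // 16 buckets, index = hash & mask
        int h1 = 0x10000;              // these two hashes differ only in
        int h2 = 0x20000;              // bits above the 4-bit mask

        // Without spreading, both land in bucket 0 (systematic collision):
        System.out.println((h1 & mask) + " " + (h2 & mask));                 // 0 0

        // With spreading, the high-bit difference affects the index:
        System.out.println((spread(h1) & mask) + " " + (spread(h2) & mask)); // 1 2
    }
}
```

This is why the comment calls it "the cheapest possible way": a single shift and a single XOR are enough to let every one of the 32 hash bits influence the bucket index.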
NPE
  • This post additionally helped me answer all my follow up questions.. https://coderanch.com/t/718659/java/HashMap-usage-Binary-Operators – Deepak Kumar Oct 13 '19 at 21:34