8

I have the following piece of code from Effective Java by Joshua Bloch (Item 9, Chapter 3, page 49):

If a class is immutable and the cost of computing the hash code is significant, you might consider caching the hash code in the object rather than recalculating it each time it is requested. If you believe that most objects of this type will be used as hash keys, then you should calculate the hash code when the instance is created. Otherwise, you might choose to lazily initialize it the first time hashCode is invoked (Item 71). It is not clear that our PhoneNumber class merits this treatment, but just to show you how it’s done:

    // Lazily initialized, cached hashCode
    private volatile int hashCode;  // (See Item 71)
    @Override public int hashCode() {
        int result = hashCode;
        if (result == 0) {
            result = 17;
            result = 31 * result + areaCode;
            result = 31 * result + prefix;
            result = 31 * result + lineNumber;
            hashCode = result;
        }
        return result;
    }

My question is how the caching (remembering the hashCode) works here. The very first time the hashCode() method is called, there is no cached hashCode to assign to result. A brief explanation of how this caching works would be great. Thanks.

Manos Nikolaidis
brain storm
  • Caching means saving the value you've calculated so you can re-use it without re-calculating it. That's all this is doing. – Sotirios Delimanolis Aug 27 '13 at 18:41
  • 1
    Eh? The cache is the "private volatile int hashCode". When the hash is calculated, it's saved to the cache. Initially the value is 0 as are all non-local numerical variables. – Kayaman Aug 27 '13 at 18:41

3 Answers

12

Simple. Read my embedded comments below...

private volatile int hashCode;
//You keep a member field on the class, which represents the cached hashCode value

   @Override public int hashCode() {
       int result = hashCode;
       //if result == 0, the hashCode has not been computed yet, so compute it
       if (result == 0) {
           result = 17;
           result = 31 * result + areaCode;
           result = 31 * result + prefix;
           result = 31 * result + lineNumber;
           //remember the value you computed in the hashCode member field
           hashCode = result;
       }
       // when you return result, you've either just come from the body of the above
       // if statement, in which case you JUST calculated the value -- or -- you've
       // skipped the if statement in which case you've calculated it in a prior
       // invocation of hashCode, and you're returning the cached value.
       return result;
   }
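
To see the caching happen, here is a minimal, self-contained sketch. The `computations` counter and the `CachedHashDemo` wrapper are hypothetical additions for illustration only; they are not part of the book's code.

```java
public class CachedHashDemo {

    public static final class PhoneNumber {
        private final int areaCode;
        private final int prefix;
        private final int lineNumber;

        // The cache. An int field that is never assigned starts out as 0.
        private volatile int hashCode;

        // Hypothetical counter, only here to show how often the hash is computed.
        private int computations;

        public PhoneNumber(int areaCode, int prefix, int lineNumber) {
            this.areaCode = areaCode;
            this.prefix = prefix;
            this.lineNumber = lineNumber;
        }

        public int computations() {
            return computations;
        }

        @Override public int hashCode() {
            int result = hashCode;       // 0 on the first call, cached value afterwards
            if (result == 0) {
                computations++;          // demo only
                result = 17;
                result = 31 * result + areaCode;
                result = 31 * result + prefix;
                result = 31 * result + lineNumber;
                hashCode = result;       // remember it for later calls
            }
            return result;
        }
    }

    public static void main(String[] args) {
        PhoneNumber p = new PhoneNumber(707, 867, 5309);
        int first = p.hashCode();   // computes and stores the value
        int second = p.hashCode();  // returns the stored value
        System.out.println(first == second);      // prints true
        System.out.println(p.computations());     // prints 1
    }
}
```

The second call skips the `if` body because the field is no longer 0, so the arithmetic runs only once per object.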
Amir Afghani
  • Why use the modifier `volatile` ? Something related to caching ? If yes, please give a brief description. Thank you. – Charan Feb 09 '16 at 10:19
  • 7
    @Charan It's not necessary at all (if you take a look at the `java.lang.String` source, the `hash` field isn't `volatile`). The only drawback of not making it volatile is that threads running on different CPUs might recalculate the hash code multiple times. But since Strings are immutable in Java, that won't result in any inconsistency, just a possible performance penalty, which I guess is okay, since the hash is read far more often than it is calculated (and volatile reads might have significant overhead compared to normal reads). – karlicoss Apr 16 '16 at 16:02
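
The non-volatile variant described in the comment above could look roughly like the sketch below. The class name `RacyPhoneNumber` is hypothetical. The sketch assumes that every field feeding the hash is `final` and that the cache field is read exactly once into a local variable; without both, dropping `volatile` would not be safe.

```java
public final class RacyPhoneNumber {
    private final int areaCode;
    private final int prefix;
    private final int lineNumber;

    // Deliberately NOT volatile. Two threads may both see 0 and both compute
    // the hash, but they compute the same value from the same final fields.
    private int hash;

    public RacyPhoneNumber(int areaCode, int prefix, int lineNumber) {
        this.areaCode = areaCode;
        this.prefix = prefix;
        this.lineNumber = lineNumber;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof RacyPhoneNumber)) {
            return false;
        }
        RacyPhoneNumber p = (RacyPhoneNumber) o;
        return areaCode == p.areaCode && prefix == p.prefix && lineNumber == p.lineNumber;
    }

    @Override public int hashCode() {
        int h = hash;            // read the field once; work on the local copy from here on
        if (h == 0) {
            h = 17;
            h = 31 * h + areaCode;
            h = 31 * h + prefix;
            h = 31 * h + lineNumber;
            hash = h;            // int writes are atomic, so no thread sees a torn value
        }
        return h;
    }
}
```

The trade-off is the one the comment names: a possible duplicate computation in exchange for cheaper reads on every later call.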
2

The hashCode variable is an instance variable, and it's not initialized explicitly, so Java initializes it to 0 (JLS Section 4.12.5). The comparison result == 0 is in effect a check to see whether a (presumably non-zero) hash code has already been computed and cached. If it hasn't, the method performs the calculation and stores the result; otherwise it just returns the previously computed hash code.
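
The default-value rule is easy to check with a tiny sketch; `DefaultValueDemo` and its `cached` field are hypothetical names used only for this illustration.

```java
public class DefaultValueDemo {
    // Never assigned anywhere: as an instance field it gets the default value 0.
    private int cached;

    public int cached() {
        return cached;
    }

    public static void main(String[] args) {
        // A local variable would NOT get a default; reading an unassigned
        // local is a compile-time error. Fields are different.
        System.out.println(new DefaultValueDemo().cached()); // prints 0
    }
}
```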

rgettman
-1

If you really wanted this to work right, you'd add another volatile boolean variable called isHashInvalid. Every setter that changes a value used in your hash function would set this variable to true. Then it becomes (no need to test for 0 now):

private volatile boolean isHashInvalid = true;
private volatile int hashCode; //Automatically zero but it doesn't matter

//You keep a member field on the class, which represents the cached hashCode value
@Override public int hashCode() {
    int result = hashCode;
    if (isHashInvalid) {
       result = 17;
       result = 31 * result + areaCode;
       result = 31 * result + prefix;
       result = 31 * result + lineNumber;
       //remember the value you computed in the hashCode member field
       hashCode = result;
       isHashInvalid = false;
    }
    // when you return result, you've either just come from the body of the above
    // if statement, in which case you JUST calculated the value -- or -- you've
    // skipped the if statement in which case you've calculated it in a prior
    // invocation of hashCode, and you're returning the cached value.
    return result;
}
Dennis
  • 1
    I'm not clear that adding an extra integer is worth the cost. The only problem with using zero as a sentinel would be if `hashCode` could return zero, and that problem could be solved by simply saying, after computing `result`, something like `if (result == 0) result = 8675309 + areaCode;`. – supercat Nov 12 '13 at 02:51
  • 2
    It's either a 64 bit number or a 32 bit number. The number of times it's actually going to be zero is LOW. So for that infinitesimal number of items, the hash code will be calculated every time. Not a big deal. – Dennis Nov 13 '13 at 03:14
  • Unless I'm misunderstanding, you're using the `isHashInvalid` for the purpose of allowing zero to be a valid hash value without requiring it to be rehashed. My point was that if one is worried about the risk of having to rehash some objects every time (which for arbitrary-sized objects, one *should* be, though perhaps not fixed-sized objects as in this example), one needn't use an extra flag to guard against that. – supercat Nov 13 '13 at 16:04
  • Even if one wants to allow incremental hashing (e.g. so appending data to a list which had already had a hash computed would simply have to compute the hash of the new items) using a 31-bit hash would seem better than using an extra flag (if the hash is good, false-hits should be rare even without the 32nd bit, and if it's not good, that 32nd bit isn't terribly likely to help) – supercat Nov 13 '13 at 16:13
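
The idea raised in the comments above, remapping a computed hash of zero so that zero can stay the "not yet computed" marker, might be sketched as follows. `ZeroSafeHash`, the `computations` counter, and the replacement value 1 are all hypothetical choices for this illustration; any fixed non-zero value would do.

```java
import java.util.Arrays;

public final class ZeroSafeHash {
    private final int[] values;
    private int hash;           // 0 means "not computed yet"
    private int computations;   // demo only: counts how often the hash is computed

    public ZeroSafeHash(int... values) {
        this.values = values.clone();   // defensive copy keeps the object immutable
    }

    public int computations() {
        return computations;
    }

    @Override public boolean equals(Object o) {
        return o instanceof ZeroSafeHash && Arrays.equals(values, ((ZeroSafeHash) o).values);
    }

    @Override public int hashCode() {
        int h = hash;
        if (h == 0) {
            computations++;
            h = Arrays.hashCode(values);
            if (h == 0) {
                h = 1;          // remap a genuine zero so it is not mistaken for "unset"
            }
            hash = h;
        }
        return h;
    }

    public static void main(String[] args) {
        // Arrays.hashCode(new int[] {-31}) is 31 * 1 + (-31) = 0, the awkward case.
        ZeroSafeHash z = new ZeroSafeHash(-31);
        z.hashCode();
        z.hashCode();
        System.out.println(z.hashCode());      // prints 1
        System.out.println(z.computations());  // prints 1
    }
}
```

Equal objects still get equal hashes, because the remapping depends only on the computed value, and no second field is needed.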