13

java.util.HashMap has an implementation of the put method, which has the following code inside it :

if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
    V oldValue = e.value;
    e.value = value;
    e.recordAccess(this);
    return oldValue;
}

In the above code why wasn't the reference check made first (since two objects having the same reference will have the same hash and equals()) ?

i.e. something like this :

if ((k = e.key) == key) {
    V oldValue = e.value;
    e.value = value;
    e.recordAccess(this);
    return oldValue;
} else if ( compare hash and equals) {
    // do something again with the value
}

Wouldn't this have saved a comparision?

Martijn Courteaux
  • 67,591
  • 47
  • 198
  • 287
Ace McCloud
  • 701
  • 1
  • 5
  • 16
  • 3
    This would save a comparison **in some cases**. – Denys Séguret Jun 16 '14 at 11:53
  • 1
    Yes, I agree, it would save a comparison only in some cases. – Ace McCloud Jun 16 '14 at 11:54
  • 2
    Suggest your fix to Oracle. Probably they will apply it to java 9. – AlexR Jun 16 '14 at 11:58
  • Did you profile your version before asking ? You should. Be careful to test with various types of maps and keys. You'll probably see no gain, the other == test which is made won't be noticeable. – Denys Séguret Jun 16 '14 at 11:58
  • I don't like this micro-optimization with `k` at all. – PeterMmm Jun 16 '14 at 11:59
  • 7
    Your suggestion is only beneficial if it's faster on the average. *Is* it faster on the average? – user207421 Jun 16 '14 at 12:03
  • 1
    `if (k = e.key) == key) {` should be `if ((k = e.key) == key) {` – avgvstvs Jun 16 '14 at 12:05
  • I haven't profiled this code with various types of maps and keys. Also, I don't know it it's faster on the average. – Ace McCloud Jun 16 '14 at 12:15
  • am i getting you correctly, you propose to have a check `if ((k = e.key) == key) || ((e.hash == hash && ((k = e.key) == key || key.equals(k))))) { do sth}`? if yes, then it will work quicker when key==e.key, otherwise you will have additional comparsion – user902383 Jun 16 '14 at 12:24
  • Because the increase in performance (if any) is not worth the decrease in readability and maintainability. Also, with your version you'd have two branches that run duplicate code. Even though the duplicate code is right next to each other, someone updating the code in one branch may fail to update the code in the other. I've seen it more times than I'd like. – Alvin Thompson Jun 16 '14 at 12:45
  • Your suggestion is problematic, since you moved the side effect `(k = e.key)` in front, which means that now `k` will get the `e.key` value whether `e.hash == hash` or not. In other words, you changed the semantics. If done exactly as you did, this will create a bug since the second comparison you suggest as `if ( compare hash and equals` will return the wrong result. – ethanfar Jun 16 '14 at 12:47
  • @eitanfar: Normally yes, but the `k` variable is for some reason only used in the if statement. I don't know why they use a temporary assignment instead of leaving out the `k` variable and write the last condition as `key.equals(e.key)`. – jarnbjo Jun 16 '14 at 13:04

2 Answers2

1

I don't know why, but a naïve microbenchmark suggests that on Oracle's VM (Intel, 32 or 64 bit), comparing two references takes about four times as long as comparing two ints (as in hash codes). I would have assumed that comparing two 32-bit integers and two address pointers should have had similar runtime cost on modern hardware, but I am probably just not considering something obvious here.

Assuming that different keys in most cases have different hash codes, comparing the hash before the key saves 75% runtime for each incorrect key and adds 25% runtime for the correct key. If this actually saves overall runtime depends of course on the exact content and layout of the hash map tables, but the Sun engineers obviously thought that this variant is better for most purposes.

Methods used for benchmarking:

public static int c1(int a, int b, int iter) {
    int r = 0;
    while((iter--)>0) {
        if(a == b) {
            r++;
        }
    }
    return r;
}

public static int c2(Object a, Object b, int iter) {
    int r = 0;
    while((iter--)>0) {
        if(a == b) {
            r++;
        }
    }
    return r;
}
jarnbjo
  • 33,923
  • 7
  • 70
  • 94
  • Show us the benchmark please. Looking at your rep, you should be able to write a valid benchmark, but just to be sure, you know :) – Martijn Courteaux Jun 16 '14 at 13:26
  • 2
    Please show more code. I think this piece of code is already highly suspicious when it comes to auto optimizing this code. A benchmark is very very sensitive to all sorts of minor things your forgot to do. For example: wrong usage of these methods might cause your methods to do simply nothing. And I mean, literally nothing. The compiler might simply optimize your whole method call away. I'm not sure if the JIT will optimize away the comparisons after passing a threshold. Can anyone confirm this? – Martijn Courteaux Jun 16 '14 at 13:48
  • @Martijn: It is of course a possibility that the optimizer at some point decides that the condition (a == b) never changes, but as long as the runtime ratio between c1 and c2 remains relatively constant independent of the number of iterations, I would at least first assume that the difference is caused by comparing ints instead of references. – jarnbjo Jun 16 '14 at 14:05
1

The operations if_icmpne (compare of integer) and if_acmpne (compare of reference) use the same technique to obtain result [1,2,3,4].

Both have prepared values on stack and consumes form it equally. There should not be significant difference in operations required. Both will be done in single CPU cycle.

In order to state that map can store the object in the same bucket there must be validate two rules.

  • Their hashcode must be equal
  • The must return true when x.equals(y)

IMHO the code reflect those rules and i could be written as

if (e.hash == hash && key.equals(k))

In order to satisfy map requirement we must always validate hashes and equals.

So for performance reason the part key.equals(k) was optimized with (k = e.key) == key giving

((k = e.key) == key || key.equals(k))

This implementation mean that for maps we valuate more hashes and equals as then reference equality. So this is expected behaviour.