Question

How is the HashMap method putIfAbsent able to perform a conditional put faster than calling containsKey(x) first?

For example, if you didn't use putIfAbsent you could use:

if (!map.containsKey(x)) {
    map.put(x, someValue);
}

I had previously thought putIfAbsent was a convenience method for calling containsKey followed by put on a HashMap. But after running a benchmark, putIfAbsent turned out to be significantly faster than containsKey followed by put. I looked at the java.util source code to see how this is possible, but it's a bit too cryptic for me to figure out. Does anyone know how putIfAbsent internally achieves what looks like a better time complexity? That's my assumption based on a few tests in which my code ran 50% faster when using putIfAbsent. It seems to avoid calling get(), but how?

Example

if (!map.containsKey(x)) {
    map.put(x, someValue);
}

VS

map.putIfAbsent(x, someValue);

Java source code for HashMap.putIfAbsent

@Override
public V putIfAbsent(K key, V value) {
    return putVal(hash(key), key, value, true, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
Usman Mutawakil
    I imagine it is because `putIfAbsent` only has to figure out where the key should fit once, whereas `containsKey` has to figure it out and so does `put` (therefore twice the calculations). – Jason Sep 26 '18 at 06:11

2 Answers

The HashMap implementation of putIfAbsent searches for the key just once, and if it doesn't find the key, it puts the value in the relevant bin (which was already located). That's what putVal does.

On the other hand, using map.containsKey(x) followed by map.put(x,someValue) performs two lookups for the key in the Map, which takes more time.

Note that put also calls putVal (put calls putVal(hash(key), key, value, false, true) while putIfAbsent calls putVal(hash(key), key, value, true, true)), so putIfAbsent has the same performance as calling just put, which is faster than calling both containsKey and put.
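
One rough way to see the extra lookup is to count how often the map asks the key for its hash code. The sketch below is only an illustration: CountingKey and the LookupCount class are hypothetical names, and the counts it reports assume the OpenJDK HashMap implementation, where each of containsKey, put and putIfAbsent hashes the key once on a non-empty map.

```java
import java.util.HashMap;
import java.util.Map;

class LookupCount {
    // Hypothetical key type that records every hashCode() call made on it.
    static final class CountingKey {
        static int hashCalls = 0;
        final String name;

        CountingKey(String name) { this.name = name; }

        @Override
        public int hashCode() {
            hashCalls++;
            return name.hashCode();
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CountingKey && ((CountingKey) o).name.equals(name);
        }
    }

    // Builds a map that already holds one entry, so the bucket table exists,
    // then resets the counter so only the measured calls are counted.
    private static Map<CountingKey, Integer> seededMap() {
        Map<CountingKey, Integer> map = new HashMap<>();
        map.put(new CountingKey("seed"), 0);
        CountingKey.hashCalls = 0;
        return map;
    }

    // containsKey locates the bucket once, then put locates it again.
    static int hashCallsForContainsThenPut() {
        Map<CountingKey, Integer> map = seededMap();
        CountingKey key = new CountingKey("x");
        if (!map.containsKey(key)) {
            map.put(key, 1);
        }
        return CountingKey.hashCalls;
    }

    // putIfAbsent locates the bucket a single time.
    static int hashCallsForPutIfAbsent() {
        Map<CountingKey, Integer> map = seededMap();
        CountingKey key = new CountingKey("x");
        map.putIfAbsent(key, 1);
        return CountingKey.hashCalls;
    }

    public static void main(String[] args) {
        System.out.println("containsKey + put: " + hashCallsForContainsThenPut());
        System.out.println("putIfAbsent:       " + hashCallsForPutIfAbsent());
    }
}
```

The counter only tracks hashing; with a crowded bucket the equals comparisons along the chain would be repeated as well, which is where the rest of the doubled work comes from.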

Eran
  • Shouldn't put only need to call hashCode on the key and then insert into the corresponding bucket? I wouldn't think there would be a lookup associated with a put. – Usman Mutawakil Sep 26 '18 at 06:16
  • For example, one lookup for contains and then an O(1) insert for put. My understanding was put was simply an insert based upon hashcode of the key into the corresponding bucket. – Usman Mutawakil Sep 26 '18 at 06:18
  • Your update clarifies it a bit more. This will take some time to digest. – Usman Mutawakil Sep 26 '18 at 06:18
  • @UsmanMutawakil calling `hashCode` on the key is part of the lookup, but not the only part - the bucket may contain multiple entries, so you have to compare all of them to the key. Even though the expected running time of the lookup is `O(1)`, doing some constant time work twice still takes twice the time of doing it just one time. – Eran Sep 26 '18 at 06:19
  • I get that part, .equals comparison on all bucket entries, but didn't realize put was using the same core helper method putVal. I'll review and follow up. How have you come to know this? – Usman Mutawakil Sep 26 '18 at 06:24
  • @UsmanMutawakil I just looked at the code. Besides, it makes sense for `put` and `putIfAbsent` to use the same code, since they have almost the same functionality. – Eran Sep 26 '18 at 06:25

See Eran's answer... I'd like to also answer it more succinctly. put and putIfAbsent both use the same helper method putVal. But clients using put can't take advantage of the onlyIfAbsent parameter that enables put-if-absent behavior. The public method putIfAbsent exposes this. So using putIfAbsent has the same underlying time complexity as the put you are already going to use in conjunction with containsKey. The use of containsKey then becomes a waste.

So the core of this is that the internal helper method putVal is used by both put and putIfAbsent.
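
One caveat when swapping the two forms: they are not exactly equivalent when a key is explicitly mapped to null. The check `!onlyIfAbsent || oldValue == null` in putVal means putIfAbsent overwrites a null value, while containsKey reports the key as present and skips the put. A minimal sketch of that difference (the class and method names are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

class NullValueDifference {
    // containsKey sees the key, so the null mapping is left untouched.
    static Integer viaContainsKey() {
        Map<String, Integer> map = new HashMap<>();
        map.put("x", null);
        if (!map.containsKey("x")) {
            map.put("x", 42);
        }
        return map.get("x");
    }

    // putIfAbsent treats a null value like a missing mapping and stores 42.
    static Integer viaPutIfAbsent() {
        Map<String, Integer> map = new HashMap<>();
        map.put("x", null);
        map.putIfAbsent("x", 42);
        return map.get("x");
    }

    public static void main(String[] args) {
        System.out.println("containsKey + put: " + viaContainsKey());
        System.out.println("putIfAbsent:       " + viaPutIfAbsent());
    }
}
```

If the map never stores null values, the two forms produce the same contents and putIfAbsent is simply the cheaper one.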

Usman Mutawakil