0

Gist

In Effective Java (Third Edition), the author provided a performance tip in relation to ConcurrentHashMap---i.e., when using existingValue = map.putIfAbsent(key, value), consider first calling existingValue = map.get(key) and skipping putIfAbsent() if the key already exists.

Question

Are performance considerations like that mentioned by the author documented anywhere?

I believe it is a sufficiently important / fundamental performance consideration to be documented somewhere official, especially since putIfAbsent() already returns the value if the key already exists, making the additional get() seems redundant, and it is not unreasonable that someone unaware of the performance consideration may "refactor" away the get() check.

Edited to clarify question: I'm not asking why or whether putIfAbsent() is indeed always more performant, but that whether such performance considerations are documented somewhere, given that the API has been designed such that on a plain interpretation of putIfAbsent() and get() being used together as suggested by the author without knowledge of the performance consideration, the get() seems redundant.

Details

The specific example provided in the book is as follows:

  • assuming that we are interested in implementing a method akin to String.intern(), which retrieves the value of a particular key on the map, and optionally inserting the key-value into the map if it does not already exists,
  • then the more performant approach is not to use the provided previousValue = map.putIfAbsent(key, value) directly, but to have an additional previousValue = map.get(key) check before.

For example, the code immediately below is less efficient:

// Concurrent canonicalizing map atop ConcurrentMap - not optimal
private static final ConcurrentMap<String, String> map = new ConcurrentHashMap<>();
public static String intern(String s) {
    String previousValue = map.putIfAbsent(s, s);
    return previousValue == null ? s : previousValue;
}

Whereas this code fragment below is more efficient:

// Concurrent canonicalizing map atop ConcurrentMap - faster!
public static String intern(String s) {
    String result = map.get(s);
    if (result == null) {
        result = map.putIfAbsent(s, s);
        if (result == null) result = s;
    }
    return result;
}

The reason provided by the author is that the get() method is more optimized than putIfAbsent(), which I interpret to mean that it is generally worthwhile to add additional get() checks to occasionally avoid the putIfAbsent() call.

I would also assume that the actual performance impact depends on the relatively frequency of insertion of new keys.

yongjieyongjie
  • 813
  • 10
  • 18
  • Seems you entirely missed *why* they suggested that, and that there are other methods available for doing it better, starting with Java 8 (`computeIfAbsent()`). See duplicate link added up top. --- *Hint:* It's not that `get()` is more optimized, it's that `putIfAbsent()` potentially requires the creation of the value object to put into the map, even if the key is already there, wasting the creation of that new value object. If providing that value object is anything more than trivial, you shouldn't use `putIfAbsent()`. – Andreas Jul 20 '20 at 02:58
  • @Andreas Could you point exactly to the author's code fragment where is the object creation? I quoted the example from the book exactly, and there is no object creation that I see is avoided. In fact, the author did not mention anything about object creation being slow. He emphasized efficiency of `get()`. Here's what he said verbatim: `ConcurrentHashMap is **optimized for retrieval operations, such as get**. Therefore, it is worth invoking get initially and calling putIfAbsent only if get indicates that it is necessary` – yongjieyongjie Jul 20 '20 at 19:01
  • 1
    Ok, I'm sorry, `putIfAbsent()` does take about twice as long as a `get()`, when the key already exists, at least for my tests on OpenJDK 14, but that may change at any time. and is very implementation specific, so I don't believe performance metrics are officially documented anywhere. --- Whether to use `get()` before `putIfAbsent()` also depends on the hit-ratio you're expecting, and whether the performance difference is something you'd even notice. – Andreas Jul 20 '20 at 19:22

0 Answers0