What does Atomicity imply for cmputeIfAbsent in a concurrent hash map? Atomicity vs. Synchronized

Question

I know there have been many questions around computeIfAbsent.

Specifically what I am looking for is to understand the statement around atomicity for a concurrent hash map.

The entire method invocation is performed atomically, so the function is applied at most once per key.

If two threads attempt to execute computeIfAbsent with different key's and find that in both cases the map does not contain them, might the resulting executions of the compute if absent function be concurrent? I understand they would not be concurrent in the event that both threads were trying to add the SAME key.

The word Atomic is used and it is mentioned that this means applied at most once per key. But there isn't a specific mention of synchronized behaviour on the method.

As a side note, this is relevant to me in that the method called by computeIfAbsent modifies then uses a field of the class in its body.*

I want to understand if there is a threading concern resulting from two different thread executions of the computeIfAbsent method for the two different keys.

Essentially do I have to look at something along the lines of synchronizing access to the field variable and its subsequent use within the computeIfAbsent method I call.

*( The computeIfAbsent method invoked is the only method which modifies the field. There is no other invoker of the method outside of the call from the hash map computeIfAbsent method. There is only one instance of the concurrent hash map that calls the computeWithAbsent method that invokes the "atomic" method in question)

My field is volatile to avoid potential concerns with atomic visibility.

Diego · Answer 1 · 2022-08-04T01:13:34.797

TL;DR

Oversimplifying it, when two threads execute computeIfAbsent, just one of them will be successful. The second thread will be blocked until the first one ends. Once it is done, the second thread will now find the key and won't apply the mapping function.

Now going into detail:

computeIfAbsent is said to be atomic since it uses a mix between synchronization and compare-and-swap mechanisms to search and set values in the map. This ensures that two threads won't collide when setting a new key, and that is why the documentation ensures that this method will be executed "once at most".

If you take a quick look at computeIfAbsent source code in the JDK you will find the following:

synchronized (r) {
                if (casTabAt(tab, i, null, r)) {
                    binCount = 1;
                    Node<K,V> node = null;
                    try {
                        if ((val = mappingFunction.apply(key)) != null)
                            node = new Node<K,V>(h, key, val);
                    } finally {
                        setTabAt(tab, i, node);
                    }
                }
            }

That snippet will granularly block and try to atomically apply the mapping function.

If you want to dig even more on that and you have a good understanding of how HashMaps work and CAS, you can also take a look at a more detailed explanation of the ConcurrentHashMap code here: ConcurrentHashmap in JDK8 code explanation

Thank You. You write "Oversimplifying it, when two threads execute computeIfAbsent...Once it is done, the second thread will now find the key and won't apply the mapping function." My question was what if both threads provide DIFFERENT keys that both don't exist in the concurrent hash map. "If two threads attempt to execute computeIfAbsent with different key's and find that in both cases the map does not contain them, might the resulting executions of the compute if absent function be concurrent?" I'm sorry if I hadn't been clear. — Shawn, Aug 04 '22 at 02:54
looking at this particular implementation online... I think if perhaps that synchronization block will apply even if the keys are different as long as a new value needs to get created via the function call... is that your understanding? https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/concurrent/ConcurrentHashMap.java — Shawn, Aug 04 '22 at 03:45
@Shawn yep, exactly that. As DuncG explained, the inner storage is divided in buckets and different keys may or may not be executed concurrently. You also said "I would have to ensure thread-safe behaviour of my computeIfAbsent". Yes, but this shouldn't be an issue... the mapping function should be something simple and fast to avoid long blockings on the Map. So most variables should be of local scope and you should avoid non-thread-safe side-effects. Pretty much the same advice that is given when working with Streams. — Diego, Aug 04 '22 at 18:33

DuncG · Accepted Answer · 2022-08-04T13:59:45.787

There are situations where the mapping function could be executed concurrently for different key values so it is important that your mapping function is thread-safe.

The computeIfAbsent method only guarantees that the mapping function isn't called simultaneously for the same key value. Also note that a Map works by hashing muliple keys into buckets of entries and if computeIfAbsent(a, mapFunc) is called at same time as computeIfAbsent(b, mapFunc) with pair of keys a+b that map to same sub-table of ConcurrentHashMap, then mapFunc for each key will be run one after the other and not at same time.

However where the different keys do not resolve to same sub-table within ConcurrentHashMap you should expect your mapping function to be called simulaneously by different threads for different key values.

Here is an example which shows a thread-safe mapping function that detects concurrent callers:

public static void main(String[] args) throws InterruptedException {
    ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>(2096, 1.0f);
    AtomicInteger concurrent = new AtomicInteger();
    Function<String, String> mappingFunction = s -> {
        int c = concurrent.incrementAndGet();
        String value = "Value:"+s +" concurrent="+c+" thread="+Thread.currentThread().getName();
        if (c != 1)
            System.out.println("Multiple callers for "+value);
        try { Thread.sleep(50); } catch (InterruptedException ignore) { }
        concurrent.decrementAndGet();
        return value;
    };
    Runnable task = () -> {
        Random r = new Random();
        for (int i = 0; i < 10_000; i++)
            map.computeIfAbsent(String.valueOf(r.nextInt(10240)), mappingFunction);
    };

    Thread a = new Thread(task, "one");

    a.start();
    task.run();
    a.join();
    map.values().stream().limit(32).forEach(System.out::println);
}

If run enough, there will be occasions where the counter inside mappingFunction shows that 2 instances are running at same time on the pair of threads.

EDIT

To answer your comment about synchronized (r):

Note that there is infinite while loop inside the computeIfAbsent which only exits on break or return, and mappingFunction.apply(key) is called in two places:

when the key is the first entry into the sub-table it runs to the synchronized (r) block. As the line before declares Node<K,V> r = new ReservationNode<K,V>() there is NEVER contention on r from different threads, but only one thread successfully enters the if (casTabAt(...)) { binCount = 1; ... } block and returns, other losing threads resume the loop.
when the key is not the first entry into the sub-table it runs to the synchronized (f) block which would block all but one threads trying to computeIfAbsent for different keys that are hashed to the same sub-table. As each thread enters the block it verifies f is unchanged, and if so returns existing or computed new value - otherwise resumes the loop.

This is precisely the worry I had had! That because the keys were different I would have to ensure thread-safe behaviour of my computeIfAbsent method. Out of interest in the code for computeIfAbsent for the open JDK in Github. "https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/concurrent/ConcurrentHashMap.java", could you point me at the lines where the sub-table comes into play to defeat the synchronization on "r". — Shawn, Aug 04 '22 at 12:38
Thanks @DuncG. I was focused on the synchronized (r) which was the red herring. — Shawn, Aug 04 '22 at 15:48

What does Atomicity imply for cmputeIfAbsent in a concurrent hash map? Atomicity vs. Synchronized

2 Answers2