ConcurrentHashMap computeIfAbsent tell if first time or not

Question

It's complicated for me to articulate a proper title for this. But an example should make it far simpler. Suppose I have this:

final class Cache {
   private static final ConcurrentHashMap<String, List<String>> CACHE = ...

   static List<String> byName(String name) {
      return CACHE.computeIfAbsent(name, x -> // some expensive operation)
   }

}

The idea is probably trivial, this acts as a LoadingCache, much like guava or caffeine (in reality it is more complicated, but that is irrelevant to the question).

I would like to be able to tell if this was the first load into the CACHE, or it was a read of an existing mapping. Currently, I do this:

final class Cache {
   private static final ConcurrentHashMap<String, List<String>> CACHE = ...

   static List<String> byName(String name) {
      boolean b[] = new boolean[1];
      List<String> result = CACHE.computeIfAbsent(name, x -> {
            b[0] = true;
            // some expensive operation)
      });

      if(b[0]) {
         // first load into the cache, do X
      } else {
         // do Y
      }

      return result;
   }

}

This works, but I am afraid I am missing something that ConcurrentHashMap can offer for me that would allow me to do the same. Thank you.

Why not just put whatever is in the body of your if in the lambda? — tgdavies, Dec 25 '22 at 22:07
@tgdavies the chain of actions that has to be triggered based on that would be huge, unfortunately :( — Eugene, Dec 25 '22 at 22:10
So it's OK for other threads to use the result of the lambda before the if statement has executed? — tgdavies, Dec 25 '22 at 22:17
First time loaded or first time accessed? Because "first time loaded" is exactly when the lambda is executed — knittl, Dec 25 '22 at 22:23
@tgdavies yes. I only want to take an action on the first insert into the cache, I don't care if other threads use the existing already in the cache mapping, even before I execute the "if" statement. To simplify, let's say I want to be able to log: a) "Entry coming from Cache" b) "Entry inserted first time in the cache". — Eugene, Dec 25 '22 at 22:27
@knittl "loaded". how would I tell the difference then, between first time loaded vs != of that? — Eugene, Dec 25 '22 at 22:34
@Eugene the lambda is only executed if the key is not present in the map. If the value is already cached and returned from the cache, the lambda is not executed. Maybe I'm misunderstanding your question. — knittl, Dec 25 '22 at 22:36
@knittl I edited slightly, added an `else`. Does it make more sense now? — Eugene, Dec 25 '22 at 22:40
It is advised not to do expensive operations in lambda passed to `computeIfAbsent`, as it can block other threads ([see doc](https://docs.oracle.com/en/java/javase/19/docs/api/java.base/java/util/concurrent/ConcurrentHashMap.html#computeIfAbsent(K,java.util.function.Function))). — Jorn Vernee, Dec 25 '22 at 23:40
Caffeine uses this [approach](https://github.com/ben-manes/caffeine/blob/499b448f46b29da5637754ff495f6200f7bd1f93/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java#L2591) of a capturing lambda to update a single element array. It isn't beautiful but simple, obvious, and cheap. This is on the fallback path as typically an optimistic read would find the entry and it avoids the compute methods locking even when present. — Ben Manes, Dec 26 '22 at 07:22
@Eugene why not just `do Y` in the start of your method with `computeIfPresent`. Then follow with `do X` with `computeIfAbsent` ? — Panagiotis Bougioukos, Dec 29 '22 at 17:30

knittl · Accepted Answer · 2022-12-26T11:16:21.733

3

If you want to avoid your single-element array to pass data out of the lambda (which I would rather do with an AtomicReference or AtomicBoolean), you could use a stateful callback object. It doesn't change the behavior or design of your code, but could be considered a little bit cleaner and more OOP-y.

class LoadingAction<K, V> {
  private boolean called = false;

  public V load(final K key) {
    called = true;
    // load data
    return ...;
  }

  public void executePostLoad() {
    if (called) {
      // loaded into cache, do X
    } else {
      // do Y
    }
  }
}

final class Cache {
   private static final ConcurrentHashMap<String, List<String>> CACHE = new ConcurrentHashMap<>();

   static List<String> byName(String name) {
      final LoadingAction<String, List<String>> loader = new LoadingAction<>();
      final List<String> result = CACHE.computeIfAbsent(name, loader::load);

      loader.executePostLoad();

      return result;
   }

}

Or turn it inside-out:

class Loader<K, V> {
  private boolean called = false;

  public V load(final Map<K, V> map, final K key) {
    final V result = map.computeIfAbsent(key, this::load);
    this.executePostLoad();
    return result;
  }

  private V load(final K key) {
    called = true;
    // load data
    return ...;
  }

  private void executePostLoad() {
    if (called) {
      // loaded into cache, do X
    } else {
      // do Y
    }
  }
}

final class Cache {
   private static final ConcurrentHashMap<String, List<String>> CACHE = new ConcurrentHashMap<>();

   static List<String> byName(String name) {
      final Loader<String, List<String>> loader = new Loader<>();
      return loader.load(CACHE, name);
   }

}

Construction and loading could be encapsulated in a static method:

class Loader<K, V> {
  private boolean called = false;

  public static <K, V> V load(final Map<K, V> map, final K key) {
      final Loader<K, V> loader = new Loader<>();
      return loader.doLoad(map, key);
  }

  private V doLoad(final Map<K, V> map, final K key) {
    final V result = map.computeIfAbsent(key, this::load);
    this.executePostLoad();
    return result;
  }

  private V load(final K key) {
    called = true;
    // load data
    return ...;
  }

  private void executePostLoad() {
    if (called) {
      // loaded into cache, do X
    } else {
      // do Y
    }
  }
}

final class Cache {
   private static final ConcurrentHashMap<String, List<String>> CACHE = new ConcurrentHashMap<>();

   static List<String> byName(String name) {
      return Loader.load(CACHE, name);
   }

}

edited Dec 26 '22 at 11:16

answered Dec 26 '22 at 11:00

knittl

246,190
53
318
364

we kind of thought about close to this today with a few colleagues. Thank you for the time taken here. – Eugene Dec 26 '22 at 18:05
1

This subverts the entire thread safety of the `ConcurrentHashMap` as it allows to have X and (even multiple) Y being operating on the same object at the same time, concurrently. On the other hand, any attempt to fix this will subvert the atomicity advantage of the `ConcurrentHashMap`. – Holger Jan 02 '23 at 14:55
@Holger how come? doesn't each thread creates it's own `LoadingAction` and invoke a method that is documented as "performed atomically"? Knowing you, I am sure I miss something. – Eugene Jan 02 '23 at 15:38
@Eugene yes, each thread will have its own `LoadingAction`, but there's no synchronization between the method calls. Action "Y" could be executed from multiple threads at the same time (I don't think "X" can because `called` should only be true for a single thread). – knittl Jan 02 '23 at 15:47
@knittl but Im OK with that, Y can be executed from multiple threads at the same time, as long as X is executed only once – Eugene Jan 02 '23 at 16:13
@Eugene I don't think "X" can be executed multiple times nor concurrently. Perhaps Holger can shed some more light, it's possible that I missed something – knittl Jan 02 '23 at 16:16
2

X will only be executed once but while it is executed, other threads can already retrieve `V` from the map and execute Y and even return the `V` to the caller while X still is being executed. The word “multiple” in my previous comment only referred to Y while, without the bracketed phrase, it says “X and Y”, meaning there can be an X and a Y operating at the same time. – Holger Jan 03 '23 at 07:26
1

@Holger thanks for clarifying, I understand now. The value could be returned while the "post processing" step is still active. This might or might not be a problem, depending on OP's requirements. Nevertheless important to note, thanks! – knittl Jan 03 '23 at 07:40
@Holger thank you for the comment, and yes, I'm OK with such a thing business wise. – Eugene Jan 03 '23 at 07:47
1

@Eugene then, a simple `putIfAbsent` could also solve the issue (assuming that the construction of a `V` instance is cheap). – Holger Jan 03 '23 at 07:50
@Holger can you post it as an answer? would it be too much to ask for, considering you already spent quite some (valuable) time here? – Eugene Jan 03 '23 at 07:52
1

@Eugene in your question, `V` inconsistently changes between `String` and `List`. In the latter case, it would contradict the premise that the construction is cheap (or safely initialized) – Holger Jan 03 '23 at 07:56
@Holger I am not sure I follow, "cheap" in my case means around 100ms and it's a rpc call, and yes, you of course are right - it's a `List` there. Why would that not be safely initialized, I thought the "atomically" part in the `computeIfAbsent` would make it such. – Eugene Jan 03 '23 at 08:13
@Eugene to answer that usefully it would be important to know what the "post-process" step does. Does it need to be synchronized? Is it expensive? Can there be race conditions? Is it problematic if it is executed more than once? Is it a problem if the value is returned before the post-process step has completed? Could the post-process be submitted to an async thread pool? – knittl Jan 03 '23 at 08:16
@knittl I see, it does not have to be synchronized, no. I retrieve the `List` from the rpc call and this is where my interaction with the CACHE ends. I don't care if I return the result before post-process step completes - as a matter of fact you are correct, it is done async. The entire post-processor step is done async. I don't even care if Y step is executed (even returns the result to the callers) while post-process from X is currently executing. All I care about is to execute X, once, on the initial load of the value into the cache, without any other synchronization concerns. – Eugene Jan 03 '23 at 08:28
1

@Eugene in your question’s second code snippet, the comment “some expensive operation” still is inside the `computeIfAbsent`. It doesn’t say that “X” is the expensive operation. Constructing a `List` isn’t on the cheap side for me, but an RPC call is even less. That’s where my confusion came from. Using `putIfAbsent` only is an option if it is affordable to construct the actual value in advance and drop it if a value already exist. On the plus side, no synchronization will be needed then. – Holger Jan 03 '23 at 08:52
@Holger I was trying to simplify the example, and my simplifications only ruined it :( terribly sorry. I see your point with `putIfAbsent` and unfortunately that would not be an option, as the `List` is computed based on `name` – Eugene Jan 03 '23 at 09:30

Stephen C · Answer 2 · 2022-12-26T01:04:36.513

I can think of a couple of ways to do this using the ConcurrentHashMap API. Both reply on using a mapping function that has side-effects. The idea is that the function makes a record of whether it was called, or what arguments it was called with.

The spec for computeIfAbsent says that the mapping function is only called if the key is absent. Alternatively, the spec for compute says that the mapping function is called with a null argument if the key is argument. In either case, if you record what happened in the mapper function via a side-effect on (say) a field of the mapper function/object, you can determine if the cache entry was already present or not.

To make this thread-safe, you need to create a fresh (thread-confined) instance of the mapper function.

ConcurrentHashMap computeIfAbsent tell if first time or not

2 Answers2