1

Task is to keep track of some running processes. Keeping that information in memory is just fine, so I'm using a concurrent hash map to store that data:

ConcurrentHashMap<String, ProcessMetaData> RUNNING_PROCESSES = new ConcurrentHashMap();

It's all good and fine with safely putting new objects in the map, problem is that state of those processes change so I have to update ProcessMetaData from time to time. I made ProcessMetaData immutable and use ConcurrentHashMap's compute() method to update values, but now the problem is ProcessMetaData gets more complicated and keeping it immutable gets hardly manageable. The question is - as long as I only update ProcessMetaData in atomic (as per javadoc) compute() method - the object may be mutable and overall things will still be thread-safe? Is my assumption correct?

Yuri Dolzhenko
  • 242
  • 2
  • 7
  • 2
    Assuming that the reader of `ProcessMetaData` uses a compute then yes. Otherwise readers will observe races as it is modified underneath them (e.g. equivalent to using `synchronized (processMetaData) { ... }`). The immutable instance means that readers don't need to work under the exclusive lock, so if you remove that then you need to coordinate. – Ben Manes Jun 01 '21 at 21:51
  • yeah, it makes sense now, thanks a lot. – Yuri Dolzhenko Jun 01 '21 at 22:40
  • @BenManes does atomicity imply visibility? I mean the fact that `compute` is documented to be `atomic` means it also provides `visibility`? The implementation could in theory lock on different locks for reads and writes (made up example), so readers are not guaranteed to see the updates in such a case. – Eugene Jun 02 '21 at 02:02
  • @Eugene at the hardware level the concept is a [memory barrier](https://en.wikipedia.org/wiki/Memory_barrier) to enforce memory ordering. A reader needs a __load barrier__ to see changes (or may never observe them), and a writer needs a __store barrier__ to publish them (or may never be readable). Atomicity would be that all changes can be visible or not for the protected data (no [partial writes](https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.7)). Otherwise, readers may see only some changes and cause non-deterministic behavior. Allowing data races has to be designed in. – Ben Manes Jun 02 '21 at 03:34
  • @BenManes thank you, I get that. It's just that `compute` is not documented with any happens-before guarantees, neither is the class. The only guarantee comes from `java.util.concurrent` package with : "Actions in a thread prior to placing an object into any concurrent collection happen-before actions subsequent to the access or removal of that element from the collection in another thread", but that is still _not_ enough to say that `compute` offers the needed guarantees, with regards to `happens-before`. – Eugene Jun 02 '21 at 14:54
  • @Eugene The class javadoc does discuss happens-before, though. If you have suggestions, please email [concurrency-interest](http://altair.cs.oswego.edu/mailman/listinfo/concurrency-interest) as Doug is very welcoming to changes that add clarity. – Ben Manes Jun 02 '21 at 17:15
  • 2
    If the compute updates the ProcessMetaData instance, then there is a data race. If the compute creates a new ProcessMetaData instance based on the existing one (so it is effectively immutable), then there should not be a data race. – pveentjer Jun 03 '21 at 02:48
  • 1
    @Eugene reading from a `ConcurrentHashMap` doesn’t use a lock at all. Yes, that *is* different from what `compute` does and makes a `compute` operation that modifies an already stored object non-atomic in respect to retrieval operations. But even if you only use `compute`, you’d not be allowed to use its return value as that would not be part of the atomic operation. – Holger Jun 03 '21 at 13:31
  • @Holger yes, we had this debate before. Using `get` with `compute` is unsafe, that is why `get` says it might _overlap_. I get that. The question I had was `compute` for updates, `compute` for retrieval as Ben was suggestion earlier. – Eugene Jun 03 '21 at 14:49
  • 1
    @Eugene I don’t see the word “retrieval” in the previous discussion. The *function* will perceive the previous state for sure. Using `compute` for retrieval is a different issue, as, of course, it will perceive the completed state of the update it just performed within the same thread. The problems start, as explained in my answer, with other `compute` operations while using the retrieved value. There is no ordering and no memory visibility (how could it?) when using a retrieved value while other `compute` operations mutate it. – Holger Jun 03 '21 at 15:18
  • @pveentjer after 2 days thinking about this question, I think your comment sums up what OP is supposed to do. It is a great idea, pity you rarely transfer your comments into answers. – Eugene Jun 05 '21 at 00:18

1 Answers1

3

As long as you only access the value within the function passed to compute, modifications made in that function are safe.

This, however, is a pointless theoretical view. The purpose of storing values into a collection or map, is to eventually retrieve and use them. And this is where the problems start.

The compute method returns the result value just like get returns the currently stored value. Once a caller starts using that value, this use may be concurrent to subsequent compute operations on the map. The get method may even retrieve the value while a compute operation is in progress. Allowing non-blocking retrieval operation is one of ConcurrentHashMap’s main features. Therefore, all kind of race conditions may occur.

So, using a mutable object and modifying an already stored value in compute is only safe, when you use the map as write-only memory, which is a far-fetched scenario. It might work when you use a different thread safe mechanism to ensure that all updates have been completed before starting to read the map, but your use case seems to be different.

Holger
  • 285,553
  • 42
  • 434
  • 765