1

We have a case like this.

class A{
 class foo{
    //Map with a lot of entries
    private HashMap<String,String> dataMap; 

    public updater(){
        // updates dataMap
        // takes several milliseconds
    }

    public someAction(){
        // needs to perform read on dataMap
        // several times, in a long process
        // which takes several milliseconds
    }
}

The issue is, both someAction and updater both can be called simultaneously, someAction is a more frequent method. If updater is called, it can replace a lot of values from dataMap. And we need consistency in readAction. If the method starts with old dataMap, then all reads should happen with old dataMap.

class foo{
    //Map with a lot of entries
    private HashMap<String,String> dataMap; 


    public updater(){
        var updateDataMap = clone(dataMap); // some way to clone data from map
        // updates updateDataMap instead of dataMap
        // takes several milliseconds
        this.dataMap = updateDataMap;       
    }

    public someAction(){
        var readDataMap = dataMap;
        // reads from readDataMap instead of dataMap
        // several times, in a long process
        // which takes several milliseconds
    }
}

Will this ensure consistency? I believe that the clone method will allocate a different area in memory and new references will happen from there. And are there going to be any performance impacts? And will the memory of oldDataMap be released after it has been used?

If this is the correct way, are there are any other efficient way to achieve the same?

  • 2
    You need to declare `dataMap` as `volatile`. Whether cloning the entire map on each update is better or worse than using a concurrent map, depends on the actual use case. – Holger Dec 04 '20 at 09:32
  • Memory that is no longer accessible will be (eventually) released. – tucuxi Dec 04 '20 at 09:50
  • Performance depends on several factors, including size of a deep copy of the map, the amount of free memory, and the time required to `update()` and `readAction()` your map. You should specify whether map consistency in your application means *`updater()` has run from start to end* or *looking at partial `updater()` results is ok, as long as each `set(key, value)` operation is atomic*. In the second case, using a `synchronizedMap(dataMap)` to wrap the map avoids the overhead of copying, and makes `volatile` no longer necessary. – tucuxi Dec 04 '20 at 09:58
  • @tucuxi looking for partial results is not okay. Flow can be something like this: `readAction` using old data > `updater` starts - `readAction` keep using the old data > as soon as `updater` has full map ready with new data, `readActions` to start using new data.. – Vaibhaw Vaibhaw Dec 04 '20 at 13:25
  • 1
    This code is not safe. Even if there is no mutable state inside the map, you have multiple threads accessing the `dataMap` variable with no coordination. There are many more things that can go wrong here than one might first assume. (Even with the "fix" of making `dataMap` volatile, and the proviso that the map is never modified once published, you are still at risk for lost updates.) – Brian Goetz Dec 04 '20 at 22:52
  • 1
    @BrianGoetz are you talking about the scenario of multiple threads calling `updater()` or something else? – Holger Dec 07 '20 at 09:13

2 Answers2

0

I believe your approach will work, because all changes from the updater() will occur to a (deep) copy, and will not be visible by someAction() until, in a single operation, the reference is updated.

I understand that you do not care about whether someAction() sees the latest version of the map's contents, as long as the map is consistent, that is, it is not observed while it is in the middle of being updated. In this case, there is no way for your someAction() to look at an incomplete map.

Beware that at most 1 thread should be able to call updater() - two threads calling it at the same time would mean that only one of them gets to write an updated map. I recommend the following changes:

// no synchronization needed at this level, but volatile is important
private volatile HashMap<String,String> dataMap = new HashMap<>;

// if two threads attempt to call this at once, one blocks until the other finishes
public synchronized updater() {
    var writeDataMap = clone(dataMap);  // a deep copy

    // update writeDataMap - guaranteed no other thread updating
    // ... long operation

    dataMap = writeDataMap;             // switch visible map with the updated one 
}

public someAction() {
    var readDataMap = dataMap;

    // process readDataMap - guaranteed not to change while being read
    // ... long operation
}

The important keyword here is volatile, to ensure that other threads have access to the updated map as soon as the updater() finishes its job. The use of synchronized simply prevents multiple updater() threads from interferring with each other, and is mostly defensive.

tucuxi
  • 17,561
  • 2
  • 43
  • 74
  • I didn't understand how making it volatile has any affect. Does JVM re-fetches data from original pointer, several times in one long operation which is stopped when we add volatile? – Vaibhaw Vaibhaw Dec 04 '20 at 13:28
  • 1
    @VaibhawVaibhaw without `volatile`, there is no guaranty that other threads ever notice the updated field, but even worse, other threads could see the updated field (the map reference) while seeing out-of-date values for the fields of the map itself. – Holger Dec 04 '20 at 15:33
  • I was thinking in a pointer way. A points to a location in memory, say MEM-1, B = A happens, implies B points to MEM-1 now, then A is associated with a new map, say on MEM-2 memory location, then B will keep using MEM-1, right? Why will B suddenly start pointing to MEM-2? Sorry, but I am not able to comprehend a scenario where this can happen @Holger – Vaibhaw Vaibhaw Dec 04 '20 at 15:41
  • 1
    @VaibhawVaibhaw I never said that B can “suddenly” use a new map. `someAction()` contains a read of `dataMap`. Without `volatile`, this read may or may not read a newer value written to it by another thread. There is no guaranty about which map reference it will read nor whether the writing thread has completed the constructor at this point. This is what *racy read* is about. Don’t try to reason with an order of operations, for a racy read, no order exists. – Holger Dec 04 '20 at 15:46
  • @Holger Okay, thanks! Can you please tell if my understanding is correct or not: `var a = new HashMap(); //then a updates`, till here old value of `dataMap` will be used. Then we do `dataMap=a;` this will immediately switch data map for all new threads/reads, and the old calculations will still keep using the old `dataMap` – Vaibhaw Vaibhaw Dec 04 '20 at 15:59
  • 2
    @VaibhawVaibhaw it will work that way *when* the variable `dataMap` is declared `volatile`… – Holger Dec 04 '20 at 16:01
0

If you want to avoid copies you could use a java.util.concurrent.locks.ReentrantReadWriteLock to protect access to your map. Use the write lock in the updater and the read lock in someAction.

daniel
  • 2,665
  • 1
  • 8
  • 18