8

I am trying to support concurrency on a hash map that gets periodically cleared. I have a cache that stores data for a period of time. Every 5 minutes, the data in this cache is sent to the server. Once I flush, I want to clear the cache. The problem is that while I am flushing, data could potentially be written to this map under an existing key. How would I go about making this process thread safe?

import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicLong

data class A(val a: AtomicLong, val b: AtomicLong) {
    fun changeA() {
        a.incrementAndGet()
    }
}

class Flusher {
    private val cache: MutableMap<String, A> = ConcurrentHashMap()
    private val lock = Any()

    fun retrieveA(key: String): A {
        synchronized(lock) {
            return cache.getOrPut(key) { A(AtomicLong(1), AtomicLong(0)) }
        }
    }

    fun flush() {
        synchronized(lock) {
            // send data to network request
            cache.clear()
        }
    }
}

// Existence of multiple classes like CacheChanger
class CacheChanger(private val flusher: Flusher) {
    fun incrementData() {
        flusher.retrieveA("x").changeA()
    }
}

I am worried that the above cache is not properly synchronized. Are there better/right ways to lock this cache so that I don't lose data? Should I create a deep copy of the cache and then clear it?

Since the above data could be changed by another changer at the same time, could that not lead to problems?

LateNightDev
  • Are there any other functions besides retrieve and flush which modify the map? Both of these functions synchronize on the same lock, so what is the problem that you are afraid of? – ciamej Jun 21 '20 at 21:30
  • Besides, why do you use a ConcurrentHashMap if all your accesses are synchronized? – ciamej Jun 21 '20 at 21:31
  • `ConcurrentHashMap` is thread safe itself. Also, the extension method `getOrPut` seems to be thread safe (based on the docs). If there aren't any other methods that modify the map in a non-thread-safe way, you can get rid of this lock. – Michał Krzywański Jun 21 '20 at 21:31
  • The problem is class A's value can be changed. What if the value of class A gets changed and I clear it. I'll update the example. – LateNightDev Jun 21 '20 at 21:33
  • @michalik It is not safe for the OP to get rid of the lock, because flush needs to be atomic - the entire map needs to be read and then cleared, and no writes can be interleaved with this process. – ciamej Jun 21 '20 at 21:35
  • I updated the class with the example. – LateNightDev Jun 21 '20 at 21:37
  • You may have to make `cache`'s type `ConcurrentMap` to get the proper concurrency behavior. – Louis Wasserman Jun 21 '20 at 22:13

2 Answers

3

You can get rid of the lock.

In the flush method, instead of reading the entire map (e.g. through an iterator) and then clearing it, remove each element one by one.

I'm not sure if you can use the iterator's remove method (I'll check that in a moment), but you can take the key set, iterate over it, and for each key invoke cache.remove(key) - this returns the stored value and removes it from the cache atomically.
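
A minimal sketch of that drain loop, assuming cache is declared as a `ConcurrentHashMap<String, A>` and with `sendOverNetwork` standing in for the real request:

fun flush() {
    // The key-set view of a ConcurrentHashMap is weakly consistent, so it is safe
    // to iterate while removing. remove(key) returns the stored value and deletes
    // the entry atomically, so no update can slip in between the read and the clear.
    for (key in cache.keys) {
        val value = cache.remove(key) ?: continue // already drained concurrently
        sendOverNetwork(key, value)               // placeholder for the network call
    }
}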

The tricky part is how to make sure that the object of class A won't be modified just prior to being sent over the network... You can do it as follows:

When you get some object x through retrieveA and modify it, you need to make sure it is still in the cache. Simply invoke retrieveA one more time. If you get exactly the same object, it's fine. If it's different, then it means the object was removed and sent over the network, but you don't know whether your modification was also sent, or whether the state of the object prior to the modification was sent. Still, I think in your case you can simply repeat the whole process (apply the change and check if the objects are the same). But it depends on the specifics of your application.
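
A rough sketch of that re-check on the caller's side (the unconditional retry is an assumption; as noted above, whether a possible double count is acceptable depends on your application):

class CacheChanger(private val flusher: Flusher) {
    fun incrementData() {
        while (true) {
            val obj = flusher.retrieveA("x")
            obj.changeA()
            // Same instance still cached: the increment will be picked up by the next flush.
            if (flusher.retrieveA("x") === obj) return
            // Different instance: the old object was already flushed, and we don't know
            // whether our increment made it in, so we simply repeat the whole process.
        }
    }
}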

If you don't want to increment twice, then when sending the data over the network, you'll have to read the content of the counter a, store it in some local variable and decrease a by that amount (usually it will get back to zero). Then in the CacheChanger, when you get a different object from the second retrieve, you can check if the value is zero (your modification was taken into account) or non-zero, which means your modification came just a fraction of a second too late and you'll have to repeat the process.
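
A sketch of that bookkeeping with placeholder names (`flushEntry` and `sendOverNetwork` are not part of the original code):

// Flusher side, for each object removed from the cache:
fun flushEntry(key: String, value: A) {
    val sent = value.a.get()      // snapshot the counter
    sendOverNetwork(key, sent)    // placeholder for the network call
    value.a.addAndGet(-sent)      // usually brings the counter back to zero
}

// Changer side, after the second retrieveA returned a different object `old`:
// a zero counter means the increment was included in what was sent,
// a non-zero counter means it arrived too late and the process must be repeated.
fun wasIncrementSent(old: A): Boolean = old.a.get() == 0L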

You could also replace incrementAndGet with compareAndSet (CAS), but this could yield slightly worse performance. In this approach, instead of incrementing blindly, you try to swap in a value that is greater by one. And before sending over the network, you try to swap the value to -1 to mark it as invalid. If that swap fails, it means that someone has changed the value concurrently, so you need to read it one more time in order to send the freshest value over the network, and you repeat the process in a loop (breaking the loop only once the swap to -1 succeeds). In the case of the swap to a value greater by one, you also repeat the process in a loop until the swap succeeds. If it fails, it either means that somebody else swapped in some greater value, or the Flusher swapped it to -1. In the latter case you know that you have to call retrieveA one more time to get a new object.
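
A sketch of those two CAS loops, using AtomicLong.compareAndSet with -1 as the "invalidated" marker (function names are placeholders):

// Changer side: returns false if the flusher already invalidated the counter,
// in which case the caller must call retrieveA again to get a fresh object.
fun tryIncrement(obj: A): Boolean {
    while (true) {
        val current = obj.a.get()
        if (current == -1L) return false
        if (obj.a.compareAndSet(current, current + 1)) return true
    }
}

// Flusher side: marks the counter invalid and returns the freshest value to send.
fun invalidateAndRead(obj: A): Long {
    while (true) {
        val current = obj.a.get()
        if (obj.a.compareAndSet(current, -1L)) return current
    }
}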

ciamej
  • Why would you need to iterate over all the entries and mimic the behaviour of the `ConcurrentHashMap::clear` method when it is already provided and guaranteed to be thread safe, along with the other methods of `ConcurrentHashMap`? You are basically giving away the goodness of ConcurrentHashMap in favour of external synchronization. – Michał Krzywański Jun 21 '20 at 21:44
  • @michalik Because clear does not return the contents of the map. You want to do it atomically, get the contents and remove them at the same time. – ciamej Jun 21 '20 at 21:49
  • @michalk see the comment above – ciamej Jun 21 '20 at 21:56
  • Right, I missed that comment about sending the data over the network. In this case your solution will work, but the OP's solution (with lock) will work too (as far as atomicity of those 2 operations needs to be preserved), but he will also need to hold the lock when modifying the values (for example by exposing a method in `Flusher` for modifying). Also, the solution that you are describing (a type of CAS algorithm) might require `CacheChanger` to do the CAS check when data changes. But overall it really depends on the specifics of the application. – Michał Krzywański Jun 21 '20 at 22:03
  • @michalk yes, the alternative for the OP would be to retain the lock, change the map to a regular one (no need for a ConcurrentHashMap), but apply all the modifications during the method retrieve while still keeping the lock. – ciamej Jun 21 '20 at 22:06
  • @michalk you can post that as answer if you like, because it's a completely valid approach, but based on locks (therefore even simpler to reason about). – ciamej Jun 21 '20 at 22:07
  • @michalk ok, never mind my last comment, I've already done it myself – ciamej Jun 21 '20 at 22:32
0

The easiest solution (but with a worse performance) is to rely completely on locks.

You can change ConcurrentHashMap to a regular HashMap.

Then you have to apply all your changes directly in the retrieveA function:

fun retrieveA(key: String, mod: (A) -> Unit): A {
    synchronized(lock) {
        // getOrPut already inserts the new object, so no separate put is needed
        val obj: A = cache.getOrPut(key) { A(AtomicLong(1), AtomicLong(0)) }
        mod(obj) // apply the modification while still holding the lock
        return obj
    }
}

I hope it compiles (I'm not an expert on Kotlin).

Then you use it like:

class CacheChanger {
    fun incrementData() {
        flusher.retrieveA("x") { it.changeA() }
    }
}

Ok, I admit I'm a bit rusty with Kotlin ;) the idea is to pass the modification as a Kotlin lambda (rather than a Java Consumer) and apply it while the lock is held. It's been some time since I played with Kotlin, so if someone could fix any syntax issues I'd be very grateful.

ciamej