11

Introduction
Suppose I have a ConcurrentHashMap singleton:

public class RecordsMapSingleton {

    private static final ConcurrentHashMap<String,Record> payments = new ConcurrentHashMap<>();

    public static ConcurrentHashMap<String, Record> getInstance() {
        return payments;
    }

}

Then I have three subsequent requests (all processed by different threads) from different sources.
The first service makes a request, that gets the singleton, creates Record instance, generates unique ID and places it into Map, then sends this ID to another service.
Then the second service makes another request, with that ID. It gets the singleton, finds Record instance and modifies it.
Finally (probably after half an hour) the second service makes another request, in order to modify Record further.

Problem
In some really rare cases, I'm experiencing heisenbug. In logs I can see, that first request successfully placed Record into Map, second request found it by ID and modified it, and then third request tried to find Record by ID, but found nothing (get() returned null).
The single thing that I found about ConcurrentHashMap guarantees, is:

Actions in a thread prior to placing an object into any concurrent collection happen-before actions subsequent to the access or removal of that element from the collection in another thread.

from here. If I got it right, it literally means, that get() could return any value that actually was sometime into Map, as far as it doesn't ruin happens-before relationship between actions in different threads.
In my case it applies like this: if third request doesn't care about what happened during processing of first and second, then it could read null from Map.

It doesn't suit me, because I really need to get from Map the latest actual Record.

What have I tried
So I started to think, how to form happens-before relationship between subsequent Map modifications; and came with idea. JLS says (in 17.4.4) that:

A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order).

So, let's suppose, I'll modify my singleton like this:

public class RecordsMapSingleton {

    private static final ConcurrentHashMap<String,Record> payments = new ConcurrentHashMap<>();
    private static volatile long revision = 0;

    public static ConcurrentHashMap<String, Record> getInstance() {
        return payments;
    }

    public static void incrementRevision() {
        revision++;
    }
    public static long getRevision() {
        return revision;
    }

}

Then, after each modification of Map or Record inside, I'll call incrementRevision() and before any read from Map I'll call getRevision().

Question
Due to nature of heisenbugs no amount of tests is enough to tell that this solution is correct. And I'm not an expert in concurrency, so couldn't verify it formally.

Can someone approve, that following this approach guarantees that I'm always going to get the latest actual value from ConcurrentHashMap? If this approach is incorrect or appears to be inefficient, could you recommend me something else?

mkrakhin
  • 3,386
  • 1
  • 21
  • 34
  • 1
    Maybe I have read your question wrong here but you seem to assume that the read will always happen after the write; why? – fge Apr 21 '15 at 12:09
  • @fge although they are independent threads. This is a predetermined sequence (put-get-get), that interests me and sometimes goes wrong. – mkrakhin Apr 21 '15 at 12:20
  • 1
    Well, it _would_ go wrong at one point or another anyway; you cannot guarantee the order of executions of threads unless you guard them against one another, so this looks pretty much like a lost cause; what you call the "latest value" is whatever value will have been written by the last writing thread, so it is consistent so far. You need some other synchronization mechanism as far as I see. – fge Apr 21 '15 at 12:23
  • @fge well, this second service sends me second request only after I send him ID. And sends third request only after I send him response to second request. I think this sequence is forced due to logic of flow. Second service cannot send second request until it receives from me request containing ID, and it cannot send third request until it receives response to the second. – mkrakhin Apr 21 '15 at 12:28
  • I am guessing in this sequence `T1: put(a); put(b)`, `T2: get(b); get(a)`. If `get(b)` return the expected value, `get(a)` would also always return the expected value. no? – Kelvin Ng Apr 21 '15 at 12:30
  • Are all your threads on the same ClassLoader? – Rich Apr 21 '15 at 12:31
  • @KelvinNg I also suppose so (in case of `put(b)` modifying some volatile field), but not sure. And I'm interested in cases where I don't know the expected value of `b`. – mkrakhin Apr 21 '15 at 12:33
  • @Rich in my case they are all threads from pool that servlet-container uses to process requests. So I suppose the answer is yes. – mkrakhin Apr 21 '15 at 12:34
  • Are you sure that the key being used at each stage is the same? For example could a non-printing character have been introduced which results in the map not finding the correct value? eg "mykey" vs "mykey ". – Rich Apr 21 '15 at 12:38
  • @Rich yes, second service should send me the same string I sent to it, and this string contains only alphanumeric characters (due to method of its generation). – mkrakhin Apr 21 '15 at 12:41
  • @mkrakhin If one thread first puts the record in the map then notifies other services to pull from the map, there shouldn't be a case where the item is null. Is that what you are seeing? Or is there a race between the write to the map and the notification of another service to read the map? – John Vint Apr 21 '15 at 12:46
  • I suspect I t has nothing to do with `ConcurrentHashMap`: it has to do with the accesses to the `Record`. All its fields need to be volatile, or access to them synchronized. Otherwise they are subject to no happens-before rule. – user207421 Apr 21 '15 at 12:48
  • @EJP: the key is a `String` and the result of `get()` is `null`, thus there is nothing *within* the `Record` type that can cause the problem. – Holger Apr 21 '15 at 12:50
  • Doesn't putting a value into a ConcurrentHashMap guarantee the safe publication of that value? – Rich Apr 21 '15 at 12:52
  • @JohnVint it's really what I seeing, there is no race (and also no code deleting records from map). As an example, one time I saw in logs this: record was placed into map, next request came in a minute and successfully obtained it, then third request came after **half an hour** and didn't found record. – mkrakhin Apr 21 '15 at 12:52
  • Could you add to your logs the key, the hashcode of the key, the toString of the concurrent hashmap during the initial put and after a get that has returned null to see whether you are indeed dealing with the same thing? – Rich Apr 21 '15 at 12:53
  • @mkrakhin I know you don't want to hear this, but if you are literally waiting a half an hour and the entry is not found there is something wrong in your code. A half hour is an eternity for a system and there will be no data races which will pop up. Again if it is really a half hour I would eliminate memory issues and happens-before ordering as a possibility and look else where. – John Vint Apr 21 '15 at 12:55
  • @Rich if I understand documentation correctly, `get()` could return any value that **once** was stored under this key; and then it guarantees that any actions that `happened-before` placing **obtained value** into `map` `happened-before` obtaining **this** value. – mkrakhin Apr 21 '15 at 12:56
  • One other possibility - Record doesn't extend SoftReference does it? – Rich Apr 21 '15 at 13:02
  • @Rich no, and not a phantom reference as well. – mkrakhin Apr 21 '15 at 13:02
  • @Rich I probably could try to provide logs, but I'm not sure how much time it will take from me, to experience this bug one more time. – mkrakhin Apr 21 '15 at 13:03
  • @JohnVint yes, it's definitely what I didn't want to hear :D Frankly speaking, I already spent a lot of time trying to figure out what's wrong and probable memory barriers was my last idea :) – mkrakhin Apr 21 '15 at 13:11
  • @mkrakhin Just to ensure you're not going crazy, you may want to print the object reference of the "singleton". Put logging in after each `put` and `get` with the identity hash code of the object. `System.identityHashCode(this)` The value returned will be the memory reference. I think it's more likely something with references if there isn't additional mutation. – John Vint Apr 21 '15 at 13:14
  • Also, how many instances have you deployed this to? Is there load balancing of sorts? – John Vint Apr 21 '15 at 13:15
  • @JohnVint `ConcurrentHashMap` overrides `hashCode()`, so it's not memory reference, but hash of elements inside. Anyway, I could compare `hashCode` after write and before next read, but when I tried to do so, everything worked as expected for a long time, and I haven't experienced any bugs :( – mkrakhin Apr 21 '15 at 13:21
  • There are two types of hash codes. There is the hashCode the object generates (content hashCode) if it overrides. And there is the `System.identityHashCode` which will always give you the memory reference of the object. Use System.identityHashCode – John Vint Apr 21 '15 at 13:24
  • @JohnVint hmm, it's deployed on Jelastic, without any load-balancer, and on one node (but there are several things called `cloudlets` for this node, so I couldn't be sure that physically it's really one server). – mkrakhin Apr 21 '15 at 13:24
  • 2
    @mkrakhin If there are multiple instances which are in some way load balanced I am sure that is your issue. Getting the memory reference of the singleton may help in your debugging and confirming this. – John Vint Apr 21 '15 at 13:25
  • @JohnVint Oh, I tried, it really gives memory reference. I misinterpreted JavaDoc for identityHashCode. Ok, I will try this, thank you. – mkrakhin Apr 21 '15 at 13:30

1 Answers1

6

You approach will not work as you are actually repeating the same mistake again. As ConcurrentHashMap.put and ConcurrentHashMap.get will create a happens before relationship but no time ordering guaranty, exactly the same applies to your reads and writes to the volatile variable. They form a happens before relationship but no time ordering guaranty, if one thread happens to call get before the other performed put, the same applies to the volatile read that will happen before the volatile write then. Besides that, you are adding another error as applying the ++ operator to a volatile variable is not atomic.

The guarantees made for volatile variables are not stronger than these made for a ConcurrentHashMap. It’s documentation explicitly states:

Retrievals reflect the results of the most recently completed update operations holding upon their onset.

The JLS states that external actions are inter-thread actions regarding the program order:

An inter-thread action is an action performed by one thread that can be detected or directly influenced by another thread. There are several kinds of inter-thread action that a program may perform:

  • External Actions. An external action is an action that may be observable outside of an execution, and has a result based on an environment external to the execution.

Simply said, if one thread puts into a ConcurrentHashMap and sends a message to an external entity and a second thread gets from the same ConcurrentHashMap after receiving a message from an external entity depending on the previously sent message, there can’t be a memory visibility issue.

It might be the case that these action aren’t programmed that way or that the external entity doesn’t have the assumed dependency, but it might be the case that the error lies in a completely different area but we can’t tell as you didn’t post the relevant code, e.g. the key doesn’t match or the printing code is wrong. But whatever it is, it won’t be fixed by the volatile variable.

Community
  • 1
  • 1
Holger
  • 285,553
  • 42
  • 434
  • 765
  • Your point is interesting. I thought volatile modifier offers some other sort of guarantees (kind of, if read was really **after** write, it will see correct value) than methods of ConcurrentHashMap (you could see my thoughts on it in comments to question). By the way, thank you for reminding, that increment is not atomic, it's my inadvertance. – mkrakhin Apr 21 '15 at 13:07
  • 2
    The other thing to add is that the JavaDoc for ConcurrentHashMap forbids the insertion of a null as either the key or the value - this proves that the key that you are looking for doesn't exist in the map. – Rich Apr 21 '15 at 13:10
  • Well, if read was “really after write”, you will see the correct value but the same applies to `get` and `put` of a `ConcurrentMap`. Note that if your code was arranged according to your described intentions, i.e. that the communication with an external entity implies an ordering, the API you are using for the communication will indeed provide you with an ordering that expands to the `ConcurrentMap` operations. But since you are encountering problems, the code is obviously not arranged that way (unless the real error lies in the printing or unrelated area…) – Holger Apr 21 '15 at 13:23
  • @Holger as I said before, communication really implies ordering. Requests are unconditionally subsequent. So it seems the problem is really in some other area. Could you please provide some quotes from specification, proving that ConcurrentHashMap really provides this guarantees, so I could accept your answer? – mkrakhin Apr 21 '15 at 13:36
  • Well, Aleksey Shipilev from Oracle confirmed, that ConcurrentHashMap really should return most recent update on retrieval. I accept this answer then. – mkrakhin Apr 21 '15 at 14:14