37

I have a "simple" 4 class example that reliably shows unexpected behavior from java synchronization on multiple machines. As you can read below, given the contract of the java sychronized keyword, Broke Synchronization should never be printed from the class TestBuffer.

Here are the 4 classes that will reproduce the issue (at least for me). I'm not interested in how to fix this broken example, but rather why it breaks in the first place.

Sync Issue - Controller.java

Sync Issue - SyncTest.java

Sync Issue - TestBuffer.java

Sync Issue - Tuple3f.java

And here is the output I get when I run it:

java -cp . SyncTest
Before Adding
Creating a TestBuffer
Before Remove
Broke Synchronization
1365192
Broke Synchronization
1365193
Broke Synchronization
1365194
Broke Synchronization
1365195
Broke Synchronization
1365196
Done

UPDATE: @Gray has the simplest example that breaks thus far. His example can be found here: Strange JRC Race Condition

Based on the feedback I've gotten from others, it looks like the issue may occur on Java 64-bit 1.6.0_20-1.6.0_31 (unsure about newer 1.6.0's) on Windows and OSX. Nobody has been able to reproduce the issue on Java 7. It may also require a multi-core machine to reproduce the issue.

ORIGINAL QUESTION:

I have a class which provides the following methods:

  • remove - Removes the given item from the list
  • getBuffer - Iterates over all the items in the list

I've reduced the problem down to the 2 functions below, both of which are in the same object and they're both synchronized. Unless I am mistaken, "Broke Synchronization" should never be printed because insideGetBuffer should always be set back to false before remove can be entered. However, in my application it is printing "Broke Synchronization" when I have 1 thread calling remove repeatedly while the other calls getBuffer repeatedly. The symptom is that I get a ConcurrentModificationException.

See Also:

Very strange race condition which looks like a JRE issue

Sun Bug Report:

This was confirmed as a bug in Java by Sun. It is apparently fixed (unknowingly?) in jdk7u4, but they have not backported the fix to jdk6. Bug ID: 7176993

Community
  • 1
  • 1
Luke
  • 3,742
  • 4
  • 31
  • 50
  • 1
    Can you post the code that is breaking, it may be the case that this code is broken. – Qwerky Jun 11 '12 at 15:22
  • The only way I can see this working "strangely" is if there is an exception thrown (and stifled) from `getBuffer`. –  Jun 11 '12 at 15:25
  • Show your stack trace. @JB Nizet is correct by the way, the ConcurrentModificationException has nothing (directly) to do with thread concurrency. Threading can more easily cause the issue, but the exception can just as easily be caused on a single thread. – Robin Jun 11 '12 at 15:27
  • @pst - if `list` is null and the exception is swallowed then this is definitely the case. Can't see how any other exception could occur though. – Qwerky Jun 11 '12 at 15:30
  • @Robin Stack trace has been added. I redacted 1 line of user code and one package/class name. Nothing interesting in there anyway. – Luke Jun 11 '12 at 15:39
  • Very strange. I tried to reproduce the problem with 2 threads and a list of 10 million objects and I never get the exception or the printout. – Tudor Jun 11 '12 at 17:22
  • @Luke: just to make sure. Is the list completely private to the object? Are you sure you never return a reference to this list to some external object, and are you sure this list is not given to your object by an external object? – JB Nizet Jun 11 '12 at 17:34
  • @JBNizet That is correct. `list` is initialized inside the class, there are no `list =` statements and nothing returns `list`. – Luke Jun 11 '12 at 17:39
  • Just checking, both list and insideGetBuffer are *instance* member variables (not static), right? – sjlee Jun 11 '12 at 18:33
  • @sjlee That is correct, both instance variables. – Luke Jun 11 '12 at 18:35
  • @Qwerky I have posted complete code now. – Luke Jun 13 '12 at 17:44
  • I see this as a JRE error -- amazingly enough. I've revamped the sample code demonstrating the problem down to this: http://pastebin.com/AF85FRT7 I've also posted it to code review here: http://codereview.stackexchange.com/questions/12546/ – Gray Jun 13 '12 at 18:30
  • @Gray I submitted a bug report to Oracle. We'll see if they've got anything to add. I linked this question and your simpler example. My bet is on a JIT optimization happening around that Map code that gets the object that's stomping on synchronization. But hey, what do I know? :) – Luke Jun 14 '12 at 13:13
  • @Gray I tried your simplified jUnit test case and it runs without fail (several times) using an IBM JDK. So it would appear that it is an Oracle JRE issue. – Eric B. Jun 14 '12 at 17:39
  • @EricB. For giggles, did you try running it from `main` rather than from the `@test`? For me it fails every time when I run `main` but only like 1 out of 100 when I run the `@test`. – Luke Jun 14 '12 at 17:41
  • This is brilliant, careful work to distill a tight exposition. To be emulated. +1. I don't have the environment to try myself right now, but suggest that someone try declaring all shared variables (like `.inside...` to be `volatile`. Else the JIT might make the value thread local, which could explain the bad behavior. – Gene Jun 15 '12 at 03:51
  • @Gene I tried making everything `volatile` and I still get the issue. For what it's worth, when I run it with `-Djava.compiler=NONE`, which should disable JIT (I think), I am unable to reproduce the issue. However, that could simply be because the timing has changed drastically. – Luke Jun 15 '12 at 13:14
  • Same result when I run from the main(). With Oracle HotSpot JVM (1.6.29) it fails, but run properly under IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 Windows Vista x86-32 jvmwi3260-20080816_22093 (JIT enabled, AOT enabled). Has anyone tried with HotSpot 1.7 JVM to see if the problem still exists? – Eric B. Jun 15 '12 at 14:19
  • @EricB. I've heard a few people say it works fine on 1.7, but I haven't tried it myself. Hoping to get a chance to try it this weekend. – Luke Jun 15 '12 at 14:59
  • Seems to require being run on a multi-core system. – Old Pro Jun 17 '12 at 06:03
  • 1
    FYI I just ran the example with your 4 classes and with @Gray simplified test case on jdk 1.7.0_03 / windows 7 / quad core and got similar outputs. – assylias Jun 18 '12 at 13:42
  • @assylias By "similar outputs" you mean that the synchronization failed? If you reproduced the issue on Java 7, then you would be the first I've heard of, that's good news (depending on your definition of good)! – Luke Jun 18 '12 at 14:13
  • 3
    @Luke Could you post the link to the bug report you submitted to Oracle ? I would like to follow what happens there. – Radu Murzea Jun 19 '12 at 19:43
  • I can't reproduce this. Can't wait until I get home to try it on a 64bit machine. In the meantime I noticed that item numbers are in order. Does this condition happen sporadically during the run, or does it run correct until a point in time where it continuously fails? – Selim Jun 20 '12 at 11:26
  • 3
    @SoboLAN Here is a link to the bug report: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7176993 – Luke Jun 20 '12 at 12:29
  • @Selim The item numbers are really just fake at this point. No remove is actually happening from the list. The only reason things are even put into the list at all is to preserve the timing of the for loop. But to answer your question, it seems to happen in chunks. That is, 4038-4060 and 68708-68750 might fail. Or maybe only 1 chunk fails, or maybe only 1 fails. It depends largely on timing. – Luke Jun 20 '12 at 16:18

10 Answers10

17

I think you are indeed looking at a JVM bug in the OSR. Using the simplified program from @Gray (slight modifications to print an error message) and some options to mess with/print JIT compilation, you can see what is going on with the JIT. And, you can use some options to control that to a degree that can suppress the issue, which lends a lot of evidence to this being a JVM bug.

Running as:

java -XX:+PrintCompilation -XX:CompileThreshold=10000 phil.StrangeRaceConditionTest

you can get an error condition (like others about 80% of the runs) and the compilation print somewhat like:

 68   1       java.lang.String::hashCode (64 bytes)
 97   2       sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (553 bytes)
104   3       java.math.BigInteger::mulAdd (81 bytes)
106   4       java.math.BigInteger::multiplyToLen (219 bytes)
111   5       java.math.BigInteger::addOne (77 bytes)
113   6       java.math.BigInteger::squareToLen (172 bytes)
114   7       java.math.BigInteger::primitiveLeftShift (79 bytes)
116   1%      java.math.BigInteger::multiplyToLen @ 138 (219 bytes)
121   8       java.math.BigInteger::montReduce (99 bytes)
126   9       sun.security.provider.SHA::implCompress (491 bytes)
138  10       java.lang.String::charAt (33 bytes)
139  11       java.util.ArrayList::ensureCapacity (58 bytes)
139  12       java.util.ArrayList::add (29 bytes)
139   2%      phil.StrangeRaceConditionTest$Buffer::<init> @ 38 (62 bytes)
158  13       java.util.HashMap::indexFor (6 bytes)
159  14       java.util.HashMap::hash (23 bytes)
159  15       java.util.HashMap::get (79 bytes)
159  16       java.lang.Integer::valueOf (32 bytes)
168  17 s     phil.StrangeRaceConditionTest::getBuffer (66 bytes)
168  18 s     phil.StrangeRaceConditionTest::remove (10 bytes)
171  19 s     phil.StrangeRaceConditionTest$Buffer::remove (34 bytes)
172   3%      phil.StrangeRaceConditionTest::strangeRaceConditionTest @ 36 (76 bytes)
ERRORS //my little change
219  15      made not entrant  java.util.HashMap::get (79 bytes)

There are three OSR replacements (the ones with the % annotation on the compile ID). My guess is that it is the third one, which is the loop calling remove(), that is responsible for the error. This can be excluded from JIT via a .hotspot_compiler file located in the working directory with the following contents:

exclude phil/StrangeRaceConditionTest strangeRaceConditionTest

When you run the program again, you get this output:

CompilerOracle: exclude phil/StrangeRaceConditionTest.strangeRaceConditionTest
 73   1       java.lang.String::hashCode (64 bytes)
104   2       sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (553 bytes)
110   3       java.math.BigInteger::mulAdd (81 bytes)
113   4       java.math.BigInteger::multiplyToLen (219 bytes)
118   5       java.math.BigInteger::addOne (77 bytes)
120   6       java.math.BigInteger::squareToLen (172 bytes)
121   7       java.math.BigInteger::primitiveLeftShift (79 bytes)
123   1%      java.math.BigInteger::multiplyToLen @ 138 (219 bytes)
128   8       java.math.BigInteger::montReduce (99 bytes)
133   9       sun.security.provider.SHA::implCompress (491 bytes)
145  10       java.lang.String::charAt (33 bytes)
145  11       java.util.ArrayList::ensureCapacity (58 bytes)
146  12       java.util.ArrayList::add (29 bytes)
146   2%      phil.StrangeRaceConditionTest$Buffer::<init> @ 38 (62 bytes)
165  13       java.util.HashMap::indexFor (6 bytes)
165  14       java.util.HashMap::hash (23 bytes)
165  15       java.util.HashMap::get (79 bytes)
166  16       java.lang.Integer::valueOf (32 bytes)
174  17 s     phil.StrangeRaceConditionTest::getBuffer (66 bytes)
174  18 s     phil.StrangeRaceConditionTest::remove (10 bytes)
### Excluding compile: phil.StrangeRaceConditionTest::strangeRaceConditionTest
177  19 s     phil.StrangeRaceConditionTest$Buffer::remove (34 bytes)
324  15      made not entrant  java.util.HashMap::get (79 bytes)

and the problem does not appear (at least not in the repeated attempts that I've made).

Also, if you change the JVM options a bit, you can cause the problem to go away. Using either of the following I cannot get the problem to appear.

java -XX:+PrintCompilation -XX:CompileThreshold=100000 phil.StrangeRaceConditionTest
java -XX:+PrintCompilation -XX:FreqInlineSize=1 phil.StrangeRaceConditionTest

Interestingly, the compilation output for both of these still show the OSR for the remove loop. My guess (and it is a big one) is that delaying the JIT via the compilation threshold or changing the FreqInlineSize cause changes to the OSR processing in these cases that bypass a bug that you are otherwise hitting.

See here for info on the JVM options.

See here and here for information on the output of -XX:+PrintCompilation and how to mess with what the JIT does.

philwb
  • 3,805
  • 19
  • 20
10

So according to the code that you've posted, you would never get Broke Synchronization printed unless getBuffer() throws an exception between the true and false setting. See a better pattern below.

Edit:

I took @Luke's code and whittled it down to this pastebin class. As I see it, @Luke is hitting a JRE synchronization bug. I know that's hard to believe but I've been looking at the code and I just can't see the problem.


Since you mention ConcurrentModificationException, I suspect that getBuffer() is throwing it when it iterates across the list. The code that you posted should never throw a ConcurrentModificationException because of the synchronization but I suspect that some additional code is calling add or remove that is not synchronized, or you are removing while you are iterating across the list. The only way you can modify a un-synchronized collection while you are iterating across it is through the Iterator.remove() method:

Iterator<Object> iterator = list.iterator();
while (iterator.hasNext()) {
   ...
   // it is ok to remove from the list this way while iterating
   iterator.remove();
}

To protect your flag, be sure to use try/finally when you are setting a critical boolean like this. Then any exception would restore the insideGetBuffer appropriately:

synchronized public Object getBuffer() {
    insideGetBuffer = true;
    try {
        int i=0;
        for(Object item : list) {
            i++;
        }
    } finally {
        insideGetBuffer = false;
    }
    return null;
}

Also, it is a better pattern to synchronize around a particular object instead of using method synchronization. If you are trying to protect the list, then adding synchronization around that list each time would be better.n

 synchronized (list) {
    list.remove();
 }

You can also turn your list into a synchronized list which you wouldn't have to synchronize on each time:

 List<Object> list = Collections.synchronizedList(new ArrayList<Object>());
Gray
  • 115,027
  • 24
  • 293
  • 354
  • The Exception is the only thing I can think (given the code and only the code) of that could cause the strange print behavior (+1)... however, it'd be hard to get that to generate the CME as well? –  Jun 11 '12 at 15:25
  • Tried adding a try/finally block to the code as you suggested, yet "Broke Synchronization" still prints. I'm not totally surprised, as I cannot see what inside the try block would throw an exception. To your other point, the code posted is the **exact** code I'm running. As you can see, it is not modifying the list inside remove. There is also no other list modifier inside my class (with the exception of add, which is also synchronized and does not happen anywhere near the exception firing). – Luke Jun 11 '12 at 15:27
  • I've updated my answer @pst to show you the `ConcurrentModificationException` is causing the issue. – Gray Jun 11 '12 at 15:27
  • @Luke if you can post a simple `main` executable that will demonstrate the concurrency issue you are seeing we can help better. At this point its all speculation – John Vint Jun 11 '12 at 15:28
  • But +1 Gray. Didn't think about the exception causing an incorrect state for the flag. – John Vint Jun 11 '12 at 15:29
  • There is code that you aren't showing @Luke. Can you post the whole class to http://pastebin.com? – Gray Jun 11 '12 at 15:44
  • @Gray I'm going to keep banging on this to get a simple unit test that reproduces the problem. I was hoping there might be some simple answer like "That's not how Java serialization works" or a logic issue that I was missing. Will update when I've got something. – Luke Jun 11 '12 at 15:50
  • @Gray see the top of the question. I've added a 4 class example to pastebin that reproduces the problem. – Luke Jun 13 '12 at 13:45
  • This is fascinating @Luke. I think you have a JRE bug here. I've revamped it to it's smallest size here: http://pastebin.com/AF85FRT7 I've also posted it to code review here: http://codereview.stackexchange.com/questions/12546/very-strange-race-condition-which-looks-like-a-jre-issue Hope that's ok. – Gray Jun 13 '12 at 18:05
  • @Gray That's absolutely ok! Thanks, this one has had me stumped for a good while now. I appreciate all the help I can get with this. – Luke Jun 13 '12 at 18:27
  • I'm sorry to say that at this time @Luke, I would recommend trying to upgrade your Java version. I really believe that this is not a code error. – Gray Jun 13 '12 at 18:29
  • I cannot reproduce the problem with the full code posted or with @Gray pastbin code using HotSpot 1.6.0_31-b05 on a single core Windows XP machine. However I ***can*** reproduce the problem on a dual-core Mac using HotSpot 1.6.0_31-b04-415-11M3635. I'm convinced it is a JSE bug. I added a bunch of extra synchronizations to be sure it wasn't some obfuscated problem about different threads synchronizing on different objects but it didn't fix the problem. – Old Pro Jun 17 '12 at 06:03
  • @OldPro both were x64 JRE's? x64 Windows XP is rare, so I wanted to check. 32-bit JRE's seem to be ok. – Luke Jun 18 '12 at 11:41
  • @Gray you can remove the constuctor and the list from the `Buffer` inner class while keeping the error. Interestingly, when removing the `synchronized` keyword from `StrangeRaceConditionTest#getBuffer()` or `StrangeRaceConditionTest#remove()` I don't get the weird behaviour any more... – assylias Jun 18 '12 at 14:08
4

Based on that code there are only two ways that "Broke Synchronization" will print.

  1. They are synchronizing on different object (which you say they are not)
  2. The insideGetBuffer is being changed by another thread outside of synchronized block.

Without those two there can't be a way that code you listed will be printing "Broke Synchronization" & the ConcurrentModificationException. Can you give a small snippet of code that can be run to prove what you are saying?

Update:

I went through the example Luke posted and I am seeing odd behaviors on Java 1.6_24-64 bit Windows. The same instance of TestBuffer and the value of the insideGetBuffer is 'alternating' inside the remove method. Note the field is not updated outside a synchronized region. There is only one TestBuffer instance but let's assume they aren't - insideGetBuffer would never be set to true (so it must be the same instance).

    synchronized public void remove(Object item) {

            boolean b = insideGetBuffer;
            if(insideGetBuffer){
                    System.out.println("Broke Synchronization : " +  b + " - " + insideGetBuffer);
            }
    }

Sometimes it prints Broke Synchronization : true - false

I am working on getting the assembler to run on Windows 64 bit Java.

John Vint
  • 39,695
  • 7
  • 78
  • 108
  • So I'm ruling out 1 because `insideGetBuffer` is an instance variable, it's part of the object. I just added `insideGetBuffer` to the code so I would say that nothing is changing it directly. I would **Love** to be able to provide a small snippet of code, but they all run successfully without this issue. For whatever strange reason it's only happening IN my application. I'm going to continue to try to simplify my application to attempt to get a simple executable. – Luke Jun 11 '12 at 15:39
  • @Luke to rule out 1 can you do this: Put this in the `remove` method `System.out.println("remove: " + System.identityHashCode(this));` and put this in the `getBuffer` `System.out.println("getBuffer: " + System.identityHashCode(this));` – John Vint Jun 11 '12 at 15:50
  • @Luke after you do that can you comment on your output – John Vint Jun 11 '12 at 15:50
  • basically the output is a bunch of `remove: 1896973799` with the occasional `getBuffer: 1896973799` thrown in there. While the test did confirm my statement, I'm curious why the fact that insideGetBuffer being a member variable wasn't proof enough. How could 2 separate objects share the same member variable? Also better to be safe, I suppose. – Luke Jun 11 '12 at 16:03
  • @Luke Though not a knock on your, with such little actual code I can only assume that the methods are part of a member-subclass and the field is shared by the encompassing class. I just wanted to make sure they were the same object which appears that way. I guess 2 may be the only other way. – John Vint Jun 11 '12 at 16:07
  • I updated the code above with the complete `TestBuffer` class that I'm using and added it's output. I know it's not a simple executable yet, but I'm getting there. – Luke Jun 12 '12 at 12:45
  • @Luke Thanks for the update. I tried running a test against it myself and couldnt cause it to break. I made a simple main in that class and ran 10 threads the created an Object, `add`ed it, `getBuffer` and then `remove`d it and seemed to work. I know you're still working on it but figure I let you know – John Vint Jun 12 '12 at 13:07
  • See the top of the question. I've edited it to include a "simple" 4 class example that reproduces the problem on the 2 machines I have access to. There are 4 links to pastebin .java files there. I'm really hoping this isn't too hardware specific. – Luke Jun 13 '12 at 13:44
  • @Luke I'm going through your example, not sure exactly what is going on but you may have found an issue. I'll keep looking at it. – John Vint Jun 13 '12 at 14:45
  • I agree @John. I've revamped the sample code demonstrating the problem down to this: http://pastebin.com/AF85FRT7 I've also posted it to code review here: http://codereview.stackexchange.com/questions/12546 Does my code fail for you too John? – Gray Jun 13 '12 at 18:57
  • @Gray It does but only on a Windows 64-bit Java 1.6_24 (may be others). I tried with 32-bit windows and its fine, also tried with Java 7. I then moved to linux 64-bit Java and it works. Hopefully if I can get the hsdis-amd64 installation for the assembler it may be more clear with the runtime is doing. – John Vint Jun 13 '12 at 19:10
2

A ConcurrentModificationException, most of the time, is not caused by concurrent threads. It's caused by the modification of the collection while it's being iterated:

for (Object item : list) {
    if (someCondition) {
         list.remove(item);
    }
}

The above code would cause a ConcurrentModificationException if someCondition is true. While iterating, the collection can only be modified through the iterator's methods:

for (Iterator<Object> it = list.iterator(); it.hasNext(); ) {
    Object item = it.next();
    if (someCondition) {
         it.remove();
    }
}

I suspect that this is what happens in your real code. The posted code is fine.

JB Nizet
  • 678,734
  • 91
  • 1,224
  • 1,255
  • 1
    In my experience this is often true as well. However, not evident in the code in the post :( –  Jun 11 '12 at 15:23
2

Can you try this code which is a self contained test?

public static class TestBuffer {
    private final List<Object> list = new ArrayList<Object>();
    private boolean insideGetBuffer = false;

    public TestBuffer() {
        System.out.println("Creating a TestBuffer");
    }

    synchronized public void add(Object item) {
        list.add(item);
    }

    synchronized public void remove(Object item) {
        if (insideGetBuffer) {
            System.out.println("Broke Synchronization ");
            System.out.println(item);
        }

        list.remove(item);
    }

    synchronized public void getBuffer() {
        insideGetBuffer = true;
//      System.out.println("getBuffer.");
        try {
            int count = 0;
            for (int i = 0, listSize = list.size(); i < listSize; i++) {
                if (list.get(i) != null)
                    count++;
            }
        } finally {
//          System.out.println(".getBuffer");
            insideGetBuffer = false;
        }
    }
}

public static void main(String... args) throws IOException {
    final TestBuffer tb = new TestBuffer();
    ExecutorService service = Executors.newCachedThreadPool();
    final AtomicLong count = new AtomicLong();
    for (int i = 0; i < 16; i++) {
        final int finalI = i;
        service.submit(new Runnable() {
            @Override
            public void run() {
                while (true) {
                    for (int j = 0; j < 1000000; j++) {
                        tb.add(finalI);
                        tb.getBuffer();
                        tb.remove(finalI);
                    }
                    System.out.printf("%d,: %,d%n", finalI, count.addAndGet(1000000));
                }
            }
        });
    }
}

prints

Creating a TestBuffer
11,: 1,000,000
2,: 2,000,000
... many deleted ...
2,: 100,000,000
1,: 101,000,000

Looking at your stack trace in more detail.

Caused by: java.util.ConcurrentModificationException
    at java.util.HashMap$HashIterator.nextEntry(Unknown Source)
    at java.util.HashMap$KeyIterator.next(Unknown Source)
    at <removed>.getBuffer(<removed>.java:62)

You can see that you are accessing the key set of a HashMap, not a list. This is important because the key set is a view on the underlying map. This means you need to ensure that every access to this map is also protected by the same lock. e.g. say you have a setter like

Collection list;
public void setList(Collection list) { this.list = list; }


// somewhere else
Map map = new HashMap();
obj.setList(map.keySet());

// "list" is accessed in another thread which is locked by this thread does this
map.put("hello", "world");
// now an Iterator in another thread on list is invalid.
Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
  • You're right, but that's not happening here. I also changed `list` to an ArrayList in my most recent tests and the issue still happens. – Luke Jun 12 '12 at 12:19
  • If you're still interested in helping out, I've added a full example to the top of my question. – Luke Jun 13 '12 at 13:46
  • I have added a fuller example (i.e. one which you can run) – Peter Lawrey Jun 13 '12 at 14:18
  • Your test runner does not produce a "Broke Synchronization" printout on my machine. That's not surprising really, I've run several hundred variations of this problem and only a few fail. The simplest that I've gotten to fail is 4 classes, one of which IS runnable. I've posted this example to the top of my question above with pastebin links to each of the classes. Could you run that example and see if it fails for you? – Luke Jun 13 '12 at 14:44
2

'getBuffer' function in the Controller class is creating this issue. If two threads enter at the same time to the following 'if' condition for the first time, then controller will end up creating two buffer objects. add function is called on the first object and remove will be called on the second object.

if (colorToBufferMap.containsKey(defaultColor)) {

When two threads (add and remove threads) enter at the same time (when the buffer was not yet added to colorToBufferMap), above if statement will return false and both the threads will enter the else part and create two buffers, since buffer is a local variable these two threads will receive two different instance of buffer as part of return statement. However, only the last one created will be stored in the global variable 'colorToBufferMap'.

Above problematic line is part of getBuffer function

public TestBuffer getBuffer() {
    TestBuffer buffer = null;
    if (colorToBufferMap.containsKey(defaultColor)) {
        buffer = colorToBufferMap.get(defaultColor);
    } else {
        buffer = new TestBuffer();
        colorToBufferMap.put(defaultColor, buffer);
    }
    return buffer;
}

Synchronizing the 'getBuffer' function in Controller class will fix this issue.

sperumal
  • 1,499
  • 10
  • 14
  • some more additional explanation, 'addsome' function and 'remove' function in the controller class both use 'getBuffer' function. add thread and remove thread may call getBuffer exactly at the same time and for the first time, the 'if' condition I pointed out will create two buffer objects. – sperumal Jun 14 '12 at 14:18
  • You are completely right and totally wrong all at the same time. Thanks for pointing that out, that's actually a **different** synchronization error that I didn't catch or consider! However, if you check out the example from @John Vint you'll notice that he actually did synchronize that method and the issue still happens. You'll also notice that in my test application it doesn't actually cause a problem because I do all the adds first in a single thread (which will create the TestBuffer in the map) and the later I do the remove/getBuffers. Good catch though. +1! – Luke Jun 14 '12 at 17:25
  • Sorry, I meant the example from @Gray, not John Vint. – Luke Jun 14 '12 at 19:20
  • I ran the code that Gray posted on 32 bit JVM, I didn't get the error at all. So I am wondering, if you declare the variable 'insideGetBuffer' as volatile, does it stop the error?. I am guessing JVM memory model is not flushing or reloading the latest value of boolean variable from shared memory. – sperumal Jun 14 '12 at 19:50
  • Still breaks with volatile. Note that we've only been able to break it on Java 6 64-bit on Windows. – Luke Jun 14 '12 at 19:57
1

Edit: Answer valid only when two different Object instances are used in calling the methods repeatedly.

The scenario: You have two synchronized method. One for removing an entity and another for accessing. The problem comes when 1 thread is inside the remove method and another thread is in the getBuffer method and sets the insideGetBuffer=true.

As you found out you need to put synchronization on the list because both these methods work on you list.

Akhi
  • 2,242
  • 18
  • 24
  • 1
    If they're both synchronized methods, how does that happen? – Luke Jun 11 '12 at 15:30
  • yes they are synchronized methods but only individually. Think one thread is accessing the remove method. so it has a lock on that method only. A different thread can access getBuffer method as thread 1 doesn't have a sync lock on it. – Akhi Jun 11 '12 at 15:33
  • 1
    Akhil Dev Synchronization occurs on the object reference not the method. If they are the same object's then Thread-1::method-A will have to wait for Thread-2::method-B invocation – John Vint Jun 11 '12 at 15:34
  • let me quote what Luke asked "when I have 1 thread calling remove repeatedly while the other calls getBuffer repeatedly". Nothing is said about wheather same object is used in 2 threads and did the testing.So normally it will be different objects. @Luke please clarify. – Akhi Jun 11 '12 at 15:51
  • @AkhilDev From Luke `I've reduced the problem down to the 2 functions below, both of which are in the same object` if by same object he means same reference then yes he does state that. – John Vint Jun 11 '12 at 15:52
  • Yes, same reference and the boolean is a member of the class, not static. – Luke Jun 11 '12 at 15:52
1

If access to list and insideGetBuffer is fully contained in that code, the code looks certainly thread safe and I do not see a possibility "Broke synchronization" can be printed, barring a JVM bug.

Can you double check all possible access to your member variables (list and insideGetBuffer)? Possibilities include if list was passed onto you through constructor (which your code doesn't show) or these variables are protected variables so subclasses can change them.

Another possibility is access via reflection.

sjlee
  • 7,726
  • 2
  • 29
  • 37
  • I'm not sure when you looked at the code in my example last, but if you check it out I've now posted the entire `TestBuffer` class. It includes the constructor, and shows that `insideGetBuffer` and `list` are both private member variables. Reflection is an interesting thought. I'm using Spring and I'm not entirely sure what Spring is doing under the hood. That seems like a longshot, but at this point I'm running out of ideas that don't lead to a JVM bug (which is unlikely, but possible). – Luke Jun 12 '12 at 17:51
  • Can never say never, but I would doubt that there is a JVM bug here, as this is one of the more basic aspects of the java memory model. – sjlee Jun 12 '12 at 21:03
  • If you're interested, I've posted a 4 class complete example to the top of my question with pastebin links to each class. – Luke Jun 13 '12 at 14:47
  • I decreased the period to 1 and increased the NUMBER_TO_USE to 10000000 but no problem. Which version of Java are you using? I have Java 7 update 4. – Peter Lawrey Jun 13 '12 at 14:53
  • ... other than it taking a really long time. ;) – Peter Lawrey Jun 13 '12 at 14:56
  • btw: you never actually remove a value as you only try to remove entries after those in the list. i.e. you `remove(i++)` instead of `--i` – Peter Lawrey Jun 13 '12 at 14:59
  • ... and `List.remove(Object)` is very expensive for a large number of objects. – Peter Lawrey Jun 13 '12 at 15:00
  • @PeterLawrey You're correct, I don't remove anything anymore. I'm long beyond that now, I'm just interested in the race condition/sync error. I'm using 64-bit 1.6.0_24 on Windows 7. I've seen other people say it works ok on 32-bit and it works ok on Java 7 also. I've yet to test those, as my primary deployment targets are 64-bit Java 6. By the way, I'm not sure why we're commenting on sjlee's post here. :) – Luke Jun 14 '12 at 12:31
1

I don't believe this is a bug in the JVM.

My first suspicion was that it was some sort of operation reordering that the compiler is doing (on my machine, it works fine in a debugger, but sync fails when running) but

I can't tell you why, but I very strongly suspect that something is giving up the lock on TestBuffer that is implicit in declaring getBuffer() and remove(...) synchronized.

For example, replace them with this:

public void getBuffer() {
    synchronized (this) {
        this.insideGetBuffer = true;
        try {
            int i = 0;
            for (Object item : this.list) {
                if (item != null) {
                    i++;
                }
            }
        } finally {
            this.insideGetBuffer = false;
        }
    }

}

public void remove(final Object item) {
    synchronized (this) {
        // fails if this is called while getBuffer is running
        if (this.insideGetBuffer) {
            System.out.println("Broke Synchronization ");
            System.out.println(item);
        }
    }
}

And you still have your sync error. But pick something else to log on, eg:

private Object lock = new Object();
public void getBuffer() {
    synchronized (this.lock) {
        this.insideGetBuffer = true;
        try {
            int i = 0;
            for (Object item : this.list) {
                if (item != null) {
                    i++;
                }
            }
        } finally {
            this.insideGetBuffer = false;
        }
    }

}

public void remove(final Object item) {
    synchronized (this.lock) {
        // fails if this is called while getBuffer is running
        if (this.insideGetBuffer) {
            System.out.println("Broke Synchronization ");
            System.out.println(item);
        }
    }
}

And everything works as expected.

Now, you can simulate giving up the lock by adding:

this.lock.wait(1);

in the for loop of getBuffer() and you'll start failing again.

I remain stumped on what is giving up the lock, but in general it might be a better idea to use explicit synchronization on protected locks than the synchronization operator.

Mason Bryant
  • 1,372
  • 14
  • 23
  • I see what you're saying here, but I fail to see how this could be anything but a JVM error. You're saying yourself that it looks like something is giving up the lock, yet there is no explicit surrender of the lock in my code. If the lock is given up and I didn't say it should be given up, then that's an error outside my code, probably in the JVM. You can also fix the problem by synchronizing on `list` by the way. That's how I fixed it in my application. But I'm no longer interested in fixing it. I'm interested in why it breaks. – Luke Jun 14 '12 at 12:27
  • If it is a bug in anything, it is much more likely to be a bug in the class library. For example: in the controller, add a TestBuffer buffer = new TestBuffer(); at the top and just return that in getBuffer() things work again. But maybe I'm splitting hairs distinguishing between JVM and class library. – Mason Bryant Jun 14 '12 at 17:29
  • Thanks for the clarification, now I understand what you mean. Yes, it certainly could be in HashMap or somewhere else in the `class library`. – Luke Jun 14 '12 at 17:31
0

I had a similar problem before. The error is that you did not declare some field as volatile. This keyword is used to indicate that a field will be modified by different threads, and thus it cannot be cached. Instead, all the writes and the reads MUST go to the "real" memory location of the field.

For more information, just google for "Java Memory Model"

Albeit most of the readers focus on class TestBuffer, I think that the problem might be somewhere else (e.g., have you tried to add syncronization on class Controller? Or make volatile its fields?).

ps. notice that different java VMs might use different optimization in different platforms, and thus the synchronization issues might appear more often on a platform than another one. The only way to be safe is to be compliant with the Java's specifications, and file a bug if a VM does not respect it.

Matteo
  • 1,367
  • 7
  • 25
  • That does not explain the behaviour described in the question. The Java Memory Model explicitly excludes what has been reported, and synchronized blocks give stronger guarantees than volatile variables. – assylias Jun 20 '12 at 08:56
  • Not relevant. Code synchronized on the same implicit lock object has the same visibility guarantee as volatile. What you are stating is not wrong, but is not an answer to why this is happening. – Selim Jun 20 '12 at 11:19
  • maybe I got lost in the four classes, but it seems to me that class Controller can have exactly the volatile problem, as its fields are accessed concurrently by different threads. Indeed, that class does not declare any synchronized method... – Matteo Jun 20 '12 at 11:44
  • Controller.getBuffer() does, in fact, need to be synchronized (in my 4 class example). @sperumal pointed out this issue in his answer. Nothing needs to be `volatile`, as long as it is only changed within synchronized blocks. However, even with these changes, the issue still happens. Gray has a 1 class example, linked above as well as in his answer. He properly added `synchronized` to the getBuffer function. Please feel free to try different things yourself. I think you'll find that it still fails. – Luke Jun 20 '12 at 12:25