1

I recently had a problem with three separate servers running the same code all experiencing the same symptoms. These are high volume REST / JSON servers that use json-lib to create the JSON responses. The servers all eventually hang with most of the threads waiting on one particular lock. The thread holding that lock all had the same basic appearance:

 "TP-Processor204" daemon prio=10 tid=0x00002aac40d85000 nid=0x6978 waiting for monitor entry [0x000000004dec8000] 
    java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.commons.beanutils.BeanUtilsBean.getInstance(BeanUtilsBean.java:78)
        - locked <0x00002aaab230f970> (a java.lang.Class for org.apache.commons.beanutils.BeanUtilsBean)
        at org.apache.commons.beanutils.PropertyUtilsBean.getInstance(PropertyUtilsBean.java:101)
        at org.apache.commons.beanutils.PropertyUtils.getProperty(PropertyUtils.java:290)
        at net.sf.json.JSONObject._fromBean(JSONObject.java:918)
        at net.sf.json.JSONObject.fromObject(JSONObject.java:168)
        at net.sf.json.AbstractJSON._processValue(AbstractJSON.java:265)
        at net.sf.json.JSONArray._processValue(JSONArray.java:2514)
        at net.sf.json.JSONArray.processValue(JSONArray.java:2539)
        at net.sf.json.JSONArray.addValue(JSONArray.java:2526)
        at net.sf.json.JSONArray._fromCollection(JSONArray.java:1057)
        at net.sf.json.JSONArray.fromObject(JSONArray.java:123)
        at net.sf.json.AbstractJSON._processValue(AbstractJSON.java:237)
        at net.sf.json.JSONObject._processValue(JSONObject.java:2808)

This is the only lock this thread is holding. I've tried briefly googling what the monitor entry value refers to, but didn't have any luck. For this thread, the value [0x000000004dec8000] doesn't appear to refer to an object id and it doesn't appear anywhere else in the stack trace.

I found the exact same issue here with no answer and this older SO question that says it's a JVM bug caused by not properly assigning the monitor to one of the waiting threads. I'm not completely sure I understand how the a thread could be marked as locking a monitor but not actually given the monitor, but it makes sense that those operations may be separate and a bug in the JVM causes a problem after assigning the lock but before assigning the monitor (although I've always thought of them as the same thing).

The java version I'm using is:

java version "1.6.0_18" Java(TM) SE Runtime Environment (build 1.6.0_18-b07) Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)

On:

CentOS release 5.2 (Final) Kernel version: 2.6.18-194.17.4.el5xen

Is this really just a JVM bug or is there something else I should be looking into?

Edit:

The commons-beanutils version we were using was 1.7. We have since upgraded to 1.8 to see if it resolves the issue.

Community
  • 1
  • 1
dlawrence
  • 1,645
  • 12
  • 13
  • If you suspect a JVM bug, the best thing to do is try the latest version Java 6 update 26 to see if it has been fixed. There are a number of JVM improvement in this version over the last 8 updates. – Peter Lawrey Sep 20 '11 at 20:28
  • We can definitely do that. I'm also more interested in understanding how to interpret that output though. Is that a non-sensical stack trace (which implies a JVM bug) or do I just not understand how to read that correctly. – dlawrence Sep 20 '11 at 20:32
  • waiting for class loading doesn't always show up on the stack trace. – Peter Lawrey Sep 20 '11 at 20:38
  • I believe I recall a synchronization bug of this sort in the 64-bit JVM that has since been fixed. If that isn't it, please say what version of Commons BeanUtils you're using - I tried to see what's happening on that line and got the wrong version - it doesn't line up. – Ed Staub Sep 20 '11 at 20:40
  • how's the CPU and memory usage at the time? – irreputable Sep 20 '11 at 20:56
  • We are using 1.7 of Commons BeanUtils, so it may be caused by BEANUTILS-49. When the issue occurs, memory usage looks fine (we allocate 4GB to the jvm on a 6GB xen vm and java was using about 1.5GB of that when the problem occurred. CPU was 100% pegged on one core with a load average of 1. – dlawrence Sep 20 '11 at 21:08

1 Answers1

0

This appears to be related to this bug

https://issues.apache.org/jira/browse/BEANUTILS-49

When loading a class for the first time, the loading and static block is thread safe. If there is a problem loading the class this is not always apparent.

Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
  • I'm not sure I understand that last part. Did you mean the loading and static block is not thread safe? – dlawrence Sep 20 '11 at 21:23
  • This change seems to have made the problem go away, so I'll mark it correct, although I would still be interested in understanding what that stack trace means. – dlawrence Sep 21 '11 at 02:18
  • the class loading and static block is thread safe, however it may not use a locked object or appear in the stack trace the same way that locking an object appears. – Peter Lawrey Sep 21 '11 at 07:34