0

While troubleshooting one of the jvm hung issues in our production environment, we encountered that one of threads which executes the following logger statement

logger.debug("Loaded ids as " + ids + ".");

is hung at this step with the thread status as runnable. Here ids is a Set. There is another thread which waits on the above thread through a count down latch to complete its task. The software takes thread dumps every 15 mins and the stacktraces of the two threads look as below

Stack trace for [THREAD GROUP: Job_Executor] [THREAD NAME:main-Runner Thread][THREAD STATE: WAITING]
    ...sun.misc.Unsafe.park(Native Method)
    ...java.util.concurrent.locks.LockSupport.park(Unknown Source)
    ...java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source)
    ...java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(Unknown Source)
    ...java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(Unknown Source)
    ...java.util.concurrent.CountDownLatch.await(Unknown Source)
    ...com.runner.MainRunner.stopThread(MainRunnerRunner.java:1334)


Stack trace for [THREAD GROUP: Job_Executor] [THREAD NAME:task executor][THREAD STATE: RUNNABLE]    
    ...java.util.AbstractCollection.toString(Unknown Source)           
    ...java.lang.String.valueOf(Unknown Source)      
    ...java.lang.StringBuilder.append(Unknown Source)    
    ...com.runner.CriticalTaskExecutor.loadByIds(CriticalTaskExecutor.java:143)

This jvm got hung for almost 24 hrs and finally we had to kill it to move ahead. The thread dumps indicates that there are 43 threads in RUNNABLE state including the above one.

What could be the reasons for the above thread to be in RUNNABLE state for 24 hrs just executing collection.toString()?

Any suggestions on how to proceed?

Gray
  • 115,027
  • 24
  • 293
  • 354
Andy Dufresne
  • 6,022
  • 7
  • 63
  • 113

2 Answers2

1

What could be the reasons for the above thread to be in RUNNABLE state for 24 hrs just executing collection.toString()?

You've not provided enough information to be able to diagnose the problem. I would only challenge you to not assume there is a JVM issue going on here.

If we look at the source for the AbstractCollection.toString() method we see that it iterates through the collection and spits out approximately "[ item0, item1, item2 ]". Each of the item.toString() method is called to show the item.

If the application hang is in the collection toString() then my guess is that there is some problem with the iterator on the collection. You could tell this if your application is spinning -- using close to 100% of a CPU. Maybe the hasNext() method on the Set is always returning true?

If the application hang is in reality inside of the item.toString() then I'd make sure that your item is just displaying simple fields. Be careful of fields that if accessed make RPC calls like lazy loaded ORM wrapped fields.

If you provide details about the Set in question and show the id.toString() code we can help more.

It now sounds like this is a set of Integer objects. No idea why this would hang your application. Here's a couple other ideas:

  • Are you accessing this collection at all in a non-synchronized manner? Could multiple threads have made changes to the collection so that it is corrupted which has caused its iterator to spin? You might try wrapping it in a Collections.synchronizedSet(...).
  • Any chance the Set is huge and you are running close to being out of memory and the program is thrashing? That wouldn't hang your application however but just slow it to a crawl. And you'd start seeing out of memory exceptions.
  • Is there any chance that the toString() is being called over and over again? I assume you would see that in the logs however.
Gray
  • 115,027
  • 24
  • 293
  • 354
  • Thanks for your inputs. The underlying collection is a HashSet. Just so that I understand, if the code tries to make RPC calls for a lazily loaded collection, wont the stack trace indicate that? I mean won't it show trace of hibernate code trying to make database calls? – Andy Dufresne Oct 09 '12 at 14:11
  • You'd think so, yes @Andy. I just can't see a `HashSet` hanging anything so the assumption is that there is something else going on. – Gray Oct 09 '12 at 14:12
  • AHA!! @AndyDufresne are you synchronizing on this collection? Any chance that multiple threads are writing to this collection in a non-synchronous manner and have corrupted it so that something is spinning? – Gray Oct 09 '12 at 14:34
  • You might try wrapping it in a Collections.synchronizedSet(...) @AndyDufresne. – Gray Oct 09 '12 at 14:36
  • Thanks for your ideas. Here are my responses to the above three points you mentioned - 1. The collection is not accessed by multiple threads. I confirmed that through the code. Also the thread dump would have indicated that if that was the case, but there are no traces. 2. No the Set is not huge in size. It hardly has any elements. Also as the jvm was hung for more then 24 hrs, this would confirm that memory wasn't a problem. 3. The toString() is just called in that logger.I do not think log4j was call it iteratively multiple times. – Andy Dufresne Oct 10 '12 at 03:22
  • Looking at the thread dump I was thinking if the thread is really stuck while calling collection.toString()? Since the thread state is RUNNABLE and there are other 42 such threads in this state, it could also mean that the processor never picked up this thread to run. Highly unlikely but any thoughts on these lines? – Andy Dufresne Oct 10 '12 at 03:25
  • That really is not how it works. JVMs should never starve a thread even if there is a ton of other high priority tasks to run. Was the CPU spinning or close to 0%? Could you attach to it with Jconsole @AndyDufresne? – Gray Oct 10 '12 at 12:19
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/17813/discussion-between-andy-dufresne-and-gray) – Andy Dufresne Oct 10 '12 at 12:21
0

It depends on the toString() method that's being called. I've seen AbstractCollection.toString fall over when the String being constructed is too large for the heap. Otherwise, the problem could be in the toString of the objects in the collection.

To figure out which one it is, take some more stack dumps (10 or so). The stuck thread will probably usually be in the toString that's causing the problem.

As a quick fix, replace

logger.debug("Loaded ids as " + ids + ".");

with

logger.debug("Loaded ids as {}.", ids);

(Assuming you're using slf4j, otherwise look up the appropriate way to do parameterized logging in your framework).

This will skip the toString if debug is not enabled.

artbristol
  • 32,010
  • 5
  • 70
  • 103
  • How does this resolve the problem with the `toString()` method? – Gray Oct 09 '12 at 13:51
  • 1
    @Gray It doesn't. But they said the problem was in production. `toString` is mainly for developers and shouldn't usually get called in production IMHO. – artbristol Oct 09 '12 at 13:52
  • Really? `toString()` gets called all of the time for production logging in our environment. Web applications call `toString()` everywhere on container objects when rendering views. – Gray Oct 09 '12 at 13:53
  • @artbristol Glad to see MHO reflected in YHO. A lot of guys "win" discussions about crazy code by arguing toString is not an object dumper, but a GUI filler :( – TheBlastOne Oct 09 '12 at 13:54
  • @Gray doh! You guys are one of those guys! :-O – TheBlastOne Oct 09 '12 at 13:54
  • I don't write the frameworks @TheBlastOne. I just use them on TV. ;-) – Gray Oct 09 '12 at 13:55
  • @Gray but you use it, and now you´ll have a hard time to debug it. Unfortunately, this widespread use of toString for presentation purposes (besides for logging) in combination with some strange occasions where one never would think of toString get called -admittedly usually hidden deeply in frameworks- makes this a rather nasty debug scenario. – TheBlastOne Oct 09 '12 at 13:59
  • 1
    We use log4j and this would mean we would have add a if check to see if the log level is debug before logging the statement. But as Gray said this doesn't seem to do anything with the problem. The software worked fine in further runs of the process. We are trying to understand how the root cause behind the above rare jvm hung. – Andy Dufresne Oct 09 '12 at 14:00
  • @AndyDufresne I've updated my answer with some hopefully more useful advice. – artbristol Oct 09 '12 at 14:32
  • Thanks for your inputs @artbristol. As I mentioned above, the software took thread dumps every 15 mins and this was done for 24 hrs. All the thread dumps show the above thread stack trace. The thread is in the RUNNABLE state and the last method it is executing is AbstractCollection.toString(). Please have a look at my last comment on the above answer (to Gary) and let me know if you have any inputs on that. – Andy Dufresne Oct 10 '12 at 03:28