7

Can anyone please help me identify the exact area of the problem? Is it the JVM, Log4j, or something else in our application?

We are running a multi-threaded application using JDK 1.6.0_24 on a Solaris 10 server (SUNW,Sun-Fire-V240). The application uses RMI calls to communicate with its clients.

Our application hung. I found the OutOfMemoryError below in the thread dump. As I understand it, this occurs because the GC is only able to reclaim about 2% of the heap.

  # java.lang.OutOfMemoryError: GC overhead limit exceeded
    Heap
     PSYoungGen      total 304704K, used 154560K [0xe0400000, 0xfbc00000, 0xfbc00000)
      eden space 154560K, 100% used [0xe0400000,0xe9af0000,0xe9af0000)
      from space 150144K, 0% used [0xf2960000,0xf2960000,0xfbc00000)
      to   space 145856K, 0% used [0xe9af0000,0xe9af0000,0xf2960000)
     PSOldGen        total 897024K, used 897023K [0xa9800000, 0xe0400000, 0xe0400000)
      object space 897024K, 99% used [0xa9800000,0xe03ffff0,0xe0400000)
     PSPermGen       total 28672K, used 27225K [0xa3c00000, 0xa5800000, 0xa9800000)
      object space 28672K, 94% used [0xa3c00000,0xa5696580,0xa5800000)

In my case I suspect the GC cannot reclaim memory because of the large number of waiting threads. Looking at the thread dump, most of the threads are waiting to acquire a lock on an org.apache.log4j.Logger. We are using log4j-1.2.15.

In the first thread's trace below, the thread holds locks on two objects while the other threads (~50) wait to acquire them. Almost the same trace can be seen for 20 minutes.
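
For reference, here is a simplified sketch of why every logging call contends on those two monitors. It is paraphrased from the log4j 1.2 sources, not the verbatim code: Category.callAppenders() synchronizes on each logger while walking the hierarchy, and AppenderSkeleton.doAppend() is itself a synchronized method.

    // Paraphrased from log4j 1.2 (not verbatim): the two monitors seen in the dump.
    public void callAppenders(LoggingEvent event) {
        for (Category c = this; c != null; c = c.parent) {
            synchronized (c) {                        // "locked (a org.apache.log4j.Logger)"
                if (c.aai != null) {
                    // Ends up in AppenderSkeleton.doAppend(), which is a synchronized
                    // method -- "locked (a org.apache.log4j.RollingFileAppender)".
                    c.aai.appendLoopOnAppenders(event);
                }
                if (!c.additive) {
                    break;
                }
            }
        }
    }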

Here is the thread dump:

"pool-3-thread-51" prio=3 tid=0x00a38000 nid=0xa4 runnable [0xa0d5f000]
   java.lang.Thread.State: RUNNABLE
    at java.text.DateFormat.format(DateFormat.java:316)
    at org.apache.log4j.helpers.PatternParser$DatePatternConverter.convert(PatternParser.java:443)
    at org.apache.log4j.helpers.PatternConverter.format(PatternConverter.java:65)
    at org.apache.log4j.PatternLayout.format(PatternLayout.java:506)
    at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:310)
    at org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:276)
    at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
    at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
    - locked  (a org.apache.log4j.RollingFileAppender)
    at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
    at org.apache.log4j.Category.callAppenders(Category.java:206)
    - locked  (a org.apache.log4j.Logger)
    at org.apache.log4j.Category.forcedLog(Category.java:391)
    at org.apache.log4j.Category.info(Category.java:666)
    at com.airvana.faultServer.niohandlers.NioNotificationHandler.parseAndQueueData(NioNotificationHandler.java:296)
    at com.airvana.faultServer.niohandlers.NioNotificationHandler.messageReceived(NioNotificationHandler.java:145)
    at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:105)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:567)


"Timer-3" prio=3 tid=0x0099a800 nid=0x53 waiting for monitor entry [0xa1caf000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:231)
    - waiting to lock  (a org.apache.log4j.RollingFileAppender)
    at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
    at org.apache.log4j.Category.callAppenders(Category.java:206)
    - locked  (a org.apache.log4j.spi.RootLogger)
    at org.apache.log4j.Category.forcedLog(Category.java:391)
    at org.apache.log4j.Category.info(Category.java:666)
    at com.airvana.controlapp.export.AbstractOMDataCollector.run(AbstractOMDataCollector.java:100)
    at java.util.TimerThread.mainLoop(Timer.java:512)
    at java.util.TimerThread.run(Timer.java:462)



"TrapHandlerThreadPool:Thread-10" prio=3 tid=0x014dac00 nid=0x4f waiting for monitor entry [0xa1d6f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:231)
    - waiting to lock  (a org.apache.log4j.RollingFileAppender)
    at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
    at org.apache.log4j.Category.callAppenders(Category.java:206)
    - locked  (a org.apache.log4j.Logger)
    at org.apache.log4j.Category.forcedLog(Category.java:391)
    at org.apache.log4j.Category.info(Category.java:666)
    at com.airvana.faultServer.db.ConnectionPool.printDataSourceStats(ConnectionPool.java:146)
    at com.airvana.faultServer.db.SQLUtil.freeConnection(SQLUtil.java:267)
    at com.airvana.faultServer.db.DbAPI.addEventOrAlarmOptimized(DbAPI.java:904)
    at com.airvana.faultServer.eventProcessing.EventProcessor.processEvent(EventProcessor.java:24)
    at com.airvana.faultServer.filters.BasicTrapFilter.processTrap(BasicTrapFilter.java:80)
    at com.airvana.faultServer.eventEngine.EventEngine.notifyTrapProcessors(EventEngine.java:314)
    at com.airvana.faultServer.eventEngine.NodewiseTrapQueue.run(NodewiseTrapQueue.java:94)
    at com.airvana.common.utils.ThreadPool$PoolThread.run(ThreadPool.java:356)


"RMI TCP Connection(27927)-10.193.3.41" daemon prio=3 tid=0x0186c800 nid=0x1d53 waiting for monitor entry [0x9f84e000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:231)
    - waiting to lock  (a org.apache.log4j.RollingFileAppender)
    at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
    at org.apache.log4j.Category.callAppenders(Category.java:206)
    - locked  (a org.apache.log4j.Logger)
    at org.apache.log4j.Category.forcedLog(Category.java:391)
    at org.apache.log4j.Category.info(Category.java:666)
    at com.airvana.faultServer.processCommunications.ConfigAppCommReceiver.sendEvent(ConfigAppCommReceiver.java:178)
    at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
    at sun.rmi.transport.Transport$1.run(Transport.java:159)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

"pool-3-thread-49" prio=3 tid=0x01257800 nid=0xa1 waiting for monitor entry [0xa0def000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.log4j.Category.callAppenders(Category.java:204)
    - waiting to lock  (a org.apache.log4j.Logger)
    at org.apache.log4j.Category.forcedLog(Category.java:391)
    at org.apache.log4j.Category.info(Category.java:666)
    at com.airvana.faultServer.niohandlers.NioNotificationHandler.processSeqNumber(NioNotificationHandler.java:548)
    at com.airvana.faultServer.niohandlers.NioNotificationHandler.parseAndQueueData(NioNotificationHandler.java:301)
    at com.airvana.faultServer.niohandlers.NioNotificationHandler.messageReceived(NioNotificationHandler.java:145)
    at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:105)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:567)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:803)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:385)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:324)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:306)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:223)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:87)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:567)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:803)
    at org.jboss.netty.handler.timeout.ReadTimeoutHandler.messageReceived(ReadTimeoutHandler.java:149)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:87)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:567)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipe


"pool-3-thread-44" prio=3 tid=0x00927800 nid=0x9b waiting for monitor entry [0xa0f0f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.log4j.Category.callAppenders(Category.java:204)
    - waiting to lock  (a org.apache.log4j.Logger)
    at org.apache.log4j.Category.forcedLog(Category.java:391)
    at org.apache.log4j.Category.info(Category.java:666)
    at com.airvana.faultServer.niohandlers.NioNotificationHandler.parseAndQueueData(NioNotificationHandler.java:296)
    at com.airvana.faultServer.niohandlers.NioNotificationHandler.messageReceived(NioNotificationHandler.java:145)
    at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:105)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:567)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:803)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:385)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:324)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:306)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:223)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:87)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:567)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:803)
    at org.jboss.netty.handler.timeout.ReadTimeoutHandler.messageReceived(ReadTimeoutHandler.java:149)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:87)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:567)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:803)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:385)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:324)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:306)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:221)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:87)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:567)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:803)
    at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:76)
    at org.jboss.netty.handler.execution.OrderedMemoryAwareThreadPoolExecutor$ChildExecutor.run(OrderedMemoryAwareThreadPoolExecutor.java:314)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Lokendra Chauhan

3 Answers

8

An OutOfMemoryError due to GC overhead limit happens when the JVM decides that too large a percentage of time is spent running the garbage collector. This is a classic sign that the heap is nearly full.

If the heap is too full the JVM spends an increasing percentage of time garbage collecting to reclaim a decreasing amount of memory. The corresponding percentage of time left to do useful work decreases.

My hypothesis is that your loggers are backing up due to not getting enough time between GC runs to cope with the logging rate. The large number of blocked threads is thus a secondary symptom rather than the root cause of your problems.


Assuming that the above is true, the short-term fix is to restart your application with JVM options for a larger heap size. You could also change the GC overhead threshold so that your application dies sooner. (This may seem strange, but it is probably better for your application to die quickly rather than grind to a standstill over a period of many minutes or hours.)
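
For example, the relevant HotSpot options look something like this (the heap sizes and the main class name are placeholders, not taken from your setup; GCTimeLimit and GCHeapFreeLimit default to 98 and 2, so moving them makes the JVM give up sooner):

    # Illustrative only -- adjust the sizes for your server; the main class is a placeholder.
    java -Xms1536m -Xmx1536m \
         -XX:GCTimeLimit=90 -XX:GCHeapFreeLimit=10 \
         com.example.FaultServerMain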

The real fix is to track down why you are running out of heap space. You need to enable GC logging and watch the memory usage trends as the application runs ... over hours, days, weeks. If you notice that the memory usage is growing in the long term, the chances are that you have a memory leak of some kind. You'll need to use a memory profiler to track it down.
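
For instance, on JDK 6 the standard GC-logging and heap-dump options are along these lines (the file paths are placeholders):

    # Paths are placeholders; all flags are standard HotSpot options on JDK 6.
    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/tmp/gc.log
    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp/heapdump.hprof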

Stephen C
2

This problem description appears to be similar to that of bug 41214. I'm not sure if this is related to your problem, but some of the elements in your posted stack trace are similar to the ones in that bug.

You might therefore want to follow up on Stephen's advice and verify whether you have too many logger calls from multiple threads leading to heavy lock contention. Raising the logger level might help, although this is not advisable if you need those log entries; I would only do it after careful deliberation.
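
As a minimal sketch, raising the root level in log4j.properties keeps the Category.info() calls in the dump from ever reaching the synchronized appender code (the "FILE" appender name and the log path below are placeholders, not taken from the poster's configuration):

    # Hypothetical log4j.properties fragment -- "FILE" and the log path are placeholders.
    log4j.rootLogger=WARN, FILE
    log4j.appender.FILE=org.apache.log4j.RollingFileAppender
    log4j.appender.FILE.File=/var/log/faultserver.log
    log4j.appender.FILE.MaxFileSize=10MB
    log4j.appender.FILE.MaxBackupIndex=5
    log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
    log4j.appender.FILE.layout.ConversionPattern=%d %-5p [%t] %c - %m%n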

Vineet Reynolds
  • @Stephen, your points (the older and the newer ones) are still valid. The log4j bug being discussed could theoretically lead to a GC overhead issue. Correct me if I'm wrong, but wouldn't the CMS collector detect that it is spending too much CPU time, if the threads were in a deadlock? Given the internal implementation of some of the classes in log4j which have synchronization requirements on the caller, I would suspect this to be the case. – Vineet Reynolds Jun 01 '11 at 10:13
  • @Stephen, you do allude to one point that I would re-emphasize to the OP. If he could reduce the frequency of GC invocations, then there would be a smaller problem to deal with, and for that he would have to find out why and how his heap usage is increasing. – Vineet Reynolds Jun 01 '11 at 10:16
  • @Vineet - Thread "pool-3-thread-51" is running, and it looks like it is currently holding the lock that the other threads are all waiting on. That is heavy lock contention, but not deadlock. – Stephen C Jun 01 '11 at 10:56
  • @Vineet - *"... wouldn't the CMS collector detect that it is spending too much CPU time, if the threads were in a deadlock?"*. If it did, that would be a bug. But I don't think this would happen anyway. If the threads were mostly deadlocked, then there'd be minimal activity to generate garbage to cause the GC to run. – Stephen C Jun 01 '11 at 11:03
  • @Stephen, you might be right about the lock contention issue. No object IDs were listed in the thread dump in the inner stack elements, and I assumed the worst. This would also lead to your second response being right. Naturally, for my claims to be correct there would have to be heavy GC in a deadlocked system, which would require a set of improbable conditions to be in place. – Vineet Reynolds Jun 01 '11 at 11:25
  • Thanks Vineet & Stephen. I don't have a heap dump, but I will definitely reproduce the issue and collect one. As I said, my app uses RMI. Is it true that every RMI thread calls System.gc() once every minute (see http://www.tikalk.com/java/application-freezes-too-many-full-gcs)? I can see I have 22 parallel GC threads. Could the RMI calls also be an issue here? We have recently upgraded the app to JDK 1.6.0_24 because I found a newer JVM with improvements there. Will it help? Thanks in advance. – Lokendra Chauhan Jun 03 '11 at 18:53
  • @Lokesh, the RMI implementation does invoke System.gc() almost every minute; this can be controlled however using appropriate system properties as you can see in [this bug in the Sun bug database](http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4403367). If you are seeing 22 GC threads, it is likely that you are using one of the parallel scavengers; I don't think this has anything to do with the use of RMI. – Vineet Reynolds Jun 03 '11 at 19:30
  • @Vineet Sorry! I don't understand why only one parallel scavenger thread would be running. Parallel GC threads are proportional to the number of CPUs; I am using a 32-core machine, so all 22 parallel GC threads should run, right? – Lokendra Chauhan Jun 03 '11 at 20:48
  • @Lokendra, how have you determined that only one of the threads is involved in parallel GC? It might be true, but only if this is a major collection on the tenured generation. You'll have to use `-XX:+UseParallelOldGC` flag to use the parallel collector even for the major collection. – Vineet Reynolds Jun 04 '11 at 04:05
  • Thanks @Vineet. I am not saying anything about a single GC thread; in my case I am already using 22 parallel GC threads. Is there any collector, similar to the parallel GC, that doesn't pause the application threads? – Lokendra Chauhan Jun 13 '11 at 06:34
  • All Oracle GCs pause application threads for at least part of the time. – Stephen C Feb 01 '18 at 22:26
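
For reference, the RMI distributed-GC interval discussed in the comments above is controlled by the standard sun.rmi.dgc system properties. A hedged example (the values are illustrative; the defaults differ between JDK releases, so check yours first):

    # Interval in milliseconds between RMI-forced full GCs (here: one hour).
    -Dsun.rmi.dgc.client.gcInterval=3600000
    -Dsun.rmi.dgc.server.gcInterval=3600000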
0

So far we have not seen this issue.

I am not really sure why, but it looks like the problem has gone away after upgrading the JDK from 1.6.0_20 to 1.6.0_24.

Lokendra Chauhan