
Is there a way to resolve this kind of error report:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fc955e66998, pid=25851, tid=140467030525696
#
# JRE version: 6.0_37-b06
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.12-b01 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# J  java.util.LinkedHashMap.addEntry(ILjava/lang/Object;Ljava/lang/Object;I)V

?

The crash occurs quite frequently (1-2 times per day on a production web server), almost always with a different problematic frame reported.

Here are examples of some error reports:

# J  java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter()Ljava/util/concurrent/locks/AbstractQueuedSynchronizer$Node;
# J  java.util.LinkedHashMap.addEntry(ILjava/lang/Object;Ljava/lang/Object;I)V
# C  [libc.so.6+0x6bb34]
# C  [libgobject-2.0.so.0+0x2346f]  g_type_check_instance_is_a+0x43
# C  [libgobject-2.0.so.0+0x2346f]  g_type_check_instance_is_a+0x43
# V  [libjvm.so+0x4d3360]
# V  [libjvm.so+0x32d166]  CardTableRS::write_ref_field_gc_par(void*, oopDesc*)+0x26
# V  [libjvm.so+0x7a33e2]  ContiguousSpace::prepare_for_compaction(CompactPoint*)+0x242
# V  [libjvm.so+0x4d3360]
# V  [libjvm.so+0x76943b]  ReferenceProcessor::balance_queues(DiscoveredList*)+0x32b
# V  [libjvm.so+0x4d3360]
# V  [libjvm.so+0x32d166]  CardTableRS::write_ref_field_gc_par(void*, oopDesc*)+0x26
# V  [libjvm.so+0x4d3360]
# V  [libjvm.so+0x4d3360]
# V  [libjvm.so+0x76943b]  ReferenceProcessor::balance_queues(DiscoveredList*)+0x32b
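
With this many different frames, it may help to tally them by type before looking for a pattern. In hs_err reports the leading letter marks the kind of frame: `J` is compiled Java code, `j` interpreted Java code, `V`/`v` VM code, and `C` native code. The following is only a minimal sketch (the class name `FrameTally` is made up for illustration, and it assumes Java 8 or later to run the tool itself):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FrameTally {
    /** Counts problematic-frame lines by their type letter (J, j, V, v, C). */
    static Map<Character, Integer> tally(List<String> frameLines) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (String line : frameLines) {
            // Drop the leading '#' that the hs_err header puts on every line.
            String s = line.replaceFirst("^#\\s*", "").trim();
            if (s.length() < 2 || !Character.isWhitespace(s.charAt(1))) {
                continue; // not a "<letter> <frame>" line
            }
            char type = s.charAt(0);
            if ("JjVvC".indexOf(type) >= 0) {
                counts.merge(type, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // A few of the frames quoted above, used as sample input.
        List<String> frames = Arrays.asList(
            "# J  java.util.LinkedHashMap.addEntry(ILjava/lang/Object;Ljava/lang/Object;I)V",
            "# C  [libc.so.6+0x6bb34]",
            "# V  [libjvm.so+0x4d3360]",
            "# V  [libjvm.so+0x32d166]  CardTableRS::write_ref_field_gc_par(void*, oopDesc*)+0x26");
        System.out.println(tally(frames)); // prints {C=1, J=1, V=2}
    }
}
```

A spread across all three kinds, as in the list above, is at least consistent with general heap corruption rather than one faulty method, though a tally alone cannot prove that.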

The only thing that seems to trigger the crashes is high memory usage (approx. 30 GB), although that has not always been the case: some crashes occurred at moments when the GC log showed low memory usage. The crashes do not occur when running in -Xint mode, but that mode is so slow that it is not an option.

It seems difficult to write any simple reproducible code for an error that occurs in the production environment of a complex app.

What to do? I did report a bunch of these at the Oracle crash site though ...

I do not suspect hardware memory problems because nothing else ever crashes except java. And there is no custom native jni code in the application.

Our vm parameters are -server -Xss4096k -Xms32255M -Xmx32255M -Xnoclassgc -XX:+UseNUMA -XX:MaxPermSize=512m -XX:+UseGCOverheadLimit -verbose:gc -Xmaxf1 -XX:+UseCompressedOops -XX:+DisableExplicitGC -XX:+AggressiveOpts -XX:+ScavengeBeforeFullGC -XX:CMSFullGCsBeforeCompaction=10 -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:GCTimeRatio=19 -XX:+UseAdaptiveSizePolicy -XX:MaxGCPauseMillis=500 -Xloggc:gc.log.
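
When changing an option list this long one flag at a time, it can be useful to have the running JVM report which `-XX` options it actually received. The sketch below is an illustration only: the class name `VmFlagAudit` and the helper `parseXxFlags` are hypothetical, and it assumes Java 8 or later. It reads the arguments through the standard `RuntimeMXBean.getInputArguments()` call.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

public class VmFlagAudit {
    /**
     * Extracts -XX options from a command line.
     * "+Flag" maps to "true", "-Flag" to "false", and "Name=value" keeps its value.
     */
    static Map<String, String> parseXxFlags(String commandLine) {
        Map<String, String> flags = new LinkedHashMap<>();
        for (String token : commandLine.trim().split("\\s+")) {
            if (!token.startsWith("-XX:")) {
                continue; // -server, -Xmx..., -D... and similar are ignored here
            }
            String body = token.substring(4);
            if (body.startsWith("+")) {
                flags.put(body.substring(1), "true");
            } else if (body.startsWith("-")) {
                flags.put(body.substring(1), "false");
            } else {
                int eq = body.indexOf('=');
                if (eq > 0) {
                    flags.put(body.substring(0, eq), body.substring(eq + 1));
                }
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        // Arguments this JVM was started with, joined back into one line.
        String cmd = String.join(" ",
            ManagementFactory.getRuntimeMXBean().getInputArguments());
        for (Map.Entry<String, String> e : parseXxFlags(cmd).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Logging this at startup gives each crash report a record of the exact flag set that was in effect for that run.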

Martin

4 Answers


While it is possible that the crash is caused by a JVM bug, it is more likely to be caused by some JNI / JNA native code that you have written, or that is part of some 3rd-party library that you are using.

What to do?

Here is a blog on the topic of how to get started with debugging a crash dump: http://www.javacodegeeks.com/2012/01/debugging-jvm.html

In your case, the fact that reports are all different is going to make the problem harder to track down. It sounds like you may have a problem with something "randomly" corrupting heap objects.

I did report a bunch of these at the Oracle crash site though ...

Unless you have a support contract, Oracle are unlikely to get back to you with a solution.

Stephen C
  • There is no custom native jni code in the application. Thanks for the link; it will be hard to debug the production app because it might crash randomly during the day and it should then be restarted immediately. Is there anything similar for windows? I could crash it easily there ;) – Martin Oct 21 '12 at 13:08

If the crashes are frequent with apparently random causes then I'd be thinking in terms of a possible hardware problem (e.g. dodgy RAM). I'd be inclined to run a full battery of hardware diagnostics on the server and see if that throws up anything.

Matt
  • It occurs on both Windows 7 64bit (more frequently / more sensitive) and Linux 64bit server. I do not suspect hardware memory problems because nothing else ever crashes except java. – Martin Oct 21 '12 at 13:03
  • any correlation to either the load on the process or the type of work it is doing? any correlation to particular JIT activity? (I've seen crashes that happen only after particular methods have compiled before). Have you tried with a different GC algorithm? – Matt Oct 21 '12 at 19:12
  • Our vm parameters are `-server -Xss4096k -Xms32255M -Xmx32255M -Xnoclassgc -XX:+UseNUMA -XX:MaxPermSize=512m -XX:+UseGCOverheadLimit -verbose:gc -Xmaxf1 -XX:+UseCompressedOops -XX:+DisableExplicitGC -XX:+AggressiveOpts -XX:+ScavengeBeforeFullGC -XX:CMSFullGCsBeforeCompaction=10 -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:GCTimeRatio=19 -XX:+UseAdaptiveSizePolicy -XX:MaxGCPauseMillis=500 -Xloggc:gc.log` – Martin Oct 22 '12 at 02:20
  • How do I monitor 'particular JIT activity'? I guess it might be related to GC, but we seem to need these settings to cope with the garbage, so we are very wary of trying other algorithms at this point ;) – Martin Oct 22 '12 at 02:20
  • JIT activity can be seen using `-XX:+PrintCompilation` or [LogCompilation](https://wikis.oracle.com/display/HotSpotInternals/LogCompilation+overview), the former doesn't give timestamps so you need to write something to watch it so you know when it is doing something – Matt Oct 22 '12 at 09:56
  • btw, incremental CMS is only recommended for single core systems (which I assume you don't have given that you have a 32G heap), some [details here](http://markmail.org/message/r7dfk2zjeydx5aif) – Matt Oct 22 '12 at 10:01
  • It crashes on win7 with 1 core and Linux 4 cores. Do you have a reference that CMSIncrementalMode is not recommended for multi cores? My recent crash is `EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x000000006d8c3215, pid=57148, tid=61948 # # JRE version: 6.0_37-b06 # Java VM: Java HotSpot(TM) 64-Bit Server VM (20.12-b01 mixed mode windows-amd64 compressed oops) # Problematic frame: # V [jvm.dll+0xc3215]` and hotspot compilation log doesn't look suspicious. – Martin Oct 23 '12 at 08:25
  • link is in my previous comment (the "details here" link), basically incremental mode is a way of spreading particular phases of cms work over time to avoid it hogging the only core for a prolonged period. – Matt Oct 23 '12 at 09:13
  • What about keeping -XX:+CMSIncrementalPacing? – Martin Oct 23 '12 at 10:11
  • incrementalpacing is on by default if you use icms on jdk6 so you don't need it either way – Matt Oct 23 '12 at 11:20
  • Just crashed with `EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x0000000002f77b49, pid=63316, tid=54488 # # JRE version: 6.0_37-b06 # Java VM: Java HotSpot(TM) 64-Bit Server VM (20.12-b01 mixed mode windows-amd64 compressed oops) # Problematic frame: # J java.util.LinkedHashMap.transfer([Ljava/util/HashMap$Entry;)V` on i7core machine with -CMSIncrementalMode. Only some inlining going on in hotspot. – Martin Oct 23 '12 at 12:15
  • I tried with "vanilla" GC parameters, i.e., only `-server -Xmx32255M -Xms5G -XX:MaxPermSize=512m -Xnoclassgc `, got `EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x0000000002b66d94, pid=49536, tid=63292 # # JRE version: 6.0_37-b06 # Java VM: Java HotSpot(TM) 64-Bit Server VM (20.12-b01 mixed mode windows-amd64 compressed oops) # Problematic frame: # J net.sf.ehcache.store.compound.Segment.addRandomSample(Lnet/sf/ehcache/store/compound/ElementSubstituteFilter;ILjava/util/Collection;I)V`. – Martin Oct 23 '12 at 15:34
  • What is strange is that `-XX:-CMSIncrementalMode` made the system VERY unstable; I had to remove this option. – Martin Oct 24 '12 at 04:31
  • it seems more likely you are coming up against a possible hotspot bug, I suggest posting to the [hotspot-gc-dev](http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-dev) or [hotspot-gc-use](http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use) mailing lists. In the meantime (and this is guessing) you could try tweaking some of the params to determine if you're hitting a limit of some sort (e.g. make heap less than 26G to change the way compressedoop addressing works, reduce stack size) – Matt Oct 24 '12 at 08:10
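
On the point in the comments that `-XX:+PrintCompilation` output carries no timestamps: one possible way to "watch" it is to pipe the JVM's standard output through a small filter that prefixes each line with the time it was read. This is only a sketch under assumptions; the class name `TimestampLines` is hypothetical, it needs Java 8 or later for `java.time`, and because of pipe buffering the stamps are approximate.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.time.Instant;

public class TimestampLines {
    /** Prefixes a line with an ISO-8601 UTC timestamp built from epoch milliseconds. */
    static String stamp(long epochMillis, String line) {
        return Instant.ofEpochMilli(epochMillis) + " " + line;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        // Stamp every line as soon as it arrives on standard input.
        while ((line = in.readLine()) != null) {
            System.out.println(stamp(System.currentTimeMillis(), line));
        }
    }
}
```

Usage would look something like `java -XX:+PrintCompilation -jar app.jar | java TimestampLines > compilation.log`, where `app.jar` stands in for the real application. The last stamped lines before a crash can then be compared with the time recorded in the hs_err file.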

I found this article on the web (http://www-01.ibm.com/support/docview.wss?uid=swg21422605):

"If you use the Java™ virtual machine (JVM) AggressiveOpts option with a Java Platform Enterprise Edition (Java EE) application that contains Enterprise JavaBeans (EJB) files, the JVM might crash. To work around this issue, disable the DoEscapeAnalysis optimization using the following arguments: `-XX:+AggressiveOpts -XX:-DoEscapeAnalysis`"

What is strange is that -XX:-CMSIncrementalMode made the system VERY unstable; I had to remove that option again, i.e. leave incremental mode enabled.

Martin

Upgraded to JDK 7 (Java(TM) SE Runtime Environment, build 1.7.0_09-b05) and haven't had any problems since, with the following VM args:

-server -Xss4096k -XX:MaxPermSize=512m -Xms32255M -Xmx32255M -Xnoclassgc -XX:+UseNUMA -XX:+UseBiasedLocking -XX:+UseFastAccessorMethods -XX:ReservedCodeCacheSize=48m -XX:+UseStringCache -XX:+HeapDumpOnOutOfMemoryError -XX:+UseGCOverheadLimit -Duser.timezone=EET -Xmaxf1 -XX:+UseCompressedOops -XX:+DisableExplicitGC -XX:+AggressiveOpts -XX:CMSInitiatingOccupancyFraction=70 -XX:+ParallelRefProcEnabled -XX:+UseAdaptiveSizePolicy -XX:MaxGCPauseMillis=100 -XX:+UseG1GC -XX:GCPauseIntervalMillis=3000 -XX:+PrintGCDetails -XX:+PrintHeapAtGC -Xloggc:gc.log
Martin