When a JVM crashes (segfaults) during garbage collection, how can I find out what was being collected?

Question

I get segfaults in my JVM at roughly the same phase of the application, but with varying stack traces in the crash report. It always seems to happen during GC, however.

The crash happens in all three JVMs I tried (OpenJDK 6, Oracle 1.6.0_25 and 1.7.0), with two GCs each (Parallel Collector and CMS), and always around the same area in the application. So I figured that if I could find out what the GC was trying to collect, I might spot some peculiarity in my code that causes this crash.

Are there any coding practices that are well known to be problematic for GC?
What methods are available for diagnosing this problem?
Can I make any educated guesses on where in my application this problem is triggered?
What (GC tuning) parameters can I play with to narrow the problem down?
Is there a way to spot (possibly) problematic data in a heap dump?

Added the JNI tag because it's quite clearly a bug in some JNI library as Peter Lawrey has already pointed out. — Voo, Nov 11 '11 at 16:52
I would be very interested in seeing what was causing the crash. — Casey, Nov 11 '11 at 16:53
It's Eclipselink, anyway. I just know. It has to go. It had it coming for a long time. I'll take it out myself. — Hanno Fietz, Nov 11 '11 at 17:05
@Casey Sure you can, but how likely is a bug in the JVM implementation, compared to some corrupting JNI function? — Voo, Nov 11 '11 at 17:10
this might help http://stackoverflow.com/questions/5395337/throw-exception-when-java-jni-experiences-segmentation-fault — Prasanna Talakanti, Nov 11 '11 at 20:09

score 8 · Answer 1 · answered Nov 11 '11 at 16:45

This will happen if you have a JNI library that handles memory incorrectly. The problem does not show up immediately. However, when a GC is performed, it scans all of the memory, trips over the corrupted reference, and kills the JVM. In other words, the corruption could have occurred at any time since the last full GC.
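
A minimal sketch of one way to narrow that window. This is an assumption on my part, not something the answer prescribes: request a collection right after each suspect native call, so that a corrupted reference is more likely to be hit close to the call that produced it. `GcProbe`, `after` and the label string are hypothetical names. `System.gc()` is only a request (it is ignored under `-XX:+DisableExplicitGC`), so the helper reads the collector counters to report whether a collection was actually observed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical debugging helper: request a GC right after a suspect JNI call. */
final class GcProbe {

    /** Sum of the collection counts reported by all collectors of this JVM. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /**
     * Logs the label, requests a collection, and returns how many collections
     * were observed. If the heap is already corrupted, the JVM may crash inside
     * this call, and the last label on stderr points at the suspect call.
     */
    static long after(String label) {
        System.err.println("GC probe after: " + label);
        System.err.flush();
        long before = totalCollections();
        System.gc(); // a request only; may be a no-op
        return totalCollections() - before;
    }

    public static void main(String[] args) {
        // Placeholder: call into the suspect native library before this line.
        long observed = after("suspectNativeCall (placeholder)");
        System.out.println("collections observed: " + observed);
    }
}
```

If the crash moves to the probe that follows one particular native call, that call becomes the first suspect. If it does not move, the probe has at least ruled out the calls it surrounded.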

answered Nov 11 '11 at 16:45

Peter Lawrey

Sounds hard to debug. Anything I can do at all? – Hanno Fietz Nov 11 '11 at 16:46
So, even though the problem seems to happen during GC, that does not narrow it down to a bug in a finalizer? – Raedwald Nov 11 '11 at 16:52
@HannoFietz Well there are Valgrind and Co, but the problem is that the Hotspot code itself is.. well it breaks most (I'd say all, but who knows) of those tools. No idea if you can get valgrind to only check your own code. – Voo Nov 11 '11 at 16:54
@Raedwald You CAN'T produce a segfault in pure Java code without a bug in the underlying JVM (which is possible, but unlikely). The GC has to look at (and traverse) all pointers, so if you write garbage into one of those it'll trip up, that's all there is to it. – Voo Nov 11 '11 at 16:55
Is there a way I can recognize the problematic data in a heap dump? I would probably be able to make one that is likely to contain it. – Hanno Fietz Nov 11 '11 at 16:56
"You CAN'T produce a segfault in pure Java code" I know. I should have said "a bug in a finalizer that calls JNI code". – Raedwald Nov 11 '11 at 16:58
@Raedwald Well ok, yes, we can't narrow it down. All we know: some JNI code was called that corrupted the heap. Where this happened is immaterial (and can't be guaranteed: it could be a finalizer, it could be anything else). Actually, finalizers aren't even executed during GC: they run in a separate thread at the same time as the rest of the program. – Voo Nov 11 '11 at 17:06
@Voo, you can get a segfault using sun.misc.Unsafe. You can also corrupt memory so it fails on the next GC. I am not sure how you can corrupt memory so that the GC doesn't crash, but the finalizer does. If you corrupted the ReferenceQueue the finalizer uses you might be able to do it. – Peter Lawrey Nov 11 '11 at 17:56

score 1 · Answer 2 · edited Oct 19 '12 at 21:02

We were facing a similar issue. There was no pattern we could see; it was quite random, but it always happened during either a GC or a full GC. For us it turned out to be a problem with the RAM modules. We identified it by running MemTest86+ on the Ubuntu server.

edited Oct 19 '12 at 21:02

Igor

answered May 21 '12 at 05:42

VIRAL SHAH

r0ast3d · Answer 3 · 2011-11-11T16:49:09.687

Segfaults have specific error codes at the beginning of the dump: http://en.wikipedia.org/wiki/Segmentation_fault
You can use Thread.dumpStack() to see what is going on in the application. If you know exactly where your application is freezing, or is going to freeze after a certain action or event, you can press Ctrl+Break (on Windows) or Ctrl+\ (on Unix-like systems) to get a thread dump and see what is going on.
Instead of vaguely guessing, you can comment out certain sections of the code to find out which loop, object, buffer or string is taking too long.
Depending on your situation, you can consider some specific tools.
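
A minimal sketch of taking the same kind of thread dump from inside the application, using `Thread.getAllStackTraces()`. The class name `ThreadDumper` is hypothetical. It only shows Java frames of threads that are still alive, so it helps with hangs and with "what was running just before" questions. It cannot run once the JVM has already segfaulted.

```java
import java.util.Map;

/** Hypothetical helper: formats a stack trace for every live thread. */
final class ThreadDumper {

    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            // Header line: thread name in quotes, then its current state.
            out.append('"').append(thread.getName()).append("\" state=")
               .append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```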

score 0 · Answer 4 · answered Nov 11 '11 at 16:58

I suggest you get both a thread dump and a heap dump. You can do this either from the command line or with a tool like VisualVM.
I think a heap dump, being a snapshot of the JVM's memory, will provide information about live objects and their allocations. If you analyze the heap with VisualVM, it provides a detailed report on all the objects on the heap.
I would also suggest you switch your application's GC logging to verbose (for example with -verbose:gc) and analyze the logs with a tool like GCViewer from tagtraum.
If you can attach a JVM profiler, that can provide a lot of information. Or, if you have a general idea of the workflow that is causing the problem, just profile that in isolation.
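
A minimal sketch of triggering the heap dump from inside the application, for example just before the phase that usually crashes. It assumes a HotSpot-based JVM, since `com.sun.management.HotSpotDiagnosticMXBean` is HotSpot-specific, and Java 7 or later for `ManagementFactory.getPlatformMXBean`. The class name `HeapDumper` and the file name are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: writes an .hprof heap dump of the running JVM. */
final class HeapDumper {

    /** Dumps the heap to target; liveOnly restricts the dump to reachable objects. */
    static File dumpHeap(File target, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // dumpHeap throws an IOException if the target file already exists.
        diagnostics.dumpHeap(target.getAbsolutePath(), liveOnly);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Timestamped name so repeated runs do not collide with an old dump.
        File target = new File(System.getProperty("java.io.tmpdir"),
                "before-crash-" + System.currentTimeMillis() + ".hprof");
        System.out.println("heap dump written to " + dumpHeap(target, true));
    }
}
```

The resulting file can be opened in VisualVM. Note that a dump taken with `liveOnly` set to true forces a collection first, so on an already corrupted heap the dump itself may be what triggers the crash.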
