Why would a java heap allocation resizing cause an OOME?

Question

We are seeing OutOfMemoryExceptions in our logs, and they seem to coincide with the java heap commit size growing from ~1G to ~2.4G. Despite the error message, it does not appear that we are running out of heap space. Aside from the exception being thrown (and heap dump generated), the resize seems to eventually succeed and the application continues on without a problem (with a ~2.4G heap commit size).

Here's an example of the log output:

INFO   | jvm 1    | 2013/08/16 12:08:05 | [GC [PSYoungGen: 328000K->2997K(339200K)] 645686K->320683K(1038272K), 0.0101580 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
INFO   | jvm 1    | 2013/08/16 12:09:14 | [GC [PSYoungGen: 331509K->3487K(338816K)] 649195K->322153K(1037888K), 0.0115600 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
INFO   | jvm 1    | 2013/08/16 12:09:59 | [GC [PSYoungGen: 331999K->2928K(340032K)] 650665K->322608K(1039104K), 0.0099300 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
INFO   | jvm 1    | 2013/08/16 12:10:48 | [GC [PSYoungGen: 333104K->2723K(339648K)] 652784K->323240K(1038720K), 0.0100130 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
INFO   | jvm 1    | 2013/08/16 12:11:28 | [GC [PSYoungGen: 332885K->3884K(340864K)] 653402K->325089K(1039936K), 0.0106250 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
INFO   | jvm 1    | 2013/08/16 12:11:39 | [GC [PSYoungGen: 23694K->463K(340352K)] 344899K->323656K(2437504K), 0.0070330 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
INFO   | jvm 1    | 2013/08/16 12:11:39 | [GC [PSYoungGen: 463K->0K(340608K)] 323656K->323592K(2437760K), 0.0044440 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
INFO   | jvm 1    | 2013/08/16 12:11:39 | [Full GC
INFO   | jvm 1    | 2013/08/16 12:11:40 |  [PSYoungGen: 0K->0K(340608K)] [PSOldGen: 323592K->323592K(699072K)] 323592K->323592K(1039680K) [PSPermGen: 159297K->159297K(262144K)], 1.2076900 secs] [Times: user=1.20 sys=0.00, real=1.21 secs] 
INFO   | jvm 1    | 2013/08/16 12:11:40 | [GC [PSYoungGen: 0K->0K(340736K)] 323592K->323592K(2437888K), 0.0046330 secs] [Times: user=0.02 sys=0.00, real=0.00 secs] 
INFO   | jvm 1    | 2013/08/16 12:11:40 | [Full GC
INFO   | jvm 1    | 2013/08/16 12:11:42 |  [PSYoungGen: 0K->0K(340736K)] [PSOldGen: 323592K->279953K(744512K)] 323592K->279953K(1085248K) [PSPermGen: 159297K->159062K(262144K)], 1.7593100 secs] [Times: user=1.75 sys=0.00, real=1.76 secs] 
INFO   | jvm 1    | 2013/08/16 12:11:42 | java.lang.OutOfMemoryError: Java heap space
INFO   | jvm 1    | 2013/08/16 12:11:42 | Dumping heap to java_pid28908.hprof ...
INFO   | jvm 1    | 2013/08/16 12:11:48 | Heap dump file created [463314899 bytes in 6.037 secs]
INFO   | jvm 1    | 2013/08/16 12:12:36 | [GC [PSYoungGen: 331840K->6044K(352192K)] 611793K->285998K(2449344K), 0.0164060 secs] [Times: user=0.02 sys=0.00, real=0.02 secs] 
INFO   | jvm 1    | 2013/08/16 12:13:28 | [GC [PSYoungGen: 352156K->6161K(364160K)] 632110K->286114K(2461312K), 0.0152330 secs] [Times: user=0.02 sys=0.01, real=0.01 secs] 
INFO   | jvm 1    | 2013/08/16 12:14:47 | [GC [PSYoungGen: 364113K->6575K(374144K)] 644066K->288169K(2471296K), 0.0179930 secs] [Times: user=0.02 sys=0.01, real=0.02 secs]

Notice that immediately prior to the OOME, the total committed heap oscillates between 1GB and 2.4GB. We can see that it was pretty steady at 1GB beforehand and pretty steady at 2.4GB afterwards.

Among the javaopts for this 1.6.0._24 JVM are:

-Xmx3072m
-XX:+HeapDumpOnOutOfMemoryError
-XX:-UseGCOverheadLimit
-verbose:gc
-Xss256k
-XX:MaxPermSize=256m
-server
-XX:+PrintGCDetails

The JVM is running 1.6.0._24. We can't change versions now, but will have a window to do so in the next month or two. If 1.6.0_45 is that much more stable, we will aim to switch to that. We're currently testing it.

The machine only has 4GB of total system ram. In addition, there is a small RAM-disk in use as well. I'm concerned that the Xmx setting is already too high for this environment.

This is puzzling us, as it doesn't seem like the heap usage is very big at the time of the exception. Why are we getting this OOME?

UPDATE: We are attempting to prevent this condition by setting the initial memory (Xms) equal to the maximum memory (Xmx). So far, these experiments have been promising though we've yet to introduce the change in production. It still doesn't explain why the OOME occurred in the first place, though it does seem to indicate that the OOME can be avoided without increasing the max heap size (or reducing the app memory footprint). So the mystery remains of why the heap resizing caused an OOME?

How reproducible is it? Can you tell us a bit about the area in the code where it happens? — nanofarad, Aug 15 '13 at 15:18
We can't reproduce it in our test environment so far, though we are still trying. It happens every few days in production. From what I can tell, it seems to be based on heap growth in general and not any particular piece of our code. — bischoje, Aug 22 '13 at 21:14

Peter Lawrey · Answer 1 · 2013-08-23T07:57:04.760

1

For reading you logs, it appear you have a very large burst of activity, most like objects large enough to go straight into tenured/old gen. I still suggest you increase you max memory to see how your application behaves as the OOME is likely to give you confusing stats.

This suggests heavy early promotion. The "GC" is a minor collection which it appears that every object is needed, triggering a Full GC which finds some tenured objects which can be dropped. The GC works best when the young object die in the Eden space, but it appears most of your objects are dying in tenured space.

One way to test this is to make the max heap space much larger. If you can try a 24 GB heap or 80% of you main memory see how it behaves then. e.g. try -Xmx24g if you have 32 GB of memory. From these numbers, it appears you want an Eden size of at least 5 GB.

If this is not an option, I suggest you use a memory profiler to reduce the memory consumption by a factor of at least 3x.

I would check you have a recent version of Java 6 like update 45. There were significant performance improvement between update 18 and 26.

edited Aug 23 '13 at 07:57

answered Aug 15 '13 at 15:20

Peter Lawrey

525,659
79
751
1,130

I rewrote the question based on what we've learned in the last week. Perhaps you could rewrite or add to your answer, now that I've narrowed the focus of the question? – bischoje Aug 22 '13 at 21:15
Okay, we'll try a larger heap size in our test environment. I am concerned that it will lead to SIGKILLs from the OS due to only having 4GB RAM on the box, but we will try it. – bischoje Aug 23 '13 at 14:22
I suggest buying more memory. You can buy 32 GB for well under $1000. – Peter Lawrey Aug 23 '13 at 14:27
This application is running on embedded hardware, and there are over a hundred of them out in the field. So you think the OOME is obscuring the actual peak memory usage? – bischoje Aug 23 '13 at 20:05
Yes. If you have limit memory consumption I suggest you memory profile the application to reduce it's consumption. You should be able to half its usage or better if you haven't done this before. – Peter Lawrey Aug 24 '13 at 07:29
We have added memory to our new models: 8 GB instead of 4 GB. Even without this change, we've seen the problem cease when we set Xms=Xmx. When we do that, our heap starts out big, stays big, and we avoid the heap resizing issue altogether. – bischoje Dec 05 '13 at 17:07
@bischoje Now your system is behaving sanely, you can try profiling it and reducing consumption and pressure on the GC. If you haven't done this before there many be some quick wins to improve its behaviour. – Peter Lawrey Dec 05 '13 at 17:22

Why would a java heap allocation resizing cause an OOME?

1 Answers1