
I understand that a larger heap means longer GC pauses. I'm okay with that: my code does analysis on a data set, and all I care about is minimizing the total time spent in garbage collection; the length of any single pause makes no difference to me.

Can making the heap too large hurt performance? My understanding is that "young" objects get collected quickly, but "old" objects take longer, so my worry is that a large heap will push some short-lived objects into the longer-lived space. I allocate a lot of strings that get thrown away quickly (on the order of 60 GB over the course of a single run), so I don't want to increase the GC time spent on those.

I'm testing on a machine with 8 GB of RAM, so I've been running my code with `-Xms4g -Xmx4g`; as of my last profiled run, about 20% of my runtime was spent doing garbage collection. I found that increasing the heap to 5 GB helped reduce that. The production server will have 32 GB of RAM and much higher memory requirements.
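
(For anyone who wants to reproduce that measurement without a profiler, here is a minimal sketch using the standard `GarbageCollectorMXBean` API, which reports cumulative collection time per collector; the workload below is just a stand-in for my real analysis.)

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcFraction {
    public static void main(String[] args) {
        long start = System.nanoTime();
        doWork();
        long wallMillis = (System.nanoTime() - start) / 1_000_000;

        // Sum the cumulative time (ms) spent in every collector.
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcMillis += gc.getCollectionTime();
        }
        System.out.printf("GC: %d ms of %d ms (%.1f%%)%n",
                gcMillis, wallMillis, 100.0 * gcMillis / Math.max(wallMillis, 1));
    }

    // Stand-in workload: lots of short-lived string allocation.
    static void doWork() {
        long chars = 0;
        for (int i = 0; i < 10_000_000; i++) {
            chars += ("row-" + i).length();
        }
        System.out.println("chars: " + chars);
    }
}
```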

Can I safely run it with `-Xms31g -Xmx31g`, or might that end up hurting performance?

Patrick Collins
  • I suggest running a benchmark. – Karol S Aug 07 '14 at 21:32
  • The highest you could put the `-Xm*` flags at is half the available system memory, so "no"... – Makoto Aug 07 '14 at 21:32
  • @Makoto Is that true? I've set `-Xm*5g` on my machine (with 8 GB of RAM) with no issue -- can you provide a source? – Patrick Collins Aug 07 '14 at 21:47
  • Since when is half the available system memory a limit? I don't ever remember having problems. However this question is more about advanced GC tuning, which involves much more than `-Xms` and `-Xmx`. – Kayaman Aug 07 '14 at 21:48

1 Answer


Can making the heap too large hurt performance?

When you go over 31 GB you can lose CompressedOops (32-bit compressed object pointers), meaning every reference doubles to 8 bytes, and you may have to jump to about 48 GB just to get more usable memory. I try to keep under 31 GB if I can.
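
You can check whether compressed oops are still in effect at a given heap size with `java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops`, or programmatically. A minimal sketch using HotSpot's diagnostic bean (HotSpot-specific, so not portable to other JVMs):

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class CheckCompressedOops {
    public static void main(String[] args) {
        // HotSpot-only diagnostic bean; reports whether 32-bit
        // compressed object pointers are in use for this JVM run.
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        System.out.println("UseCompressedOops = "
                + bean.getVMOption("UseCompressedOops").getValue());
    }
}
```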

My understanding is that "young" objects get GC'd quickly, but "old" objects can take longer, so my worry is that a large heap will push some short-lived objects into the longer-lived space.

For this reason I tend to have large young generations, e.g. up to 24 GB.
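
On HotSpot the young generation is sized with `-Xmn` (or `-XX:NewSize` and `-XX:MaxNewSize`). To verify what you actually got, here is a minimal sketch using the standard memory-pool beans; pool names vary by collector (e.g. "PS Eden Space" under the parallel collector):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class ShowPools {
    public static void main(String[] args) {
        // Eden plus the survivor spaces make up the young generation;
        // a max of -1 means the pool's size is undefined.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            long max = pool.getUsage().getMax();
            System.out.printf("%-25s max = %d MB%n",
                    pool.getName(), max < 0 ? -1 : max >> 20);
        }
    }
}
```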

Can I safely run it with -Xms31g -Xmx31g, or might that end up hurting performance?

On a 32 GB machine this would be very bad. By the time you account for the off-heap memory the JVM itself uses, the OS, and the disk cache, you are likely to find that a heap over 24-28 GB will hurt performance. I would start with 24 GB and see how that goes; if 5 GB runs fine now, you may find you can reduce it with little effect.

You might find that moving your data off heap helps GC times. I have run systems with a 1 GB heap and 800 GB off heap, but it depends on your application's requirements.
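
As a minimal illustration of the idea (plain NIO direct buffers only, not any particular library): data held in a direct `ByteBuffer` lives outside the Java heap, so the collector never scans its contents; only the small wrapper object is on heap.

```java
import java.nio.ByteBuffer;

public class OffHeapSketch {
    public static void main(String[] args) {
        // 64 MB of storage outside the Java heap; the GC never scans it.
        // Only the tiny ByteBuffer wrapper object lives on the heap.
        ByteBuffer offHeap = ByteBuffer.allocateDirect(64 << 20);

        // Store fixed-width records by offset instead of as objects.
        int recordSize = 16;                        // two longs per record
        int slot = 1000;
        offHeap.putLong(slot * recordSize, 42L);     // key
        offHeap.putLong(slot * recordSize + 8, 7L);  // value

        System.out.println("key = " + offHeap.getLong(slot * recordSize));
    }
}
```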

I spent about 20% of my runtime doing garbage collection

I suggest you reduce your allocation rate. With the help of a memory profiler you should be able to get your allocation rate below 300 MB/s, and below 30 MB/s is better. For an extreme system you might aim for less than 1 GB/hour, as that would allow you to run all day without a minor collection.
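
For a string-heavy workload like the one in the question, a common first step is to replace throwaway `String` churn with a reused buffer. A sketch with hypothetical `Row` and `process` stand-ins:

```java
public class ReuseBuffers {
    // Hypothetical record type standing in for the real analysis rows.
    static final class Row {
        final int id; final double score;
        Row(int id, double score) { this.id = id; this.score = score; }
    }

    public static void main(String[] args) {
        Row[] rows = new Row[1000];
        for (int i = 0; i < rows.length; i++) rows[i] = new Row(i, i * 0.5);

        // One StringBuilder reused across iterations: per-iteration garbage
        // drops to near zero once the builder reaches its working size.
        StringBuilder buf = new StringBuilder(64);
        long total = 0;
        for (Row row : rows) {
            buf.setLength(0);                       // reset, no reallocation
            buf.append("row-").append(row.id).append(':').append(row.score);
            total += process(buf);                  // pass CharSequence, not String
        }
        System.out.println(total);
    }

    // Consumers that accept CharSequence avoid forcing a String copy.
    static int process(CharSequence s) { return s.length(); }
}
```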

Peter Lawrey
  • Do you suggest 24 GB as a solid cap, or are you quoting it as a percentage of the machine's total memory? – Patrick Collins Aug 07 '14 at 22:19
  • 1
    @PatrickCollins I'd bet he means that you should leave 4-8 GB free for other uses (listed above), otherwise you risk swapping, which cost way much more than you can gain by having a bit larger heap. So it's neither absolute nor relative cap. – maaartinus Aug 08 '14 at 00:02
  • @maaartinus Well, I'd call that relative -- relative to the available memory I'm working with -- rather than stemming from a limitation of the JVM, where it's the fact that it's exactly 24 GB that's the worry. I'm doing testing now on a machine with 60 GB and it looks like even 48 GB isn't enough to cut it. I understand the dangers of ending up in a fight with the swapfile. – Patrick Collins Aug 08 '14 at 01:33
  • @PatrickCollins Yes that is why I started by saying `On a 32 GB machine ...` – Peter Lawrey Aug 08 '14 at 18:33
  • @PatrickCollins It is worse than you might expect: if your heap starts swapping, the program will become unusable, possibly the whole system will become unusable, and on Windows it can mean having to power-cycle the box because you can't even kill the process. – Peter Lawrey Aug 08 '14 at 18:34
  • @PatrickCollins If you are using more than 8 GB you should seriously consider a) using a memory profiler to help you reduce it, and b) using off-heap memory so that things like paging work reasonably well. In that case, I suggest getting a fast SSD. – Peter Lawrey Aug 08 '14 at 18:35
  • @PeterLawrey I'm doing some fairly large-scale data analysis, I've had to make a number of tradeoffs between memory and CPU cycles to get the runtime on this dataset down from 9 days to an hour. I think it's inevitable given what I'm working with. I don't think I can use off-heap memory because all 48 GB need to be garbage-collected every 5 minutes or so. It looks like off-heap memory is also expensive in terms of access time. – Patrick Collins Aug 08 '14 at 22:10
  • It appears from profiling, anyway, that my memory requirements are much larger than I realized and so it's a moot point whether I'm getting a performance hit from 24 GB vs 32 GB. Thanks for the info about the JVM, though. – Patrick Collins Aug 08 '14 at 22:14
  • @PatrickCollins You should reduce your heap requirement. Like I said, you can have an application with 1 GB of heap and 800 GB off heap. There are open source tools to help you do this: http://openhft.net/products/chronicle-queue/ and http://openhft.net/products/chronicle-map/ – Peter Lawrey Aug 09 '14 at 08:21
  • @PatrickCollins You don't have to use these tools, here is an example I did for Minecraft where I reduced the heap by 80% by changing a few key classes http://vanillajava.blogspot.co.uk/2014/06/minecraft-and-off-heap-memory.html – Peter Lawrey Aug 09 '14 at 08:23
  • @PeterLawrey That library looks really slow. Writing memory to disk gets you all the speed of writing to swap, and "fine-grained locking" prevents multithreaded access to objects off-heap.... – Patrick Collins Aug 10 '14 at 06:54
  • @PatrickCollins Using shared memory (even when persisted) can outperform System V IPC by a factor of 8x. You can pass information between two processes via shared memory in 40 nanoseconds. If that sounds slow, I would be interested in something faster ;) – Peter Lawrey Aug 10 '14 at 07:06
  • @PeterLawrey IPC is irrelevant for a single-process application. If something is taking my data out of memory and writing it to disk, it's going to take me a disk-read amount of time to get the data back in memory, and that's *slow*. I also need multithreaded access to these objects, which this library seems to forbid. – Patrick Collins Aug 10 '14 at 07:19
  • @PatrickCollins Like I said, you can update a record and pick it up in another thread/process in 40 ns; I don't think that is slow. And if you write something to memory and it is still in memory, why would there be a disk read? – Peter Lawrey Aug 10 '14 at 08:40