
I've been pulled in to analyse GC problems for one of our installations.
We're getting an `OutOfMemoryError` (OOME) somewhat randomly once old gen has grown fairly large.
The total allocated heap is **220 GB**, running **JDK 8 (1.8.0_292-b10)**.
`Xms` is set equal to `Xmx`.
No pre-sizing of old gen/young gen is done.
When we get the OOME there is still some **90-100 GB** of heap unused.
Looking at the GC log I can see quite a few humongous allocations, and I know they go straight into old gen. The same log shows old gen steadily growing with almost no reclaim.
I don't understand why we crash when there's still so much heap left.
It seems unlikely that we are trying to allocate ~100 GB in one go, so any suggestions are welcome. (Suggestions to upgrade the Java version, while valid, are not helpful because the customer won't allow it.)
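
For reference, a rough sketch of what "humongous" means at this heap size. It assumes the collector is G1 (the term "humongous allocation" comes from G1) and that the region size has not been overridden with `-XX:G1HeapRegionSize`. The class and method names are made up for illustration, and the 100 MB object is a hypothetical size, not a value taken from the log. G1's ergonomics aim for roughly 2048 regions with a power-of-two region size between 1 MB and 32 MB, and an object of about half a region or more is humongous and needs contiguous free regions.

```java
// Illustrative arithmetic only; this does not query a running JVM.
class G1RegionSketch {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    // Ergonomic region size: heap / 2048, rounded down to a power of two,
    // then clamped to the range [1 MB, 32 MB].
    static long regionSize(long heapBytes) {
        long target = Math.max(heapBytes / 2048, 1);
        long powerOfTwo = Long.highestOneBit(target);
        return Math.min(Math.max(powerOfTwo, MB), 32 * MB);
    }

    // Objects of about half a region or more are treated as humongous.
    static long humongousThreshold(long regionBytes) {
        return regionBytes / 2;
    }

    // A humongous object occupies this many contiguous regions.
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long heap = 220 * GB;
        long region = regionSize(heap);
        System.out.println("region size: " + region / MB + " MB");
        System.out.println("humongous threshold: " + humongousThreshold(region) / MB + " MB");
        System.out.println("total regions: " + heap / region);
        // Hypothetical 100 MB object, chosen only to show the rounding.
        System.out.println("regions for a 100 MB object: " + regionsNeeded(100 * MB, region));
    }
}
```

Under these assumptions the region size comes out at 32 MB and the humongous threshold at 16 MB, so a single large array needs several adjacent free regions. If the free 90-100 GB is scattered across non-adjacent regions, such an allocation could fail even though the total free space looks ample; whether that is what happens here can only be confirmed from the exception message and the GC log.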
Erik
  • `looking at the log I see old gen steadily growing with almost no reclaim.` This is known behavior for JVMs. If there's plenty of memory remaining, no need to garbage collect. I can't explain the OOM error though. – markspace Nov 18 '21 at 15:53
  • Default value of `NewRatio` is `2`, which means that only 2/3 of heap size is allowed for old gen. So you don't have to try to allocate 100GB extra, but much less. – Mirek Pluta Nov 18 '21 at 15:59
  • What _exactly_ is the `OOME` message you get? Some GC logs from when that happens would also help. – Eugene Nov 18 '21 at 16:06
  • @Eugene Frustratingly enough, the client/customer won't tell us, nor are we allowed to see the log file. It took us a great deal of work to get the GC log file. I would love to put the GC log file out there, I just don't know where to put it. – Erik Nov 18 '21 at 19:17
  • @markspace The GC log shows remark and concurrent cleanup phases, so some attempts at collecting are going on. But the result is small. – Erik Nov 18 '21 at 19:20
  • Ask the customer to extract the exceptions from their logs and provide that so you can actually rule out huge allocations. The exception message is important since there are different kinds of OOME. – the8472 Nov 18 '21 at 20:49
  • @the8472 The GC log shows humongous allocations, so I do think that's part of the problem. – Erik Nov 19 '21 at 07:22
  • @Eugene I'm not allowed to connect to file sharing hosts so... – Erik Nov 19 '21 at 07:24
  • Just take the last log entries, right before the failure and including the failure if there’s a corresponding entry, and include them directly in your question. Mind that certain OOMEs are entirely independent of the heap state: attempting to create an array with 2³¹ entries is not supported on most JVMs, a native memory (e.g. direct buffer) allocation may fail, etc. In some cases, e.g. `ArrayList l = new ArrayList<>(Collections.nCopies(20, null)); l.addAll(Collections.nCopies(Integer.MAX_VALUE-10, null));`, there’s not even an actual allocation attempt, as the length would overflow… – Holger Nov 19 '21 at 08:10
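
To illustrate why the comments keep asking for the exact exception message, here is a minimal sketch of an `OutOfMemoryError` that has little to do with how full the heap is. The class and method names are hypothetical. It requests an array of `Integer.MAX_VALUE` elements; on HotSpot this is typically rejected with a message along the lines of "Requested array size exceeds VM limit" instead of "Java heap space", but the exact wording depends on the JVM, so the sketch prints whatever it gets.

```java
// Hypothetical helper: shows that the OOME message identifies the kind of failure.
class OomeKindSketch {
    // Tries to allocate a long[] of the given length and reports the outcome.
    static String tryAllocate(int length) {
        try {
            long[] big = new long[length];
            return "allocated " + big.length + " elements";
        } catch (OutOfMemoryError e) {
            // The message distinguishes heap exhaustion from other causes.
            return String.valueOf(e.getMessage());
        }
    }

    public static void main(String[] args) {
        System.out.println(tryAllocate(1_000));
        System.out.println(tryAllocate(Integer.MAX_VALUE));
    }
}
```

Asking the customer to grep their application log for `java.lang.OutOfMemoryError` and send only the matching lines would settle which kind of failure this is, without them having to share the whole file.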
