
I have a 32 GB heap on a 36 core server.

Each day we get two ~20 s Full GCs.

The default number of threads used for Full GC with this setup is 25:

java -XX:+PrintFlagsFinal -version | grep ParallelGCThreads
uintx ParallelGCThreads                         = 25              {product}

This is slightly higher than the plain 5/8 of the CPU count (≈22.5) suggested by the documentation, because the heuristic only discounts the CPUs beyond the first 8.
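For reference, a minimal sketch of that default heuristic, as far as I can tell from the HotSpot sources (the class and method names here are mine, not the JVM's):

class GcThreadDefaults {
    // Mirrors the documented HotSpot default for ParallelGCThreads:
    // one thread per CPU up to 8 CPUs, then 5/8 of the remainder.
    static int defaultParallelGcThreads(int cpus) {
        if (cpus <= 8) {
            return cpus;                 // one GC thread per CPU
        }
        return 8 + (cpus - 8) * 5 / 8;   // 36 CPUs -> 8 + (28 * 5) / 8 = 25
    }

    public static void main(String[] args) {
        System.out.println(defaultParallelGcThreads(36)); // prints 25
    }
}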

I can't reproduce the production load in a test environment so unfortunately I am having to tweak in production.

Assuming that this is the only/main application running on this server, is there any reason not to set the value to the number of CPUs to try to reduce the Full GC time?

-XX:ParallelGCThreads=36

Are there any gains to be had in increasing it beyond the number of CPUs?

e.g. would it be detrimental to double it?

-XX:ParallelGCThreads=72
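If you do change the flag, you can confirm the value that actually took effect on the running JVM with the JDK's jinfo tool (the PID is a placeholder):

jinfo -flag ParallelGCThreads <pid>
-XX:ParallelGCThreads=36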

NB: I am separately looking at several other options for reducing the Full GC time. Therefore, I am only interested in the answer to the specific question asked, not in alternative suggestions for optimising the GC.

opticyclic
  • Why are you tagging this with `G1GC` but asking a question about ParallelGC? – Eugene Jan 30 '20 at 01:51
  • What specific advantage do you expect from using more threads than CPU cores? It can't be better utilization of the cores, obviously. – Holger Jan 30 '20 at 09:00
  • @Eugene G1GC uses this parameter for tuning Full GC, does it not? https://docs.oracle.com/cd/E40972_01/doc.70/e40973/cnf_jvmgc.htm#autoId2 – opticyclic Jan 30 '20 at 18:59
  • @Holger Whilst that is true, if one GC thread isn't using the core *constantly*, there are periods when another thread could be using that core. – opticyclic Jan 30 '20 at 19:01
  • `Full GC` != `stop-the-world` in `G1`; only certain phases are `STW` events. Those _parallel_ threads (as opposed to the _concurrent_ ones that run alongside your application) matter when your app is fully stopped. I re-read your question and only now understand what you meant. And the answer is **No**: increasing the default usually will not help. You first have to understand where the time is spent in a `STW` event. – Eugene Jan 30 '20 at 19:14
  • I disagree with you. I am getting [Full GC (Allocation Failure), 20s]. Based on that time and this Oracle link, it **is** Stop The World: https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/g1_gc.html – opticyclic Jan 30 '20 at 19:21
  • I see... and why exactly aren't you posting the log, but instead throwing small pieces of it in here? – Eugene Jan 30 '20 at 19:25
  • Because, as mentioned in the question, I don't want to get bogged down in looking at other solutions for the underlying problem. This question is specifically to help me understand how the command-line parameter works, not to try to fix a problem by alternative means. – opticyclic Jan 30 '20 at 19:29
  • So you do not want to understand why that pause happens; you want to throw more threads at the problem first. Well, good luck. – Eugene Jan 30 '20 at 22:37
    @opticyclic "*if one GC thread isn't using the core constantly*" -- under which circumstances do you expect a GC thread not to use a CPU core? The GC is entirely about processing the heap memory, no I/O involved, no user interaction, no external event of any kind. Nothing a GC thread could wait for (except for the start of the GC). – Holger Jan 30 '20 at 22:46
  • @Eugene that's not the case at all! Did you even read the question? I'm separately investigating the GC. This question is just about understanding this parameter! – opticyclic Feb 03 '20 at 17:20
  • @Holger I'm not sure what the specific circumstance is. I'd expect to see 100% CPU on at least one core during the 20s of GC but I am not. Hence the question. – opticyclic Feb 03 '20 at 17:22
  • Maybe it would have been a good move to post the issue you actually had in the first place, i.e. “Why does the GC with these settings not use 100% CPU?” instead of “What Is The Impact Of Increasing…”. As long as you have fewer GC threads than cores, it's not unusual, as depending on the operating system, threads are not bound to a particular core. When you have as many threads as cores but still not 100%, it's worth investigating the cause(s). Depending on the reason, raising the thread count may be totally pointless; e.g. when the threads are synchronizing, more threads would just raise the contention. – Holger Feb 03 '20 at 19:18
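(For reference, one way to check per-thread CPU usage during the next pause; the PID is a placeholder. On JDK 8 the OS-level thread names may all appear as plain "java", so the busy thread IDs from top need to be converted to hex and matched against the nid=0x... values in jstack output to identify the GC worker threads:)

top -H -p <jvm-pid>
jstack <jvm-pid> | grep -i 'gc'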
  • What those threads "do" might be entirely irrelevant here, but the bigger problem I have is: _I am only interested in the answer to the specific question asked and not alternative suggestions for optimising the GC_, but then _I'm separately investigating the GC._ If you present your question in a clear way, you might get clear hints or even answers! Anyway... not seeing 100% is probably normal, unless you can pin the GC threads to a core, and to be honest I have no idea if and how you can do that. – Eugene Feb 03 '20 at 19:20
  • The best way to see where your time is spent and **why** is to present the Full GC log in verbose mode; otherwise you are guessing, and with `GC` that is never a good start. Thus: can you show the logs? – Eugene Feb 03 '20 at 19:30
  • The reason for the long GC is fragmentation of the heap from too many humongous objects being created: `[Full GC (Allocation Failure) 16G->11G(32G), 23.7016860 secs]` and `allocation request: 184549392 bytes, threshold: 14495514615 bytes (45.00 %), source: concurrent humongous allocation]`. Fixing these large objects is going to rely on a third party. This is not an "investigate my GC" question. It is a "how does this parameter work" question. – opticyclic Feb 03 '20 at 20:12
  • I've come back to re-read this, and wow: a `184 MB` object... It seems some kind of DB object is being allocated (I can't think of any other case where such a big object would be required). To allocate such a big object, a _contiguous_ run of regions is required, so indeed this looks like a fragmentation issue. Obviously, to try to accommodate this request, a stop-the-world will happen. This is indeed where `ParallelGCThreads` may matter; there is no golden rule though: it will be a trial-and-error process. _Usually_, if there are other processes running on the box, that `5/8` is a good default. – Eugene Mar 08 '20 at 04:27
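(For context, a rough sketch of why that allocation request is humongous under default G1 sizing. The 2048-region target and the half-region humongous threshold are documented G1 behaviour; the class and variable names here are illustrative:)

class HumongousCheck {
    public static void main(String[] args) {
        long heapBytes = 32L << 30;                               // 32 GB heap
        long regionBytes = Long.highestOneBit(heapBytes / 2048);  // 16 MB: heap/2048, rounded to a power of two, clamped to 1-32 MB
        long request = 184_549_392L;                              // the failing allocation from the log
        System.out.println("humongous: " + (request >= regionBytes / 2));  // true: far above the 8 MB (half-region) threshold
        System.out.println("contiguous regions needed: "
                + ((request + regionBytes - 1) / regionBytes));   // 12 contiguous 16 MB regions
    }
}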
  • This parameter controls how many threads will compact your heap, how many can move objects around, and how many can scan the heap. If you enable more logging, you might see how much time it takes to move objects during this Full GC and what in general is eating your time; `20 s` looks like a bit too much for a `32 GB` heap. It might not even be fragmentation here; it may be time-to-safepoint or Cleanup phases eating all of this time... – Eugene Mar 08 '20 at 04:33
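(For reference, the JDK 8-era flags that would produce such a verbose log. These are standard HotSpot options, listed as a suggestion rather than the commenter's exact recipe; `-XX:+PrintAdaptiveSizePolicy` is what emits the "concurrent humongous allocation" lines quoted above:)

java -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
     -XX:+PrintGCApplicationStoppedTime -XX:+PrintAdaptiveSizePolicy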
