
Over the last few weeks I have been testing different JVM settings for my Glassfish server. The main heap settings (among others) are: -Xms512m, -Xmx512m, -XX:NewRatio=2. I have tried different GC settings, but I still get long pauses a few days after starting the server. I noticed the following:
1. -XX:+UseParallelGC -XX:+UseParallelOldGC - minor GCs were occurring about every minute, major GCs every 18 hours. I had no problems with the minor GCs, but the major GCs degraded after 5 days: at first the major GC pauses lasted about 100-200 ms, but in the end the last pause lasted 70 seconds.
2. -XX:+UseConcMarkSweepGC -XX:+UseParNewGC - almost the same as above. Minor GCs were fine, but major GC (not full GC) pauses grew very long. I noticed heavy class unloading during the CMS Final Remark phase, which is a stop-the-world phase.
3. -XX:+UseConcMarkSweepGC -XX:+UseParNewGC with -XX:MaxGCPauseMillis=5000 - I tested this for only a day, because the second major GC (not full GC) already lasted about 20 seconds, so I assumed something else was wrong.
4. -XX:+UseG1GC, -XX:MaxGCPauseMillis=5000, -XX:+UseStringDeduplication, without the -XX:NewRatio=2 option - a major GC (not full GC) occurs every 12 hours, and I have already noticed some problems:

2015-05-31T18:25:25.145+0200: 83383.897: [GC concurrent-mark-start]
2015-05-31T18:25:35.563+0200: 83394.312: [GC concurrent-mark-end, 10.4145795 secs]
2015-05-31T18:25:35.563+0200: 83394.312: [GC remark 83394.312: [Finalize Marking, 0.0002939 secs] 83394.312: [GC ref-proc, 1.2128584 secs] 83395.525: [Unloading, 14.5180500 secs], 15.7320854 secs]
 [Times: user=0.14 sys=0.22, real=15.73 secs] 

The GC remark phase took about 15 seconds, which is not acceptable for me. You can see that unloading took most of that time. The same thing happened before with the other collectors, so I think the problem is class unloading.

Summary: every collector runs fine for some time, but after a few days long pause times start to appear. I don't know why it works fine for the first few days and then suddenly gets very bad. I noticed that the long pauses are caused by class unloading, so I am wondering whether there are settings that would give better results. I would also like to know which GC you would recommend. I have an internal web application running on a Glassfish server on a PC with 8 GB of RAM, an i7 processor and Windows 8. There will be at most 10 client connections at the same time, but the server must have long uptime and there must not be long pause times (max 5 seconds). Please advise me on what else I can do to get shorter pause times.
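
To measure the pauses more precisely than by scraping the GC log, I am thinking of registering a GC notification listener inside the application. This is only a rough sketch (it assumes a HotSpot JVM where the com.sun.management notification API is available, i.e. JDK 7u4 or later), not something I run in production yet:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;
    import javax.management.NotificationEmitter;
    import javax.management.NotificationListener;
    import javax.management.openmbean.CompositeData;
    import com.sun.management.GarbageCollectionNotificationInfo;

    public class GcPauseLogger {

        // Registers a listener on every collector and prints pauses longer than the threshold.
        public static void install(final long thresholdMs) {
            for (GarbageCollectorMXBean gcBean : ManagementFactory.getGarbageCollectorMXBeans()) {
                // On HotSpot the GC beans also implement NotificationEmitter (assumption).
                NotificationEmitter emitter = (NotificationEmitter) gcBean;
                NotificationListener listener = (notification, handback) -> {
                    if (GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                            .equals(notification.getType())) {
                        GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                                .from((CompositeData) notification.getUserData());
                        long durationMs = info.getGcInfo().getDuration();
                        if (durationMs >= thresholdMs) {
                            System.out.printf("%s / %s took %d ms%n",
                                    info.getGcName(), info.getGcAction(), durationMs);
                        }
                    }
                };
                emitter.addNotificationListener(listener, null, null);
            }
        }
    }

That way I could at least see, per collector, which pauses exceed my 5-second limit, independently of the log format.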

And one more question: what would be the disadvantages of using G1GC instead of CMS or ParallelGC in my case? Is the heap too small for G1GC?

EDIT: G1GC log right before and after the long GC remark pause

2015-05-31T18:25:25.004+0200: 83383.755: [GC pause (G1 Evacuation Pause) (young) (initial-mark), 0.1280453 secs]
   [Parallel Time: 116.2 ms, GC Workers: 4]
      [GC Worker Start (ms): Min: 83383757.6, Avg: 83383757.7, Max: 83383757.7, Diff: 0.0]
      [Ext Root Scanning (ms): Min: 97.8, Avg: 98.3, Max: 98.5, Diff: 0.7, Sum: 393.1]
      [Update RS (ms): Min: 0.2, Avg: 4.0, Max: 14.8, Diff: 14.6, Sum: 16.1]
         [Processed Buffers: Min: 1, Avg: 6.0, Max: 16, Diff: 15, Sum: 24]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Object Copy (ms): Min: 0.2, Avg: 2.5, Max: 3.7, Diff: 3.5, Sum: 10.2]
      [Termination (ms): Min: 0.0, Avg: 8.5, Max: 11.4, Diff: 11.4, Sum: 34.2]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [GC Worker Total (ms): Min: 113.4, Avg: 113.4, Max: 113.5, Diff: 0.0, Sum: 453.8]
      [GC Worker End (ms): Min: 83383871.1, Avg: 83383871.1, Max: 83383871.1, Diff: 0.0]
   [Code Root Fixup: 0.1 ms]
   [Code Root Purge: 0.0 ms]
   [String Dedup Fixup: 2.2 ms, GC Workers: 4]
      [Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Table Fixup (ms): Min: 2.0, Avg: 2.1, Max: 2.1, Diff: 0.1, Sum: 8.3]
   [Clear CT: 0.1 ms]
   [Other: 9.5 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 8.8 ms]
      [Ref Enq: 0.1 ms]
      [Redirty Cards: 0.3 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.1 ms]
   [Eden: 215.0M(215.0M)->0.0B(215.0M) Survivors: 7168.0K->7168.0K Heap: 451.5M(512.0M)->236.6M(512.0M)]
 [Times: user=0.08 sys=0.02, real=0.13 secs] 
2015-05-31T18:25:25.129+0200: 83383.883: [GC concurrent-root-region-scan-start]
2015-05-31T18:25:25.129+0200: 83383.883: [GC concurrent-string-deduplication, 160.0B->0.0B(160.0B), avg 48.3%, 0.0000070 secs]
   [Last Exec: 0.0000070 secs, Idle: 23.1834927 secs, Blocked: 0/0.0000000 secs]
      [Inspected:               3]
         [Skipped:              0(  0.0%)]
         [Hashed:               3(100.0%)]
         [Known:                0(  0.0%)]
         [New:                  3(100.0%)    160.0B]
      [Deduplicated:            3(100.0%)    160.0B(100.0%)]
         [Young:                3(100.0%)    160.0B(100.0%)]
         [Old:                  0(  0.0%)      0.0B(  0.0%)]
   [Total Exec: 2868/0.1946124 secs, Idle: 2868/83382.9701762 secs, Blocked: 13/0.0032760 secs]
      [Inspected:          304493]
         [Skipped:              0(  0.0%)]
         [Hashed:          163708( 53.8%)]
         [Known:            44808( 14.7%)]
         [New:             259685( 85.3%)     21.9M]
      [Deduplicated:       160467( 61.8%)     10.6M( 48.3%)]
         [Young:            83546( 52.1%)   6270.6K( 57.8%)]
         [Old:              76921( 47.9%)   4571.3K( 42.2%)]
   [Table]
      [Memory Usage: 4291.8K]
      [Size: 131072, Min: 1024, Max: 16777216]
      [Entries: 133319, Load: 101.7%, Cached: 6107, Added: 142389, Removed: 9070]
      [Resize Count: 7, Shrink Threshold: 87381(66.7%), Grow Threshold: 262144(200.0%)]
      [Rehash Count: 0, Rehash Threshold: 120, Hash Seed: 0x0]
      [Age Threshold: 3]
   [Queue]
      [Dropped: 0]
2015-05-31T18:25:25.145+0200: 83383.897: [GC concurrent-root-region-scan-end, 0.0140467 secs]
2015-05-31T18:25:25.145+0200: 83383.897: [GC concurrent-mark-start]
2015-05-31T18:25:35.563+0200: 83394.312: [GC concurrent-mark-end, 10.4145795 secs]
2015-05-31T18:25:35.563+0200: 83394.312: [GC remark 83394.312: [Finalize Marking, 0.0002939 secs] 83394.312: [GC ref-proc, 1.2128584 secs] 83395.525: [Unloading, 14.5180500 secs], 15.7320854 secs]
 [Times: user=0.14 sys=0.22, real=15.73 secs] 
2015-05-31T18:25:51.288+0200: 83410.045: [GC cleanup 334M->326M(512M), 0.2836092 secs]
 [Times: user=0.00 sys=0.00, real=0.28 secs] 
2015-05-31T18:25:51.570+0200: 83410.328: [GC concurrent-cleanup-start]
2015-05-31T18:25:51.570+0200: 83410.328: [GC concurrent-cleanup-end, 0.0000669 secs]
2015-05-31T18:26:03.732+0200: 83422.482: [GC pause (G1 Evacuation Pause) (young), 0.1031257 secs]
   [Parallel Time: 91.6 ms, GC Workers: 4]
      [GC Worker Start (ms): Min: 83422481.7, Avg: 83422481.7, Max: 83422481.8, Diff: 0.0]
      [Ext Root Scanning (ms): Min: 1.3, Avg: 1.7, Max: 2.7, Diff: 1.4, Sum: 6.9]
      [Update RS (ms): Min: 0.0, Avg: 22.7, Max: 89.8, Diff: 89.8, Sum: 90.8]
         [Processed Buffers: Min: 0, Avg: 7.3, Max: 15, Diff: 15, Sum: 29]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.2]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Object Copy (ms): Min: 0.5, Avg: 2.4, Max: 3.4, Diff: 2.9, Sum: 9.5]
      [Termination (ms): Min: 0.0, Avg: 64.7, Max: 86.3, Diff: 86.3, Sum: 258.9]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 91.6, Avg: 91.6, Max: 91.6, Diff: 0.0, Sum: 366.3]
      [GC Worker End (ms): Min: 83422573.3, Avg: 83422573.3, Max: 83422573.3, Diff: 0.0]
   [Code Root Fixup: 0.1 ms]
   [Code Root Purge: 0.0 ms]
   [String Dedup Fixup: 2.1 ms, GC Workers: 4]
      [Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Table Fixup (ms): Min: 1.9, Avg: 1.9, Max: 1.9, Diff: 0.1, Sum: 7.7]
   [Clear CT: 0.1 ms]
   [Other: 9.3 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 8.8 ms]
      [Ref Enq: 0.1 ms]
      [Redirty Cards: 0.0 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.1 ms]
   [Eden: 215.0M(215.0M)->0.0B(19.0M) Survivors: 7168.0K->6144.0K Heap: 443.6M(512.0M)->228.2M(512.0M)]
 [Times: user=0.30 sys=0.01, real=0.10 secs] 
2015-05-31T18:26:03.848+0200: 83422.597: [GC concurrent-string-deduplication, 160.0B->0.0B(160.0B), avg 48.3%, 0.0123951 secs]
   [Last Exec: 0.0123951 secs, Idle: 38.7017788 secs, Blocked: 0/0.0000000 secs]
      [Inspected:               3]
         [Skipped:              0(  0.0%)]
         [Hashed:               3(100.0%)]
         [Known:                0(  0.0%)]
         [New:                  3(100.0%)    160.0B]
      [Deduplicated:            3(100.0%)    160.0B(100.0%)]
         [Young:                3(100.0%)    160.0B(100.0%)]
         [Old:                  0(  0.0%)      0.0B(  0.0%)]
   [Total Exec: 2869/0.2070075 secs, Idle: 2869/83421.6719550 secs, Blocked: 13/0.0032760 secs]
      [Inspected:          304496]
         [Skipped:              0(  0.0%)]
         [Hashed:          163711( 53.8%)]
         [Known:            44808( 14.7%)]
         [New:             259688( 85.3%)     21.9M]
      [Deduplicated:       160470( 61.8%)     10.6M( 48.3%)]
         [Young:            83549( 52.1%)   6270.8K( 57.8%)]
         [Old:              76921( 47.9%)   4571.3K( 42.2%)]
   [Table]
      [Memory Usage: 2565.5K]
      [Size: 65536, Min: 1024, Max: 16777216]
      [Entries: 81061, Load: 123.7%, Cached: 6553, Added: 142396, Removed: 61335]
      [Resize Count: 8, Shrink Threshold: 43690(66.7%), Grow Threshold: 131072(200.0%)]
      [Rehash Count: 0, Rehash Threshold: 120, Hash Seed: 0x0]
      [Age Threshold: 3]
   [Queue]
      [Dropped: 0]
2015-05-31T18:26:05.769+0200: 83424.518: [GC pause (G1 Evacuation Pause) (mixed), 0.2232916 secs]
   [Parallel Time: 216.7 ms, GC Workers: 4]
      [GC Worker Start (ms): Min: 83424518.3, Avg: 83424518.3, Max: 83424518.3, Diff: 0.0]
      [Ext Root Scanning (ms): Min: 1.2, Avg: 1.6, Max: 2.6, Diff: 1.4, Sum: 6.5]
      [Update RS (ms): Min: 0.0, Avg: 0.3, Max: 0.4, Diff: 0.4, Sum: 1.2]
         [Processed Buffers: Min: 0, Avg: 4.3, Max: 7, Diff: 7, Sum: 17]
      [Scan RS (ms): Min: 56.1, Avg: 102.3, Max: 144.4, Diff: 88.3, Sum: 409.2]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.3]
      [Object Copy (ms): Min: 50.4, Avg: 97.6, Max: 157.7, Diff: 107.2, Sum: 390.2]
      [Termination (ms): Min: 0.0, Avg: 14.8, Max: 19.8, Diff: 19.8, Sum: 59.1]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 216.6, Avg: 216.6, Max: 216.6, Diff: 0.0, Sum: 866.5]
      [GC Worker End (ms): Min: 83424734.9, Avg: 83424734.9, Max: 83424734.9, Diff: 0.0]
   [Code Root Fixup: 0.1 ms]
   [Code Root Purge: 0.0 ms]
   [String Dedup Fixup: 1.5 ms, GC Workers: 4]
      [Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Table Fixup (ms): Min: 1.4, Avg: 1.4, Max: 1.4, Diff: 0.0, Sum: 5.6]
   [Clear CT: 0.2 ms]
   [Other: 4.8 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.9 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.2 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.2 ms]
   [Eden: 19.0M(19.0M)->0.0B(21.0M) Survivors: 6144.0K->4096.0K Heap: 247.2M(512.0M)->175.2M(512.0M)]
 [Times: user=0.09 sys=0.00, real=0.22 secs] 
2015-05-31T18:26:05.992+0200: 83424.742: [GC concurrent-string-deduplication, 640.0B->152.0B(488.0B), avg 48.3%, 0.0000246 secs]
   [Last Exec: 0.0000246 secs, Idle: 2.1442834 secs, Blocked: 0/0.0000000 secs]
      [Inspected:               6]
         [Skipped:              0(  0.0%)]
         [Hashed:               5( 83.3%)]
         [Known:                0(  0.0%)]
         [New:                  6(100.0%)    640.0B]
      [Deduplicated:            5( 83.3%)    488.0B( 76.3%)]
         [Young:                5(100.0%)    488.0B(100.0%)]
         [Old:                  0(  0.0%)      0.0B(  0.0%)]
   [Total Exec: 2870/0.2070321 secs, Idle: 2870/83423.8162384 secs, Blocked: 13/0.0032760 secs]
      [Inspected:          304502]
         [Skipped:              0(  0.0%)]
         [Hashed:          163716( 53.8%)]
         [Known:            44808( 14.7%)]
         [New:             259694( 85.3%)     21.9M]
      [Deduplicated:       160475( 61.8%)     10.6M( 48.3%)]
         [Young:            83554( 52.1%)   6271.2K( 57.8%)]
         [Old:              76921( 47.9%)   4571.3K( 42.2%)]
   [Table]
      [Memory Usage: 2564.6K]
      [Size: 65536, Min: 1024, Max: 16777216]
      [Entries: 81026, Load: 123.6%, Cached: 6553, Added: 142397, Removed: 61371]
      [Resize Count: 8, Shrink Threshold: 43690(66.7%), Grow Threshold: 131072(200.0%)]
      [Rehash Count: 0, Rehash Threshold: 120, Hash Seed: 0x0]
      [Age Threshold: 3]
   [Queue]
      [Dropped: 0]
2015-05-31T18:26:08.157+0200: 83426.906: [GC pause (G1 Evacuation Pause) (mixed), 0.6216666 secs]
   [Parallel Time: 618.5 ms, GC Workers: 4]
      [GC Worker Start (ms): Min: 83426906.5, Avg: 83426906.5, Max: 83426906.5, Diff: 0.0]
      [Ext Root Scanning (ms): Min: 0.3, Avg: 8.0, Max: 15.7, Diff: 15.3, Sum: 31.9]
      [Update RS (ms): Min: 0.0, Avg: 4.5, Max: 8.5, Diff: 8.5, Sum: 17.9]
         [Processed Buffers: Min: 0, Avg: 7.0, Max: 18, Diff: 18, Sum: 28]
      [Scan RS (ms): Min: 13.4, Avg: 28.4, Max: 65.2, Diff: 51.8, Sum: 113.7]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.2]
      [Object Copy (ms): Min: 532.6, Avg: 577.3, Max: 604.5, Diff: 71.9, Sum: 2309.1]
      [Termination (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum: 0.7]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 618.4, Avg: 618.4, Max: 618.4, Diff: 0.0, Sum: 2473.6]
      [GC Worker End (ms): Min: 83427524.9, Avg: 83427524.9, Max: 83427524.9, Diff: 0.0]
   [Code Root Fixup: 0.1 ms]
   [Code Root Purge: 0.0 ms]
   [String Dedup Fixup: 1.3 ms, GC Workers: 4]
      [Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Table Fixup (ms): Min: 1.2, Avg: 1.2, Max: 1.3, Diff: 0.1, Sum: 4.9]
   [Clear CT: 0.1 ms]
   [Other: 1.6 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 1.0 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.0 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.2 ms]
   [Eden: 21.0M(21.0M)->0.0B(21.0M) Survivors: 4096.0K->4096.0K Heap: 196.2M(512.0M)->129.4M(512.0M)]
 [Times: user=0.08 sys=0.02, real=0.62 secs] 

EDIT: Results after a few hours:

(Two Windows Performance Monitor screenshots omitted; they showed pages/sec, pages input/sec and page faults.)

There are a lot of pages/sec and pages input/sec, and also page faults. Is this normal? Where can I monitor pages/sec and pages input/sec for just the JVM process (I only found page faults)?
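
As far as I understand (this is an assumption on my part), pages/sec and pages input/sec are system-wide counters and are not exposed through the JVM itself; the closest thing I can read from inside the process is the HotSpot-specific com.sun.management.OperatingSystemMXBean. A small sketch of what I mean:

    import java.lang.management.ManagementFactory;
    // HotSpot-specific subinterface of the standard OperatingSystemMXBean
    import com.sun.management.OperatingSystemMXBean;

    public class MemoryPressureLogger {

        public static void main(String[] args) throws InterruptedException {
            OperatingSystemMXBean os =
                    (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
            while (true) {
                // Free physical memory and swap dropping towards zero would point at paging pressure.
                System.out.printf("free physical: %d MiB, free swap: %d MiB, committed virtual: %d MiB%n",
                        os.getFreePhysicalMemorySize() / (1024 * 1024),
                        os.getFreeSwapSpaceSize() / (1024 * 1024),
                        os.getCommittedVirtualMemorySize() / (1024 * 1024));
                Thread.sleep(5000); // sample every 5 seconds
            }
        }
    }

For per-process paging counters I guess I still need the Process object in Performance Monitor.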

user4341206
  • You are setting the max heap size (-Xmx) equal to the initial size (-Xms) (512 MB). Why not increase it? An i7 has enough cores, so CMS or G1GC should work fine (and be selected by default). – TheLostMind Jun 01 '15 at 06:18
  • I don't want to use much memory because there are other applications (not Glassfish) running on this PC. I think 512 MB of heap is enough for this application. If I increased the heap to 1024 MB, major and minor GCs would occur less often, but pause times would stay the same or even get higher - or am I not thinking about this right? – user4341206 Jun 01 '15 at 06:23
  • With an increased heap, pause times will be higher but the frequency of GC will be lower. I think this is a safe bet. You should try increasing the heap and check the results. – TheLostMind Jun 01 '15 at 06:27
  • Yes, but my problem is pause times, not frequency - I don't care if there are more GCs as long as the pauses are shorter (< 5 seconds). – user4341206 Jun 01 '15 at 06:30
  • You should probably look at [this](http://blog.mgm-tp.com/2014/04/controlling-gc-pauses-with-g1-collector/) article and tweak your arguments :) – TheLostMind Jun 01 '15 at 06:39
  • Depending on your program you might also try reducing the heap size. A smaller heap means more frequent major GCs, which might result in less paging and lower pause times. – piet.t Jun 01 '15 at 06:57
  • By what amount is the memory freed by the `full GC`? – SubOptimal Jun 01 '15 at 07:27
  • I added an example log from right before and after the long remark pause. It was actually not a full GC, but you can see there was only a small cleanup in this phase (GC cleanup 334M->326M(512M)); a larger amount of memory was released in the following GC pause (G1 Evacuation Pause) (mixed). – user4341206 Jun 01 '15 at 08:02
  • If you're running Java 8, have a look at [Long Class-Unloading Pauses with JDK8](https://blogs.oracle.com/poonam/entry/long_class_unloading_pauses_with) – SubOptimal Jun 01 '15 at 08:41
  • your stats seem to be of limited usefulness, as they are summary statistics and don't show peaks at the time of a GC relative to a background, which would make assessing it much easier. I can't tell whether GCs were CPU-starved for example. The maximum page file reads seem high, but the question is whether those coincide with a GC – the8472 Jun 02 '15 at 09:41
  • Did you eventually figure out and fix the long class-unloading issue? I would like to know the actual cause and the solution. Thanks. – Kin Cheung Mar 06 '17 at 10:14

2 Answers


I'd guess you're barking up the wrong tree - I suspect Garbage Collection is not your problem...

You're only running a 512 MiB heap - to me, a long pause with a heap that size would be 1 or 2 seconds. Major pauses in the milliseconds are possible with huge (32 GiB) heaps.

I expect that the issue is actually with your server: either the other applications you've mentioned are using enough memory to push your Java process (which will be around 50% bigger than the heap) into swap / virtual memory, or you're running your application in a virtualised environment (possibly with memory overcommitment / memory ballooning issues).

As a really crude indicator, any GC algorithm should be able to churn through 100 MiB of heap per second - for your 512 MiB heap that would be roughly 5 seconds even in the worst case - so if you're seeing worse than that, look elsewhere for the cause of the issue.

In this case, I believe GC is the symptom, not the problem.

Michael
  • Thank you for your answer. Can you point me to how I can check whether the Java process is being pushed into swap / virtual memory, or whether I have memory ballooning issues? How do you explain that everything is OK for the first few days and then the GC pause times suddenly increase significantly (the same applications besides Glassfish are running the whole time)? – user4341206 Jun 01 '15 at 08:31
  • See http://serverfault.com/questions/74822/how-do-i-tell-if-my-windows-server-is-swapping Is your server physical or virtual? – Michael Jun 01 '15 at 08:34
  • It's a physical server. – user4341206 Jun 01 '15 at 08:35
  • Ok, then my advice is to use the Windows monitoring tools to build a picture up of what is going on on that server over time (memory usage, paging, scheduled processes etc.). Of your 8 GiB, 2 of that will probably be stolen by the OS so it only takes a few misbehaving programs to give you problems... – Michael Jun 01 '15 at 08:43
  • Ok, I'll check. Thank you. – user4341206 Jun 01 '15 at 09:11
  • Hello, I added some performance results after a few hours... Can you please check whether something is extremely wrong? (I read that pages/sec and pages input/sec have very high values; normal values are much lower.) – user4341206 Jun 02 '15 at 09:09

[Times: user=0.14 sys=0.22, real=15.73 secs]

This means the collector spends far more wall-clock time than actual CPU time: here user + sys is only about 0.36 s of CPU time against 15.73 s of real time. All the GCs you have tried are multi-threaded and purely CPU-bound, so they should normally burn more CPU time than wall-clock time passes (with 4 GC workers you would expect user time up to roughly 4× real time).

There are two possible causes that would change that:

  • Resource starvation: either part of the heap gets swapped out, or other processes are occupying the CPUs. As @Mikaveli suggests, use OS monitoring tools to get further information.
  • Something prevents the VM from reaching a safepoint for several seconds. You can check that with -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1

Additionally, since it happened during a remark pause, using -XX:-ClassUnloadingWithConcurrentMark might fix it, but I suspect it would just shift the problem to a regular GC.

It may also be useful to trace how much it's actually trying to unload with -XX:+TraceClassUnloading. Application containers like Glassfish might be doing strange things that cause lots of classes to pile up.
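
If you want a rough picture of whether the class count really keeps growing over days, the standard ClassLoadingMXBean is enough; something like this sketch (interval and output format are arbitrary), dumped periodically from a background thread, would do:

    import java.lang.management.ClassLoadingMXBean;
    import java.lang.management.ManagementFactory;

    public class ClassCountLogger {

        public static void main(String[] args) throws InterruptedException {
            ClassLoadingMXBean classLoading = ManagementFactory.getClassLoadingMXBean();
            while (true) {
                // "loaded now" growing without bound while "unloaded total" stays flat
                // would match the classes-piling-up theory.
                System.out.printf("loaded now: %d, loaded total: %d, unloaded total: %d%n",
                        classLoading.getLoadedClassCount(),
                        classLoading.getTotalLoadedClassCount(),
                        classLoading.getUnloadedClassCount());
                Thread.sleep(60_000); // sample once a minute
            }
        }
    }

If the loaded count climbs steadily, the unloading work done at remark time will keep growing with it.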

Edit: For monitoring you'll mostly want to keep an eye on free physical memory (minus caches), CPU load, and pages in/out and page faults. Ideally monitor paging at the process level, since it does not concern the JVM whether some other process is waiting for the disks.

As for CMS vs. G1: that's probably unrelated to your problem.

the8472
  • OK, how about SubOptimal's comment (blogs.oracle.com/poonam/entry/long_class_unloading_pauses_with) - is this also worth trying? I have exactly the same problem - Unloading took 14.5180500 secs – user4341206 Jun 01 '15 at 10:13
  • You're not using large pages though. AFAIK on Windows you first would have to edit some registry policy to allow large page allocations at all. In the linked article the fundamental cause is memory paged out to disk (IO), as I've already explained; using large pages just happened to work in their particular setup because it pins the pages into memory along with the rest of the heap. – the8472 Jun 01 '15 at 10:20
  • OK, I'll monitor and include these settings in the JVM (except -XX:-ClassUnloadingWithConcurrentMark, which I'll add later if nothing shows up in monitoring). Thank you. – user4341206 Jun 01 '15 at 10:29
  • Hello, can you please explain the following: if I use -XX:-ClassUnloadingWithConcurrentMark so that unloading doesn't happen during the remark phase, when will it happen? I read somewhere that it happens during a full GC, but what if a full GC is never triggered? Is it possible that unloading never happens and I get OutOfMemory exceptions or similar problems? I've been looking for the answer everywhere but haven't found it yet. If you know more about ClassUnloadingWithConcurrentMark, please help me. – user4341206 Jun 08 '15 at 10:40
  • It will perform a full GC before it runs out of metaspace for classes. – the8472 Jun 08 '15 at 15:37