
Disclaimer: I am not a Java expert. My knowledge of Java is limited to System.out.println to be honest.

We have been noticing performance issues (the application doesn't respond for a couple of minutes every hour) with one of our systems and decided to monitor the JVM running the application (please refer to the attached screenshot of the VisualVM monitoring tab).

VisualVM screen shot for the JVM

What we have observed is that the JVM just freezes up for ~2 minutes every hour, after which the used heap size drops and the thread count shoots up. The used heap size otherwise stays within the 6G-12G range, but once every hour this pattern is broken.

Is there any possible explanation for this behaviour? It is periodic in nature (like I said, every hour). We checked Task Manager during this time and could not see any other processes kicking in.

Please also find the JVM arguments below.

-Xmx20480M
-XX:PermSize=128M
-XX:MaxPermSize=256M
-XX:+UseConcMarkSweepGC
-XX:+CMSClassUnloadingEnabled
-Xms20480M
-XX:NewSize=10240M
-XX:SurvivorRatio=4
-XX:+UseCompressedOops
-XX:CMSInitiatingOccupancyFraction=60
-XX:+UseStringCache
-XX:+UseFastAccessorMethods
-XX:ErrorFile=c:/hs_err.log
-XX:HeapDumpPath=c:/
-XX:+HeapDumpOnOutOfMemoryError
-Xrs
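
For reference, GC logging along these lines would show whether the hourly stall is a full GC (pre-Java-9 HotSpot flags, consistent with the options above; the log file path is just an example):

-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime
-Xloggc:c:/gc.log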

The application runs on a server with 32G RAM and 8 core processor, if that helps.

Jay
  • Not sufficient info. Could be for so many reasons. Even anti-virus software that is scheduled to run every hour can interrupt other processes. You can debug and check what's going on and what is taking a huge amount of resources every hour. The 6G-12G range is something to start with, perhaps. – Sajal Dutta Sep 24 '15 at 01:43
  • We have checked and there are no scheduled tasks which run on the machine during that time (including anti-virus). There is one more service which runs on the same machine and doesn't get impacted this way. What other information can help? I will try to get that data. – Jay Sep 24 '15 at 01:58
  • 1
    Try again with less memory (lower values for -Xmx, -Xms, etc.) More memory doesn't necessarily mean better performance. I think your issue is the garbage collector, it tries to liberate a lot of memory and that's why you are getting a slow down in your application, and after that you see more available memory. – morgano Sep 24 '15 at 03:29
  • It's likely the JVM garbage collection that causes the issue. How is swap space on the machine during this JVM freeze period? Are there other processes taking up a decent amount of memory? I've had similar experiences; the garbage collector has a lot of overhead. When it had to use swap, it stalled. – HulkSmash Sep 24 '15 at 03:32
  • @morgano, We were running with -Xms at 4G and -Xmx at 12G, considering that the heap size after collection is 6G, but the same issue led one of our consultants to suggest increasing -Xmx. Also, if you look at the heap graph, there are instances when GC clears more objects (a 12G to 6G heap size reduction) without the JVM freezing up. But yes, I agree that this is surely a GC issue. After your comment, I triggered a GC from VisualVM and voila, the JVM froze. Now, what would be the difference between the GC runs that work fine and the ones which don't? – Jay Sep 24 '15 at 04:13
  • @DV88, Yes, we have confirmed that the issue is with GC. Not sure which GC runs cause it, though, as there are ones which run fine even when there are more objects to collect/clear. The system is dedicated to running only two processes: the one which has the issue and a lightweight process which is basically a message listener (hardly takes up more than 1G of RAM). Then there are the usual Windows Server related processes, but nothing of this scale. I will check the swap space and report back in case there is something odd there. Thanks. – Jay Sep 24 '15 at 04:17
  • Try again activating the incremental GC (use -Xincgc); this makes the GC collect not everything in one go, but in small increments. – morgano Sep 24 '15 at 04:28
  • ... and let us know your findings :-) – morgano Sep 24 '15 at 04:29
  • Since this is a prod machine, we cannot experiment with the settings. But I replicated the issue on one of our dev machines and found that the full GC is caused by RMI (the application exposes services over RMI). The full GC issue with RMI is well documented (eg: https://plumbr.eu/blog/garbage-collection/rmi-enforcing-full-gc-to-run-hourly). Will try setting -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses and see how it goes (see the flag sketch after these comments). Will update here if it resolves the performance issue. Thanks for all the comments. Really helped me look in the right direction. – Jay Sep 24 '15 at 08:19
  • If you are using RMI (e.g. running JMX with the out-of-the-box RMI connector) it will force a full GC once an hour. And a full GC on 12G can take a while... It's weird that this doesn't show up on the GC activity graph, though. – JB- Sep 24 '15 at 13:31
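
For reference, a sketch of the RMI-related settings discussed in the comments above (the interval values are illustrative; 3600000 ms, i.e. one hour, is the default and matches the observed period):

-XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses
-Dsun.rmi.dgc.client.gcInterval=3600000
-Dsun.rmi.dgc.server.gcInterval=3600000

The first flag makes the System.gc() triggered by RMI's distributed GC run as a concurrent CMS cycle (with class unloading) rather than a stop-the-world full collection; the two system properties control how often RMI forces that collection and could be raised to spread the pauses out.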
