
I have a Java web application, developed with Java 8, that is deployed on two clustered Tomcat 8.5.33 servers running Oracle Linux Server 7.5. The problem is as follows.

The WAR has been deployed continuously for the last couple of weeks without any problems, but suddenly the application started running very slowly.

While investigating, I came up with several possible causes and solutions, but none of them turned out to be the cause of my problem.

At first I thought it could be a memory leak or something similar, but that wasn't the case. Rebooting the system just in case and giving Tomcat more memory didn't help either. I also read that an overly large catalina.out file could cause this, but that wasn't the case either.
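One way to back up the "not a memory leak" conclusion with numbers is to record heap usage and accumulated GC time while the slowdown is happening. The sketch below uses only the standard java.lang.management API. It is an illustration, not something from the original setup: the class name HeapCheck is hypothetical, and it has to run inside the Tomcat JVM (for example, called from the webapp and logged periodically) to describe Tomcat's heap. Run standalone, it only reports on its own JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical diagnostic helper: reports heap usage and total GC time of the current JVM.
public class HeapCheck {
    private static final long MB = 1024L * 1024L;

    // Fraction of the usable heap that is currently in use.
    public static double heapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() returns -1 when no maximum is defined; fall back to the committed size.
        long limit = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
        return (double) heap.getUsed() / limit;
    }

    // Accumulated time, in milliseconds, that all collectors have spent in GC since JVM start.
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if this collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("heap used: %d MB, committed: %d MB, max: %d MB (%.0f%% of usable heap)%n",
                heap.getUsed() / MB, heap.getCommitted() / MB, heap.getMax() / MB,
                heapUsedFraction() * 100);
        System.out.println("time spent in GC since start: " + totalGcMillis() + " ms");
    }
}
```

If heap usage stays well below the maximum and GC time barely grows during a slow request, memory pressure is an unlikely explanation for the 10-second gaps.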

Looking at the logs produced by Tomcat, nothing seems to be going wrong apart from:

05-Dec-2018 13:51:28.412 SEVERE [main] org.apache.catalina.ha.deploy.FarmWarDeployer.start FarmWarDeployer can only work as host cluster subelement!

This looks like a cluster configuration error, but from what I have found it shouldn't be the cause of my problem. Apart from that, Tomcat continuously logs:

05-Dec-2018 15:09:16.832 FINE [ContainerBackgroundProcessor[StandardEngine[Catalina]]] org.apache.catalina.session.ManagerBase.processExpires Start expire sessions StandardManager at 1544018956832 sessioncount 1

05-Dec-2018 15:09:16.833 FINE [ContainerBackgroundProcessor[StandardEngine[Catalina]]] org.apache.catalina.session.ManagerBase.processExpires End expire sessions StandardManager processingTime 1 expired sessions: 0

The strange thing is that my webapp's own logs show long gaps between entries. Every time a servlet is called, everything runs fine up to some log statement (which one varies from call to call), and from that point on entries are written only every 10 seconds.

Log4j2 (version 2.11) is used for logging. Here is an example of the logs:

05/12/2018 11:53:18 INFO

05/12/2018 11:53:18 INFO

05/12/2018 11:53:18 INFO

05/12/2018 11:53:18 INFO

05/12/2018 11:53:38 INFO

05/12/2018 11:53:48 INFO

05/12/2018 11:53:58 INFO

05/12/2018 11:54:08 INFO

05/12/2018 11:54:18 INFO

05/12/2018 11:54:28 INFO

05/12/2018 11:54:38 INFO

05/12/2018 11:54:48 INFO

05/12/2018 11:54:58 INFO

05/12/2018 11:55:08 INFO
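To measure these stalls across a whole log file rather than by eye, a small standalone scanner can list every gap between consecutive entries that exceeds a threshold. This is a minimal sketch. It assumes each entry starts with the dd/MM/yyyy HH:mm:ss timestamp shown above. The class name LogGapDetector and the 5-second threshold are illustrative choices, not part of the original setup.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports gaps between consecutive timestamped log lines.
public class LogGapDetector {
    // Assumed to match the "05/12/2018 11:53:18" prefix of each entry.
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm:ss");
    private static final int TIMESTAMP_LENGTH = 19; // length of "dd/MM/yyyy HH:mm:ss"

    // Returns one description per gap longer than the threshold, in file order.
    public static List<String> findGaps(List<String> lines, Duration threshold) {
        List<String> gaps = new ArrayList<>();
        LocalDateTime previous = null;
        for (String line : lines) {
            if (line.length() < TIMESTAMP_LENGTH) {
                continue; // blank or too short to hold a timestamp
            }
            LocalDateTime current;
            try {
                current = LocalDateTime.parse(line.substring(0, TIMESTAMP_LENGTH), FORMAT);
            } catch (DateTimeParseException e) {
                continue; // not a timestamped line, e.g. a stack-trace continuation
            }
            if (previous != null) {
                Duration gap = Duration.between(previous, current);
                if (gap.compareTo(threshold) > 0) {
                    gaps.add(previous.format(FORMAT) + " -> " + current.format(FORMAT)
                            + " (" + gap.getSeconds() + " s)");
                }
            }
            previous = current;
        }
        return gaps;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LogGapDetector path/to/webapp.log
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (String gap : findGaps(lines, Duration.ofSeconds(5))) {
            System.out.println(gap);
        }
    }
}
```

Applied to the excerpt above, it would flag the initial 20-second jump after 11:53:18 and then every subsequent 10-second step. That makes it easy to check whether the pause always has the same length, which is often a hint that something is waiting on a fixed timeout.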

  • Hi! This could have many different causes. Have you talked to your system administrator to get system-level information? I am thinking of hardware parameters and I/O statistics. Is your machine swapping now? Does it still have the same hardware assignment? Maybe a system administrator removed RAM or CPUs from your VMs? – Jorge_B Dec 05 '18 at 15:10
  • Which filesystem(s) are the data and the logs stored on? – Emmanuel Rosa Dec 05 '18 at 15:11
  • @Jorge_B Yes, it seems that Java is consuming a lot of CPU when deploying the generated WARs, and apart from that it behaves strangely when using system resources. I still haven't found the cause of the problem, but it seems to be the VM's fault, either changed hardware or changed configuration. Recommendations are welcome, and thank you for your response! – TheTheodorus Dec 05 '18 at 17:10
  • @EmmanuelRosa I don't know what kind of data you are referring to. If you mean the database, it's on a separate VM on the same physical machine. The logs are written by Log4j2 to a folder inside the Tomcat logs folder. Thank you for your response! – TheTheodorus Dec 05 '18 at 17:13
  • It would be very important to compare a system-wide snapshot of your system now with as much information as you can retrieve about its state before the incident. We should also be able to define precisely what you mean by 'behaving strangely when using system resources'. If we can rule out system configuration changes as a cause, you should then review every code change recently pushed to the SCM to rule out any bad programming practice as well. Please feel free to update the question with as much information as possible. – Jorge_B Dec 06 '18 at 08:20
  • I asked about the filesystem because if your database is on a COW-based filesystem, such as BTRFS, there's a good chance the database files have become heavily fragmented. The same goes for your systemd log files. – Emmanuel Rosa Dec 06 '18 at 09:56
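Since the comments point to Java consuming a lot of CPU, a per-thread breakdown would show whether that CPU goes to request-handling threads or to something else in the JVM. This is a minimal sketch using the standard ThreadMXBean. The class name ThreadCpuReport is hypothetical, and like any in-process check it only describes the JVM it runs in, so it would have to be invoked from inside Tomcat to describe Tomcat's threads.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical diagnostic helper: lists the threads of the current JVM by accumulated CPU time.
public class ThreadCpuReport {
    // Returns up to `limit` lines of the form "name STATE cpu ms", busiest thread first.
    public static List<String> topThreads(int limit) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<long[]> samples = new ArrayList<>(); // each entry: {threadId, cpuNanos}
        if (threads.isThreadCpuTimeSupported()) {
            for (long id : threads.getAllThreadIds()) {
                long cpu = threads.getThreadCpuTime(id); // -1 if the thread died or measurement is disabled
                if (cpu >= 0) {
                    samples.add(new long[] {id, cpu});
                }
            }
        }
        samples.sort(Comparator.comparingLong((long[] s) -> s[1]).reversed());

        List<String> report = new ArrayList<>();
        for (long[] sample : samples) {
            if (report.size() >= limit) {
                break;
            }
            ThreadInfo info = threads.getThreadInfo(sample[0]);
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            report.add(info.getThreadName() + " " + info.getThreadState() + " "
                    + (sample[1] / 1_000_000) + " ms");
        }
        return report;
    }

    public static void main(String[] args) {
        topThreads(10).forEach(System.out::println);
    }
}
```

Taking two reports a few seconds apart during a slow request and comparing them shows which threads are actually accumulating CPU time at that moment, as opposed to threads that were busy earlier.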

1 Answer


We had the same problem on our system. In our case it was caused by the Linux user used to launch Tomcat: that user didn't have all the privileges the Tomcat apps needed, so I simply solved it by launching Tomcat with sudo:

sudo $CATALINA_HOME/bin/startup.sh

I don't know why this solved the problem or what was causing all the slowness, because we were short on time and just wanted to fix it as quickly as we could. I hope this works for you.
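The answer doesn't say which resources the Tomcat user was missing access to. Before falling back to sudo, one option is to check, from a JVM running as the same user, which user that is and whether it can read and write the directories the application depends on. This is a minimal sketch. The class name PrivilegeCheck is hypothetical, and which paths to pass in (for example Tomcat's logs, temp, work and webapps directories, or the Log4j2 folder inside logs) is an assumption about where a permission problem could be.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: shows which user the JVM runs as and what it can access.
public class PrivilegeCheck {
    // Describes whether the current JVM user can read and write the given path.
    public static String describe(Path path) {
        return path + " exists=" + Files.exists(path)
                + " readable=" + Files.isReadable(path)
                + " writable=" + Files.isWritable(path);
    }

    public static void main(String[] args) {
        // Usage: java PrivilegeCheck /path/to/tomcat/logs /path/to/tomcat/temp ...
        System.out.println("JVM running as: " + System.getProperty("user.name"));
        for (String arg : args) {
            System.out.println(describe(Paths.get(arg)));
        }
    }
}
```

Running it as the same Linux user that normally starts Tomcat shows which of the listed paths that user cannot read or write.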