
We have some Linux servers that hang (get stuck) without actually stopping. How can I deal with them? The cause of the hangs is not clear. Any guidance will be appreciated.

The problems:

  1. The server hangs from time to time. It doesn't stop; it just hangs. Theoretically it's still up, but in practice it has stopped working. One way to spot this is to monitor the logs: nothing is being written to them anymore.

Cause: Unknown
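The log check described above could be automated roughly like this. It is only a sketch, assuming Python 3 is installed on the host and that the service normally writes to its log at least every few minutes; the function names and the threshold are made up for illustration.

```python
import os
import time


def seconds_since_last_write(log_path, now=None):
    """Return how many seconds ago the log file was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(log_path)


def looks_hung(log_path, max_silence_seconds, now=None):
    """Flag the service as possibly hung if its log has been silent too long.

    A quiet log is only a hint: an idle but healthy service can also
    stop logging, so the threshold has to fit the application.
    """
    return seconds_since_last_write(log_path, now) > max_silence_seconds
```

A check like this could run from cron and raise an alert, which would at least record when each hang starts.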

  1. The server goes down from time to time, too frequently on some servers.

Cause: Huge log size

Solution: logrotate

  1. The server goes down from time to time, too frequently on some servers.

Cause: Unknown

Solution: A script that automatically restarts the service at regular intervals. I have little hope that it will work, though.
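An alternative to restarting on a fixed schedule would be to restart only when a health check fails. The following is a minimal sketch, assuming Python 3 is available and the service runs as a systemd unit (CentOS 7 uses systemd). The unit name and the `is_healthy` callable are placeholders, not something we have in place.

```python
import subprocess


def restart_command(unit_name):
    """Build the systemctl command that restarts the given unit."""
    return ["systemctl", "restart", unit_name]


def restart_if_unhealthy(unit_name, is_healthy, run=subprocess.call):
    """Restart the unit only when the health check fails.

    is_healthy is any zero-argument callable returning True or False,
    for example a log freshness check or an HTTP probe (hypothetical).
    Returns True if a restart was attempted, False otherwise.
    """
    if is_healthy():
        return False
    run(restart_command(unit_name))
    return True
```

Called from cron as root, for example `restart_if_unhealthy("tomcat", some_check)`, where `tomcat` stands in for the real unit name. This only hides the symptom; it does not explain why the service hangs.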

  1. The clients want to be able to monitor these services themselves and perform actions such as restarts themselves. What is the best monitoring tool that can also restart the service (i.e. something that runs scripts of my choosing)?

Are Nagios, Zabbix, or Monit used for this purpose? Which tool is best suited for it?

We're using CentOS 7 (yes, it's reaching end of life). The servers are virtual machines, and we only have remote access. The applications are:

  • Java servers

  • GlassFish servers

  • Tomcat servers

achhainsan
  • Provide more information, such as hardware, OS, applications, and whether you have physical access. – Romeo Ninov Aug 28 '23 at 08:04
  • In general you have your monitoring and logs, and you can check the console (for things like OOM killer events). Enterprise hardware usually comes with an out-of-band management console that will also give insight into server health and events. You try to perform a root-cause analysis and, based on that, decide on a solution. [STONITH](https://en.wikipedia.org/wiki/STONITH) is the common clustering solution for dealing with hanging servers. – HBruijn Aug 28 '23 at 09:22
  • In addition to the requested edits for any information at all, add what your organization's uptime objective is and what high-availability design is in place. Sometimes you can put several copies of the same host behind a load balancer or in a cluster. Sometimes you do not have the time or the budget and it's just one server. – John Mahowald Aug 28 '23 at 23:05
  • Requests for products are off topic. – Greg Askew Aug 29 '23 at 06:18
