1

I have noticed on some of my Linux servers that a service will occasionally hang. The only way I know it is hung is that operations relying on the service fail; when I restart the service, it fails to stop but then starts fine.

If I run service <servicename> status, it says the service is running. If I run ps -ef | grep <servicename>, it shows only one process for that service, which is correct.

Is there anything else I can check to tell whether a service is hung? I am trying to be proactive about bringing these services back up and about determining why they hang.

For reference, the services are mostly openstack-nova-compute and openstack-cinder-volume. I can detect a hung cinder-volume service because its RabbitMQ queue starts to build up, but the same thing doesn't happen for nova-compute.

This is very hard to test because, as I said, the only way I find out is when I try to do something on that node in OpenStack and it fails or hangs, and then I restart the service. I have a script running to test some OpenStack services, but the nova scheduler might take a while to place an instance on that host, or the host may be full, in which case it will never place another instance there.

huan0602

2 Answers

1

Use a monitoring solution such as Zabbix or Nagios, and write scripts/checks for your services that monitor process existence, process CPU usage, process memory usage, API responses, etc.
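As a rough illustration of such a check, here is a minimal Python sketch of a Nagios-style plugin. It reports CRITICAL when no matching process exists and WARNING when a matching process is in uninterruptible sleep (state D in /proc/<pid>/stat). The function names and the choice of the D state as a warning sign are assumptions made for this example; a hung daemon can just as well sit in normal sleep waiting on a lock or socket, so this catches only some hangs and would need to be combined with an API or queue check.

```python
#!/usr/bin/env python3
"""Sketch of a Nagios-style process check (Linux only, reads /proc)."""
import os
import sys

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def find_processes(pattern, proc_root="/proc"):
    """Return (pid, state) pairs for processes whose command line contains pattern."""
    matches = []
    for entry in os.listdir(proc_root):
        # Skip non-process entries and this check script itself.
        if not entry.isdigit() or int(entry) == os.getpid():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
            with open(os.path.join(proc_root, entry, "stat"), errors="replace") as f:
                # The state letter is the first field after the ")" that
                # closes the command name.
                state = f.read().rsplit(")", 1)[1].split()[0]
        except (OSError, IndexError):
            continue  # the process exited while we were reading it
        if pattern in cmdline:
            matches.append((int(entry), state))
    return matches


def evaluate(pattern, processes):
    """Map a list of (pid, state) pairs to a Nagios (exit_code, message) pair."""
    if not processes:
        return CRITICAL, f"CRITICAL: no process matching {pattern!r}"
    stuck = [pid for pid, state in processes if state == "D"]
    if stuck:
        return WARNING, f"WARNING: {pattern!r} pids in uninterruptible sleep: {stuck}"
    return OK, f"OK: {len(processes)} process(es) matching {pattern!r}"


def main(argv):
    """Run the check for argv[1] and return the exit code."""
    if len(argv) != 2:
        print("UNKNOWN: usage: check_proc.py <pattern>")
        return UNKNOWN
    code, message = evaluate(argv[1], find_processes(argv[1]))
    print(message)
    return code
```

To use it as a plugin, finish the script with sys.exit(main(sys.argv)) and call it with the service name, for example nova-compute, as the only argument.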

GioMac
0

You can write a script (run from cron, for example) that checks the timestamps of the log files of the OpenStack services you want to monitor. I think most of the services perform some kind of periodic auditing and log it, and any operation should also generate log entries. If a log has not been updated for a while, the service may be hung and you can try restarting it.
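A minimal Python sketch of that idea follows. The log paths and the 10-minute threshold in the commented usage are assumptions for illustration only; the actual log locations depend on your distribution's packaging, and the threshold should be longer than the interval at which the service normally writes to its log.

```python
import os
import time


def log_age_seconds(path, now=None):
    """Return the number of seconds since the file at path was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def is_stale(path, max_age_seconds, now=None):
    """Return True if the log is missing or older than max_age_seconds."""
    try:
        return log_age_seconds(path, now) > max_age_seconds
    except OSError:
        return True  # a missing or unreadable log is treated as stale


def stale_services(log_paths, max_age_seconds, now=None):
    """Return the names of services whose logs have gone quiet.

    log_paths maps a service name to the path of its log file.
    """
    return [name for name, path in sorted(log_paths.items())
            if is_stale(path, max_age_seconds, now)]


# Example usage (hypothetical paths and threshold, adjust for your hosts):
# logs = {
#     "openstack-nova-compute": "/var/log/nova/compute.log",
#     "openstack-cinder-volume": "/var/log/cinder/volume.log",
# }
# for name in stale_services(logs, max_age_seconds=600):
#     print(f"{name}: log not updated in the last 10 minutes")
```

The script only reports stale logs. Whether to restart the service automatically or just raise an alert is left to the caller, since an automatic restart can hide the evidence needed to find the root cause.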

And as you mentioned, determining why they hang is critical.

Kaustubh