Alternative ways to detect hung OpenStack/Linux services?

Question

I have noticed on some of my Linux servers that a service will occasionally hang. The only way I know it is hung is that operations relying on the service fail, and when I restart the service it fails to stop but then starts fine.

If I run service <servicename> status, it says the service is running. If I run ps -ef | grep <servicename>, it shows only one process for that service, which is correct.

Is there anything else I can check to tell whether a service is hung? I am trying to be proactive about bringing these services back up and also about determining why they hang.

For reference, the services are mostly openstack-nova-compute and openstack-cinder-volume. I can detect a hung cinder-volume service because messages start to build up in RabbitMQ, but the same thing doesn't happen for nova-compute.

This is very hard to test because, as I said, the only way I find out is when I try to do something on that node in OpenStack and it fails or hangs, and then I restart the service. I have a script that tests some OpenStack services, but the nova scheduler might take a while to place an instance on that host, or the host may be full, in which case it will never place another instance there.

score 1 · Answer 1 · answered Feb 28 '16 at 06:35

Use a monitoring solution such as Zabbix or Nagios, and write scripts/checks for the services that monitor process existence, process CPU usage, process memory usage, API responses, and so on.
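As a rough illustration of the "API responses" check, here is a minimal Python sketch of a probe that a Nagios- or Zabbix-style check could call. The function name check_api and the 5-second default timeout are my own assumptions, and it only applies to services that expose an HTTP endpoint. The return codes follow the usual Nagios plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL).

```python
import http.client
import urllib.error
import urllib.request

# Exit codes follow the Nagios plugin convention.
OK, WARNING, CRITICAL = 0, 1, 2

# Ignore proxy environment variables: a local health check should
# talk to the service directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check_api(url, timeout=5.0):
    """Return (exit_code, message) for one HTTP GET against url.

    check_api is a hypothetical helper, not part of any monitoring tool.
    """
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return OK, "OK - %s answered with HTTP %d" % (url, resp.status)
    except urllib.error.HTTPError as exc:
        # The service answered, but with an error status (4xx/5xx).
        return WARNING, "WARNING - %s answered with HTTP %d" % (url, exc.code)
    except (OSError, http.client.HTTPException) as exc:
        # Timeout, connection refused, reset, or a malformed reply.
        return CRITICAL, "CRITICAL - %s did not answer: %s" % (url, exc)
```

A plugin wrapper would print the message and exit with the returned code. The timeout is what matters for hung services: a process that still exists but never answers ends up in the CRITICAL branch instead of blocking the check forever.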

GioMac

score 0 · Answer 2 · answered Feb 28 '16 at 06:19

You can write a script (run as a cron job, perhaps) that checks the timestamps of the log files of the OpenStack services you want to monitor. I think most of the services perform some kind of periodic auditing and log it, and any operation should generate log entries as well. That way, if a log has not been updated for a while, you can try restarting the service.
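A minimal Python sketch of that log-timestamp check could look like the following. The function names are hypothetical, and the right staleness threshold depends on how often the service really writes to its log, which you would have to observe on your own hosts.

```python
import os
import time


def log_age_seconds(path, now=None):
    """Seconds since the file at path was last modified."""
    if now is None:
        now = time.time()
    return now - os.stat(path).st_mtime


def is_stale(path, max_age_seconds, now=None):
    """True if the log is missing or older than max_age_seconds.

    A missing log file is treated as stale so that it also raises an alert.
    """
    try:
        return log_age_seconds(path, now) > max_age_seconds
    except FileNotFoundError:
        return True
```

From cron you could call something like is_stale("/var/log/nova/nova-compute.log", 600) and alert or restart when it returns True. Both the path and the 10-minute threshold are assumptions here. I would alert first and restart automatically only once you trust the threshold, since a quiet but healthy service would otherwise be restarted for no reason.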

As you mentioned, finding out why the services hang in the first place is just as important.
