How to detect a hung linux service?

Question

I have noticed that on some of my Linux servers a service will hang. The only way I know it is hung is that operations relying on the service fail, and when I restart the service it fails to stop, but it starts fine.

If I run service <servicename> status, it says it's running, and ps -ef | grep <servicename> shows only one process for that service, which is correct.
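
Both of those checks only confirm that a process exists, not that it is doing any work. On Linux, /proc/<pid>/stat also exposes the process's scheduler state and its cumulative CPU time. A process stuck in state D (uninterruptible sleep, usually blocked in the kernel on I/O such as a storage or network mount) does not react to signals until that I/O completes, which would be one possible explanation for a service that "fails to stop". The minimal sketch below samples a PID twice and reports both values. It is a heuristic, not a diagnosis: zero CPU time over the interval is only a weak hint because an idle but healthy service also uses none, and it only reads the main thread (per-thread state lives under /proc/<pid>/task/<tid>/stat).

```python
import os
import time


def read_proc_stat(pid):
    """Return (state, cpu_ticks) for a PID, parsed from /proc/<pid>/stat (Linux only)."""
    with open(os.path.join("/proc", str(pid), "stat")) as f:
        data = f.read()
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so only parse what follows the last ')'. fields[0] is field 3 (state).
    fields = data.rpartition(")")[2].split()
    state = fields[0]                              # R, S, D, Z, T, ...
    cpu_ticks = int(fields[11]) + int(fields[12])  # utime + stime (fields 14 and 15)
    return state, cpu_ticks


def sample_process(pid, interval=5.0):
    """Sample a process twice and report its state and the CPU ticks used in between.

    A 'D' state suggests the process is blocked in the kernel (often on I/O);
    zero CPU ticks is only a weak hint, since an idle healthy service uses none.
    """
    _, ticks_before = read_proc_stat(pid)
    time.sleep(interval)
    state, ticks_after = read_proc_stat(pid)
    return {
        "state": state,
        "uninterruptible_sleep": state == "D",
        "cpu_ticks_used": ticks_after - ticks_before,
    }
```

For example, sample_process(pid) with the PID that ps (or pgrep -f nova-compute) reports; a service that repeatedly shows "D", or zero CPU while requests against it are failing, is worth investigating further.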

Is there anything else I can check to tell whether it is hung? I am trying to be proactive about bringing these service(s) back up and also to determine why they are getting hung.

For reference, the services are mostly openstack-nova-compute and openstack-cinder-volume. For the cinder-volume service I can detect the hang because messages start to build up in RabbitMQ, but the same thing doesn't happen for nova-compute.
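
The RabbitMQ backlog that reveals a hung cinder-volume can be polled automatically instead of noticed by hand. The sketch below runs rabbitmqctl list_queues name messages and flags queues whose depth exceeds a threshold. Assumptions: the queue-name prefix "cinder-volume" and the threshold of 100 are illustrative placeholders to adjust for your deployment, and because the banner and header lines printed by rabbitmqctl differ between RabbitMQ versions, the parser simply keeps lines whose second tab-separated column is a number. As the question notes, this will not catch nova-compute hangs.

```python
import subprocess


def parse_queue_depths(output):
    """Parse 'name<TAB>messages' lines into {queue_name: depth}.

    Banner and header lines (which vary between RabbitMQ versions) are skipped
    because their second column is not a number.
    """
    depths = {}
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) == 2 and parts[1].strip().isdigit():
            depths[parts[0].strip()] = int(parts[1])
    return depths


def backed_up_queues(depths, prefixes=("cinder-volume",), threshold=100):
    """Return the queues matching one of `prefixes` whose depth exceeds `threshold`.

    The default prefix and threshold are assumptions, not OpenStack defaults.
    """
    return {name: n for name, n in depths.items()
            if name.startswith(prefixes) and n > threshold}


def current_queue_depths():
    """Query the local broker; needs rabbitmqctl on PATH and (usually) root."""
    out = subprocess.run(
        ["rabbitmqctl", "list_queues", "name", "messages"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_queue_depths(out)
```

Calling backed_up_queues(current_queue_depths()) periodically (from cron or a monitoring system) turns the manual observation into an alert.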

This is very hard to test because, as I said, the only way I know is when I try to do something on that node in OpenStack and it fails or hangs, and then I restart the service.

Questions on professional server- or networking-related infrastructure administration are off-topic for Stack Overflow unless they directly involve programming or programming tools. You may be able to get help on Server Fault. — Marcus Müller, Sep 18 '15 at 14:49
Sorry, my mistake; I didn't realize this site was for programming-related questions only. — huan0602, Oct 30 '15 at 12:55
Huan, it might be a good choice to ask there and delete the question here. Your choice, though! Have a nice day! — Marcus Müller, Oct 30 '15 at 13:29

score -1 · Answer 1 · answered Sep 18 '15 at 14:49

You could use a tool (a script, or even a "real" monitoring tool like Nagios) to do exactly what you described: mimic the "operations that rely on the service". That means trying to contact the service in question and, on failure, sending some kind of notification (or even restarting it automatically).
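
A minimal sketch of such a check, written to the Nagios plugin convention (exit code 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, plus a one-line status message) so it could be registered as a Nagios command. Because a hung service usually does not return an error but simply never answers, the key detail is the timeout on the probe. The probe command itself is a placeholder: it should be whatever operation really depends on the service on that node. The optional restart_cmd (for example a service <servicename> restart invocation) is likewise an assumption, and it is bounded by a timeout too, since the question reports that stopping the hung service can itself fail.

```python
import subprocess
import sys

# Nagios plugin exit-code convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_probe(cmd, timeout):
    """Run a command that exercises the service; return (exit_code, message)."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # A hung service typically shows up here rather than as an error.
        return CRITICAL, f"CRITICAL - probe timed out after {timeout}s"
    except OSError as exc:  # e.g. the probe command is not installed
        return UNKNOWN, f"UNKNOWN - could not run probe: {exc}"
    if result.returncode != 0:
        return CRITICAL, f"CRITICAL - probe exited with status {result.returncode}"
    return OK, "OK - probe succeeded"


def check(cmd, timeout=30, restart_cmd=None):
    """Probe the service and optionally run a restart command when it fails."""
    code, message = run_probe(cmd, timeout)
    if code == CRITICAL and restart_cmd:
        # The restart can hang too (the "fails to stop" symptom), so bound it.
        try:
            subprocess.run(restart_cmd, timeout=timeout)
            message += " (restart attempted)"
        except (subprocess.TimeoutExpired, OSError) as exc:
            message += f" (restart failed: {exc})"
    return code, message


def main(argv=None):
    """Treat the arguments as the probe command; print a status line, return the exit code."""
    if argv is None:
        argv = sys.argv[1:]
    if not argv:
        print("UNKNOWN - no probe command given")
        return UNKNOWN
    code, message = check(argv)
    print(message)
    return code
```

A thin wrapper script ending in sys.exit(main()) gives Nagios (or cron plus a mail command) the exit status it expects; the notification and escalation logic then stays in the monitoring tool rather than in the script.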

cslotty

Why has my answer been downvoted? What I described is standard practice in real-world monitoring of production systems! And yes, there may even be programming involved, because those Nagios scripts, for instance, have to be written (even if it's "only" Bash scripting). – cslotty Nov 09 '15 at 13:23
And my answer is different from Anony-Mousse's, because he's talking about sending "example requests" over HTTP, which covers only a very limited part of what any "linux service" could do. That's why I said that whatever the service does needs to be requested ("mimicked") the right way, and that request would obviously not be limited to a TCP protocol! – cslotty Nov 09 '15 at 13:31
