I have a multi-process application that works like so...

There is a parent process. The parent process queries a database to find work, then forks children to process that work. The children communicate back to the parent via System V message queues to indicate they're done with their work. When the parent process picks up that message, it updates the database to indicate that the work is complete.

This works okay, but I'm struggling to handle the case where the parent process is killed.

What happens is that the parent receives a SIGINT (from Ctrl-C) and then sends SIGKILL to each of the children. If a child is blocked on a System V message queue write when it receives that signal, the write is "interrupted" and never completes, so the parent never learns that the child's work was done and the database never gets updated.

That means that the next time I run the script, it will re-run any work that was blocking on the System V queue write.

I don't have a good idea for a solution yet. Ideally, I would like to force the queue write to remain blocking even when the child receives that SIGKILL, but I don't think such a thing is possible.

ashgromnies

1 Answer

Well, SIGKILL is, by definition, immediately fatal to the process that receives it and cannot be trapped or handled.

That is why you should only use it as a last resort, when the process does not respond to more polite requests to shut down. Your parent process should start by sending something like SIGINT or SIGTERM to the children, and only resort to SIGKILL if they don't exit within a reasonable period of time.
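
As a rough sketch of that escalation on the parent side, assuming the children are direct children of the caller so that `waitpid` can reap them: the helper name `stop_child`, the 10 ms polling step, and the `grace_ms` parameter are all made up for illustration and are not part of the original code.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

/* Hypothetical helper: send SIGTERM to `pid`, poll for up to `grace_ms`
 * milliseconds, and fall back to SIGKILL only if the child is still running.
 * Returns 0 if the child was reaped within the grace period, 1 if SIGKILL
 * was needed, -1 on error. */
int stop_child(pid_t pid, int grace_ms)
{
    int status;
    const struct timespec step = { 0, 10 * 1000 * 1000 }; /* 10 ms */

    if (kill(pid, SIGTERM) == -1)
        return -1;

    for (int waited = 0; waited < grace_ms; waited += 10) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return 0;                 /* child exited on its own */
        if (r == -1 && errno != EINTR)
            return -1;
        nanosleep(&step, NULL);       /* still running: wait a little */
    }

    /* Last resort: the child did not exit in time. */
    if (kill(pid, SIGKILL) == -1)
        return -1;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return 1;
}
```

A call such as `stop_child(pid, 5000)` would give each child about five seconds to finish its queue write before it is killed; the right grace period depends on how long a write can realistically block.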

Signals like SIGINT and SIGTERM may still cause the system call in the child to return early with EINTR, but you can handle that by retrying the call and letting it complete before exiting.
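
On the child side, a minimal sketch of that retry might look like the following. The message layout `struct done_msg`, the flag `stop_requested`, and the helpers `install_stop_handlers` and `send_done` are hypothetical names, since the original post does not show its code. The handler only records the request, so the child can finish the write and then exit once it sees the flag.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>

/* Hypothetical message layout; the real one is not shown in the post. */
struct done_msg {
    long mtype;     /* must be > 0 for msgsnd */
    int  work_id;   /* payload: which unit of work finished */
};

static volatile sig_atomic_t stop_requested = 0;

static void on_stop(int signo)
{
    (void)signo;
    stop_requested = 1;   /* only record the request; exit later */
}

/* Install the handler for SIGINT and SIGTERM. A blocked msgsnd then
 * fails with EINTR when either signal arrives (on Linux this happens
 * even with SA_RESTART, per signal(7)). */
int install_stop_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_stop;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, NULL) == -1)
        return -1;
    return sigaction(SIGTERM, &sa, NULL);
}

/* Send the completion message, retrying whenever a signal interrupts
 * the call. Returns 0 on success, -1 on any error other than EINTR. */
int send_done(int qid, int work_id)
{
    struct done_msg m = { 1, work_id };
    for (;;) {
        if (msgsnd(qid, &m, sizeof m.work_id, 0) == 0)
            return 0;
        if (errno != EINTR)
            return -1;
    }
}
```

After `send_done` returns, the child can check `stop_requested` and exit cleanly. None of this helps against SIGKILL itself, which never reaches a handler.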

TomH
  • I should have made this clear -- the code that sends the SIGKILL is in a shared library I am wary of changing. I thought about changing the signal used, but I really can't. It almost seems like, before the parent process sends the SIGKILLs to the children, it should send a different signal (SIGHUP?) to indicate that it's about to shut them down, then wait for the children to reply that they're done shutting down, and only then send the SIGKILL. – ashgromnies Jan 22 '13 at 23:03