-1

I have a process that forks a lot. The child processes do a lot of work, including further system calls.
When ANY child process gets an error from a system call, it prints an error description to stderr and sends SIGUSR1 to the group leader (the main parent process).
SIGUSR1 tells the parent to kill all child processes, free resources, and exit (to avoid zombie processes).

I need to kill all children at once. Atomically. So when an error happens in ANY child process, all child processes stop working immediately.
Currently the parent kills all child processes with SIGUSR2: it sends this signal to all process group members (killpg). All of them have a signal handler installed that terminates them (exit). The group leader doesn't get killed, though (it still needs to free resources).

The problem is that before all child processes get killed, they can still execute about 1-2 lines of code, which is not what I want. I need to stop them immediately.

How can I achieve this?

Mára Toner
  • How do you know they can execute 1-2 lines of code before they get killed? – user253751 Apr 18 '16 at 23:21
  • Because I see it. They make some system calls and produce output, and some of them produce output before they get killed. I just know, trust me :) – Mára Toner Apr 18 '16 at 23:55
  • But how do you know they're executing 1-2 lines in between getting the signal and dying, and not executing 1-2 lines before the signal even gets sent? – user253751 Apr 19 '16 at 00:21
  • Because when the signal is sent, it kills the child processes immediately – Mára Toner Apr 19 '16 at 00:37
  • If the child processes are killed immediately then why does the question say the child processes are not killed immediately? – user253751 Apr 19 '16 at 00:39
  • In a world with finite signal-propagation speed ( – EOF Apr 19 '16 at 00:41
  • 1-2 lines of C code?? How long does it take for a core to execute two lines? How long does it take to send a 'stop this thread' message through an inter-core driver when stopping processes/threads? What you seem to be asking for is unreasonable. – Martin James Apr 19 '16 at 00:45
  • Indeed, what happens if the leader's call to `killpg` gets delayed - say, to load the function `killpg` from disk because this is the first time it's been used today? – user253751 Apr 19 '16 at 00:53

2 Answers

2

Signals are delivered asynchronously. Since both the parent and the child processes are running, you cannot expect a child process to handle the signal immediately when the parent sends it.

> The problem is that before all child processes get killed, they can still execute about 1-2 lines of code, which is not what I want. I need to stop them immediately.

Your problem is more one of coordination and synchronization between processes than of signal handling. There are two ways I can think of:

  1. Use synchronous signal handling. That is, when a child sends SIGUSR1 to the parent, it stops working and waits for SIGUSR2 with a waiting function such as sigtimedwait or sigwait. This way, it will not run any additional code before exiting.

  2. Use a pipe or socketpair to create communication channels between the parent and the children. That is, the parent sends a kill instruction to the children, and each child frees the necessary resources and terminates itself. This requires the children to listen on the channel while doing work.
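
The first option could look roughly like the sketch below. It is only an illustration under stated assumptions: `NUM_CHILDREN`, `child_main`, `stop_requested`, and `run_group` are hypothetical names, the "error" in child 0 is simulated, and the 1 ms sleep stands in for real work. Both signals are blocked before forking, so each child only ever sees SIGUSR2 at an explicit checkpoint (`sigtimedwait` with a zero timeout) or while parked in `sigwait` after reporting an error. The parent signals each child by pid instead of calling `killpg`, so it does not signal itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

enum { NUM_CHILDREN = 3 };  /* hypothetical number of children */

/* Checkpoint: returns 1 if SIGUSR2 is pending (and consumes it), else 0. */
static int stop_requested(const sigset_t *stop_set)
{
    struct timespec zero = {0, 0};
    return sigtimedwait(stop_set, NULL, &zero) == SIGUSR2;
}

/* Child body; never returns. Child 0 simulates a failing system call. */
static void child_main(int id, const sigset_t *stop_set)
{
    struct timespec tick = {0, 1000000};  /* 1 ms stand-in for real work */
    for (int step = 0; ; step++) {
        if (stop_requested(stop_set))
            _exit(0);                     /* stop at a well-defined point */
        if (id == 0 && step == 3) {
            int sig = 0;
            fprintf(stderr, "child %d: simulated error\n", id);
            kill(getppid(), SIGUSR1);     /* report the error to the parent */
            sigwait(stop_set, &sig);      /* park here; run no further code */
            _exit(1);
        }
        nanosleep(&tick, NULL);
    }
}

/* Forks the children, waits for the first error report, stops them all.
   Returns the number of children reaped. */
int run_group(void)
{
    sigset_t report_set, stop_set, both, old;
    sigemptyset(&report_set);
    sigaddset(&report_set, SIGUSR1);
    sigemptyset(&stop_set);
    sigaddset(&stop_set, SIGUSR2);
    both = report_set;
    sigaddset(&both, SIGUSR2);
    /* Block both signals before forking; children inherit the mask. */
    sigprocmask(SIG_BLOCK, &both, &old);

    pid_t pids[NUM_CHILDREN];
    int started = 0;
    for (; started < NUM_CHILDREN; started++) {
        pids[started] = fork();
        if (pids[started] < 0)
            break;                        /* fork failed: stop what we have */
        if (pids[started] == 0)
            child_main(started, &stop_set);
    }

    if (started == NUM_CHILDREN) {
        int sig = 0;
        sigwait(&report_set, &sig);       /* wait for the first SIGUSR1 */
        assert(sig == SIGUSR1);
    }

    for (int i = 0; i < started; i++)
        kill(pids[i], SIGUSR2);           /* per pid, so the parent is spared */

    int reaped = 0;
    for (int i = 0; i < started; i++)
        if (waitpid(pids[i], NULL, 0) == pids[i])
            reaped++;

    sigprocmask(SIG_SETMASK, &old, NULL);
    return reaped;
}
```

Calling `run_group()` from `main` should reap all three children. Note what this does and does not give you: the children still do not stop at the same instant, but each one stops only at a checkpoint you chose, so the granularity of the checkpoints bounds how much work runs after the error.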

fluter
1

Do you mean that all child processes must stop working as soon as the faulty child sends SIGUSR1?

If this is what you want, I don't think you can achieve it the way you are doing it: when the faulty child sends SIGUSR1 to the leader, the other children continue executing until the leader has processed SIGUSR1.

Do you really need the faulty process to send SIGUSR1 to the leader first? Wouldn't it be possible for the faulty process to send SIGUSR2 directly to the group, with the leader simply ignoring that signal (or at least not treating it as a termination signal)?
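
A rough sketch of that idea, with several assumptions of mine: `NUM_WORKERS`, `worker_main`, `leader_main`, and `run_direct_stop_demo` are hypothetical names, and the failure in worker 0 is simulated. Here the leader keeps SIGUSR2 blocked and picks it up with `sigwait` rather than ignoring it, so it still learns that it has to clean up. The workers unblock SIGUSR2 and rely on its default action (terminate). The faulty worker signals the group with `kill(0, SIGUSR2)`, which targets the caller's process group just like `killpg(getpgrp(), SIGUSR2)`. The demo runs the leader in its own process group so that nothing outside the group is signalled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

enum { NUM_WORKERS = 3 };  /* hypothetical number of workers */

/* Worker body; never returns. Worker 0 simulates a failing system call. */
static void worker_main(int id, const sigset_t *stop_set)
{
    struct timespec tick = {0, 1000000};  /* 1 ms stand-in for real work */
    /* Unblock SIGUSR2: its default action terminates this process. */
    sigprocmask(SIG_UNBLOCK, stop_set, NULL);
    for (int step = 0; ; step++) {
        if (id == 0 && step == 20) {
            fprintf(stderr, "worker %d: simulated error\n", id);
            kill(0, SIGUSR2);   /* whole process group, this worker included */
            _exit(1);           /* not normally reached */
        }
        nanosleep(&tick, NULL);
    }
}

/* Group leader: returns how many workers were terminated by SIGUSR2. */
static int leader_main(void)
{
    sigset_t stop_set;
    sigemptyset(&stop_set);
    sigaddset(&stop_set, SIGUSR2);
    sigprocmask(SIG_BLOCK, &stop_set, NULL);  /* the leader must survive */
    setpgid(0, 0);                            /* become the group leader */

    pid_t pids[NUM_WORKERS];
    int started = 0;
    for (; started < NUM_WORKERS; started++) {
        pids[started] = fork();
        if (pids[started] < 0) {
            kill(0, SIGUSR2);                 /* fork failed: stop the group */
            break;
        }
        if (pids[started] == 0)
            worker_main(started, &stop_set);
    }

    int sig = 0;
    sigwait(&stop_set, &sig);                 /* some process reported failure */
    assert(sig == SIGUSR2);

    int killed = 0;
    for (int i = 0; i < started; i++) {
        int status = 0;
        /* Safety net for a worker forked after the group signal was sent. */
        kill(pids[i], SIGUSR2);
        if (waitpid(pids[i], &status, 0) == pids[i] &&
            WIFSIGNALED(status) && WTERMSIG(status) == SIGUSR2)
            killed++;
    }
    /* A real leader would free its resources here. */
    return killed;
}

/* Runs the leader in a separate process so its group is isolated. */
int run_direct_stop_demo(void)
{
    int status = 0;
    pid_t leader = fork();
    if (leader < 0)
        return -1;
    if (leader == 0)
        _exit(leader_main());
    if (waitpid(leader, &status, 0) != leader || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

This removes the round trip through the leader, which shortens the window in which the other workers keep running, but delivery is still asynchronous, so it cannot be made truly instantaneous.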

shrike