I'm experimenting with multiprocessing and signal. I create a pool and have the workers catch SIGTERM. For no apparent reason, I observe that the subprocesses randomly receive SIGTERMs. Here is an MWE:

import multiprocessing as mp
import signal
import os
import time

def start_process():
    print("Starting process #{}".format(os.getpid()))

def sigterm_handler(signo, _frame):
    print("Process #{} received a SIGTERM".format(os.getpid()))

def worker(i):
    time.sleep(1)

signal.signal(signal.SIGTERM, sigterm_handler)
while True:
    with mp.Pool(initializer=start_process) as pool:
        pool.map(worker, range(10))
    print("Job done.")
    time.sleep(2)

Output:

Starting process #7735
Starting process #7736
Starting process #7737
Starting process #7738
Starting process #7739
Starting process #7740
Starting process #7741
Starting process #7742
Job done.
Starting process #7746
Starting process #7747
Starting process #7748
Starting process #7749
Starting process #7750
Starting process #7751
Starting process #7752
Starting process #7753
Process #7748 received a SIGTERM
Process #7746 received a SIGTERM
Job done.
Starting process #7757
Starting process #7758
Starting process #7759
Starting process #7760
Starting process #7761
Starting process #7762
Starting process #7763
Starting process #7764

As you can see, the behaviour looks unpredictable.

So, where do these SIGTERMs come from? Is this normal? Am I guaranteed that the workers will finish their job? And in the end, is it OK to have the subprocesses capture SIGTERMs?

Right leg

2 Answers

It's normal and can happen while your pool is executing __exit__ upon leaving the context manager. Since the workers have already finished their jobs at that point, there's nothing to worry about. The pool itself sends the SIGTERM to any worker that doesn't have an exitcode available when the pool checks for it. This is triggered in the Pool._terminate_pool method (Python 3.7.1):

    # Terminate workers which haven't already finished.
    if pool and hasattr(pool[0], 'terminate'):
        util.debug('terminating workers')
        for p in pool:
            if p.exitcode is None:
                p.terminate()

The pool workers get joined a few lines later:

    if pool and hasattr(pool[0], 'terminate'):
        util.debug('joining pool workers')
        for p in pool:
            if p.is_alive():
                # worker has not yet exited
                util.debug('cleaning up worker %d' % p.pid)
                p.join()

In a scenario where you call pool.terminate() explicitly while your workers are still running (for example, you use pool.map_async and then call pool.terminate()), your application would deadlock waiting on p.join(), unless you let your sigterm_handler eventually call sys.exit().
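
To illustrate both steps in isolation, here is a sketch with a bare mp.Process instead of a pool (the idle function and the timings are made up for the example): terminate() sends the SIGTERM, the handler runs in the child, and join() only returns promptly because the handler exits.

import multiprocessing as mp
import os
import signal
import sys
import time

def sigterm_handler(signo, _frame):
    print("Process #{} received a SIGTERM".format(os.getpid()))
    sys.exit(0)  # without this, the sleep resumes and join() blocks until it finishes

def idle():
    # stand-in for a pool worker idling in its task loop
    signal.signal(signal.SIGTERM, sigterm_handler)
    time.sleep(60)

if __name__ == "__main__":
    p = mp.Process(target=idle)
    p.start()
    time.sleep(0.5)  # give the child time to install its handler
    p.terminate()    # what _terminate_pool does: sends SIGTERM on Unix
    p.join()         # the 'joining pool workers' step
    print("exitcode:", p.exitcode)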

Better not to mess with signal handlers if you don't have to.
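
If the goal is simply to avoid those spurious SIGTERMs in the first place, one option (a sketch based on the MWE, not something the question strictly needs) is to close and join the pool before the with-block exits, so every worker already has an exitcode when _terminate_pool checks:

import multiprocessing as mp
import time

def worker(i):
    time.sleep(1)

if __name__ == "__main__":
    while True:
        with mp.Pool() as pool:
            pool.map(worker, range(10))
            pool.close()  # no more work will be submitted
            pool.join()   # wait for every worker to exit
        # __exit__ still calls terminate(), but every worker now has an
        # exitcode, so no SIGTERM is sent.
        print("Job done.")
        time.sleep(2)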

Darkonaut

I think it's normal, but I can't say anything about the randomness of the printed messages. You can get more info; insert this in the main module:

import logging  # needed for logging.DEBUG below
mp.log_to_stderr(logging.DEBUG)

and change start_process():

def start_process():
    proc = mp.current_process()
    print("Starting process #{}, its name is {}".format(os.getpid(), proc.name))
kantal