I have a Python script, script.py, which launches a child process and does some work in a loop. Most of the time this program exits correctly when sent a SIGUSR2, but in some rare cases, which I don't understand, it does not. Here is a minimal working example:

import os, subprocess,signal,sys,time

sub = None
shutDown = False

def handler(signum,frame):
    global shutDown, sub

    print("script.py: cached sig: %i " % signum)
    sys.stdout.flush()

    if shutDown:
      print("ignoring signal: %i " % signum)
      return
    else:
      shutDown = True

    if sub is not None and not sub.poll():
        print("script.py: sent signal to doStuff.sh pid: ", sub.pid)
        sys.stdout.flush()
        os.kill(sub.pid, signal.SIGTERM)
        os.waitpid(sub.pid,0)

    sys.exit(128+signum)

signal.signal(signal.SIGINT, handler)
signal.signal(signal.SIGUSR2, handler)
signal.signal(signal.SIGTERM, handler)

for i in range(0,5):
  print("launching %i" % i)
  sub = subprocess.Popen(["./doStuff.sh"], stderr = subprocess.STDOUT)
  sub.wait()

print("finished script.py")

with the doStuff.sh stub:

function signalHandler() {
    trap '' SIGINT 
    trap '' SIGTERM
    sleep 10s
    # kill ourself to signal calling process we exited on SIGINT
    trap - SIGINT
    kill -s SIGINT $$
}

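# trap_with_arg is not shown in the original stub; it is assumed here to be the
# usual helper that installs "<func> <signal>" as the trap for each listed signal.
trap_with_arg() {
    func="$1"; shift
    for sig; do
        trap "$func $sig" "$sig"
    done
}

trap_with_arg "signalHandler" SIGINT SIGTERM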
trap '' SIGUSR2 

echo "doStuff.sh : pid:  $$"
for i in {1..100}; do
    sleep 1s
done

Now launch python script.py and send SIGINT twice (kill -INT -thePID, or press Ctrl+C twice with a second in between):

launching 0
doStuff.sh : pid:  7720
^Cscript.py: cached sig: 2 /** < first time CTRL + C
script.py: sent signal to doStuff.sh pid:  7720 /** < doStuff is waiting now
^Cscript.py: cached sig: 2  /** < second time CTRL + C
ignoring signal: 2          
launching 1                 /** < why continuing loop??? so python 
...

I am really wondering why the loop continues, since there is a sys.exit at the end of the handler, and I expected that the first call to the handler would eventually reach that sys.exit. Does somebody see what is flawed in this program? I am having a really hard time getting this to work.

Gabriel
  • maybe you are still in the `os.waitpid()` because the child process has not yet cleaned up its resources? – flaschbier Feb 21 '16 at 11:57
  • OK, let's assume the handler for the first signal is still in `os.waitpid()`, which gets interrupted by the second signal, whose handler just returns. The stack should then go back to `os.waitpid()`, but obviously it doesn't, or something else fishy is happening. The code should probably be written differently, by just setting flags in the handlers and handling the signals in `main`. – Gabriel Feb 21 '16 at 12:08
  • what is the actual program being executed in the subprocess? – flaschbier Feb 21 '16 at 12:11
  • the actual program is Pixar RenderMan `prman`, which terminates on SIGTERM and SIGINT – Gabriel Feb 21 '16 at 12:38
  • I am seeing the error on CentOS 6; I am trying to reproduce it on Ubuntu 15.04. – Gabriel Feb 21 '16 at 12:43
  • You say the child process will end on `SIGTERM` and `SIGINT`, but your printout says 12 = `SIGUSR2`...? – flaschbier Feb 21 '16 at 12:45
  • Sorry, I have added a complete MWE; with it I can now reproduce the problem – Gabriel Feb 21 '16 at 12:54
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/104098/discussion-between-gabriel-and-flaschbier). – Gabriel Feb 21 '16 at 14:33
  • could you use a simpler code e.g., [use `child.py`](https://gist.github.com/zed/215a57b3681cc5f77d2a) instead of the dummy bash script, remove unnecessary `print()` calls, etc (if you can reproduce the issue without some line in the code, remove the line, to create [mcve]). – jfs Feb 22 '16 at 16:09
  • reduced to a simple example. thanks for the comment – Gabriel Feb 23 '16 at 21:47
  • [As we established in your previous question](http://stackoverflow.com/a/35515965/4279), you should not call `sub.wait()` inside the signal handler. It is pointless to call `sub.poll()` for the exact same reason (`.poll()` always returns `None` if the signal handler is called while the Python process is blocked on `sub.wait()`). Also, your code example is still too complicated (there is code that is unrelated to the issue, and there could even be multiple issues). Is there a reason you are refusing to use [the code I've suggested as the basis for your example](https://goo.gl/W6bWP9)? – jfs Feb 24 '16 at 20:55
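
The flag-based approach suggested in the comments (only record the signal in the handler, and do the killing, waiting and exiting in the main loop) could look roughly like the sketch below. It is an illustration, not a verified fix for the behaviour seen on CentOS 6. It assumes Python 3.5 or later, where interrupted system calls are retried after the handler returns. The names `shutdown_signal`, `run_jobs` and `poll_interval` are made up for this example and do not appear in the original script.

```python
import signal
import subprocess
import sys
import time

# Set by the handler, read by the main loop (hypothetical name).
shutdown_signal = None


def handler(signum, frame):
    # Only record the first signal; no kill, wait or exit in here.
    global shutdown_signal
    if shutdown_signal is None:
        shutdown_signal = signum


def run_jobs(cmd, count, poll_interval=0.1):
    """Run cmd up to `count` times; return 0, or 128+signum if interrupted."""
    for _ in range(count):
        if shutdown_signal is not None:
            break
        child = subprocess.Popen(cmd, stderr=subprocess.STDOUT)
        # Poll instead of blocking in wait(), so the flag is checked regularly.
        while child.poll() is None:
            if shutdown_signal is not None:
                child.terminate()  # sends SIGTERM, as the original handler did
                child.wait()       # reap the child outside the signal handler
                return 128 + shutdown_signal
            time.sleep(poll_interval)
    return 0 if shutdown_signal is None else 128 + shutdown_signal


def main():
    for sig in (signal.SIGINT, signal.SIGUSR2, signal.SIGTERM):
        signal.signal(sig, handler)
    sys.exit(run_jobs(["./doStuff.sh"], 5))
```

In this sketch script.py would end with a call to `main()`. A second Ctrl+C only finds the flag already set and returns, so there is no nested handler sitting inside `os.waitpid()`. The cost is a shutdown latency of up to `poll_interval` seconds.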
