I'm trying to write a script that will stay mostly idle, waiting for a predefined amount of time using the regular system sleep. The problem emerges when I try to kill it externally (with start-stop-daemon, for example). The main process is killed, but the child sleep remains in the system until it runs out. I decided to add a trap that kills the active sleep from the script itself. This is how it is done:

cleanup()
{
        local PIDS=$(jobs -p)
        echo $PIDS
        [ -n "$PIDS" ] && kill $PIDS
        exit 0
}
trap "cleanup" SIGINT SIGTERM

sleep 1h

When I hit Ctrl-C (sending SIGINT) while the script is in the foreground, the cleanup() procedure fires, but if I try to kill the running script from another console (sending the default SIGTERM), nothing happens. cleanup() is not executed, and the script does not terminate; it continues to run as if nothing had happened. Can anybody explain what is going on, and how to trap an external SIGTERM and execute the desired procedure?

e-pirate
    The manual says `If Bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes.`, so it seems to be expected behaviour. – Philippe Feb 15 '20 at 23:03

1 Answer

As Philippe mentions in a comment above, the Bash Reference Manual § 3.7.6 "Signals" states in part:

If Bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes.

This is actually true of SIGINT as well as SIGTERM; the reason that your approach seems to work for Ctrl-C is that Ctrl-C sends SIGINT to every process in the foreground process group, including the sleep process, so sleep exits immediately, and only then does its parent script run the trap and exit.

The most explicit way to fix this is offered by the rest of the paragraph I just quoted from:

When Bash is waiting for an asynchronous command via the wait builtin, the reception of a signal for which a trap has been set will cause the wait builtin to return immediately with an exit status greater than 128, immediately after which the trap is executed.

In other words, you can replace sleep 1h with sleep 1h & wait (where sleep 1h & and wait can be on separate lines or on the same line, as you prefer) to have your trap called immediately (so it can kill the sleep process).
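
A minimal sketch of that change, assuming the same cleanup() as in the question. The helper name interruptible_sleep is hypothetical (it is not from the question), and local PIDS=$(jobs -p) is split into two statements only as a matter of style:

```shell
#!/usr/bin/env bash
# Sketch: run sleep in the background and block in `wait`, so that a
# trapped SIGTERM interrupts the wait and the handler runs right away.

cleanup() {
    local pids
    pids=$(jobs -p)                 # PIDs of this shell's background jobs
    [ -n "$pids" ] && kill $pids    # stop the leftover sleep, if any
    exit 0
}

# interruptible_sleep is a hypothetical helper name (an assumption).
interruptible_sleep() {
    sleep "$1" &
    wait "$!"    # returns early, with status > 128, when a trapped signal arrives
}

# In the real script the main part would then be:
#   trap cleanup INT TERM
#   interruptible_sleep 1h
```

With this in place, kill sent from another console should reach cleanup() while the script is blocked in wait, instead of being deferred until sleep finishes.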

Alternatively, you can eliminate the trap setup and replace sleep 1h with something that won't continue running so long after the script exits; for example:

  • a loop that repeatedly sleeps only briefly (e.g. sleep 1s) until an hour has passed.
  • read -t 3600, which waits up to 3600 seconds (one hour) for text to appear on standard input. (N.B. this approach only works if there is not, in fact, any text on standard input.)

(These rely on the fact that the loop (in the former case) or the read call (in the latter case) is part of the Bash process itself, rather than a forked child process.)
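
The first of these alternatives might look roughly like the sketch below. The function name sleep_in_slices and the one-second slice are assumptions chosen for illustration; no trap is installed, so a SIGTERM ends the script at once and the orphaned sleep lasts at most about one second:

```shell
#!/usr/bin/env bash
# Sketch: wait for a total number of seconds in one-second slices.

# sleep_in_slices is a hypothetical helper name (an assumption).
sleep_in_slices() {
    local end
    end=$(( $(date +%s) + $1 ))            # absolute deadline in epoch seconds
    while [ "$(date +%s)" -lt "$end" ]; do
        sleep 1                            # short-lived child process
    done
}

# In the real script: sleep_in_slices 3600
```

Checking the clock against a deadline, rather than counting iterations, keeps the total wait from drifting when individual sleep calls overrun slightly.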

ruakh
  • I've modified my `fractional_sleep` procedure to use `sleep N & wait` as you suggested, and that totally solved the issue without any additional work. Thanks! – e-pirate Feb 16 '20 at 11:37