
I have a fairly simple Perl script that, in one function, does the following:

    if ( legato_is_up() ) {
        write_log("INFO:        Legato is up and running. Continue the installation.");
        $wait_minutes = $WAITPERIOD + 1;
        $legato_up = 1;
    }
    else {
        my $towait = $WAITPERIOD - $wait_minutes;
        write_log("INFO:        Legato is not up yet. Waiting for another $towait minutes...");
        sleep 30;
        $wait_minutes = $wait_minutes + 0.5;
    }

For some reason, the script sometimes gets killed (roughly 1 in 3 runs). I don't know what is responsible for the kill; I just know it happens during the "sleep" call.

Can anyone give me a hint here? After the script is killed, its job is not done, which is a big problem.

brian d foy
Alex
  • Is that being executed from the console or from within Apache? If it is from Apache, take a look at the timeout option. There could also be an error somewhere in your functions that you are not handling, and since we don't see the rest of the code we can't really tell. If it is called from the browser and you are using Apache or similar, you could also check the error_log and see if it contains anything that might help you... – Prix Jul 15 '10 at 19:56
  • It is being executed from the console. The script is part of an installation wrapper, which is used to install a simple piece of software. The script runs last and its purpose is to verify that the software is up (that's what legato_is_up() does). The OS is Red Hat AS3, with the standard distribution. – Alex Jul 15 '10 at 20:01
  • Are you, by any chance, running this on Dreamhost? They have a tendency to kill persistent processes. – Schwern Jul 15 '10 at 20:35
  • @Schwern: no, it's a local system, not a web environment at all. – Alex Jul 15 '10 at 20:46
  • The big mystery here is what does legato_is_up() do, exactly? You say the script gets killed in its sleep but are you sure it isn't getting killed due to whatever code is in legato_is_up()? Does legato_is_up() write to the log so you know it has returned before the script dies? – T.Rob Jul 16 '10 at 02:24
  • @T.Rob: it checks whether the Legato cluster is up with a simple HAtools command. From the logs that I've seen, the failure is right during the sleep() call. – Alex Jul 16 '10 at 05:03

2 Answers

1

Without knowing what else is running on your system, it's anybody's guess. You could add a signal handler, but all it would tell you is which signal arrived (and when), not who sent it:

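    foreach my $signal (qw(INT PIPE HUP))
    {
        my $old_handler = $SIG{$signal};
        $SIG{$signal} = sub {
            # Report which signal arrived and when; the sender is not known here.
            print time, ": ", $signal, " received!\n";
            # Chain to the previous handler only if it was a code reference
            # (it can also be undef or the strings 'DEFAULT' / 'IGNORE').
            $old_handler->(@_) if ref $old_handler eq 'CODE';
        };
    }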

You may also want to consider adding __WARN__ and __DIE__ handlers, if you are not logging output from stderr.
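A minimal sketch of what such handlers could look like. The log path and the debug_log() helper are hypothetical names made up for illustration; the original script would presumably call its own write_log() instead.

```perl
use strict;
use warnings;

# Hypothetical log location, chosen only for this example.
my $log_file = '/tmp/perl_signal_debug.log';

# Hypothetical helper: append a timestamped message to the log file.
sub debug_log {
    my ($msg) = @_;
    open my $fh, '>>', $log_file or return;
    print {$fh} scalar(localtime), " $msg";
    close $fh;
}

# Record warnings that would otherwise only reach STDERR, then emit them as usual.
$SIG{__WARN__} = sub {
    debug_log("WARN: $_[0]");
    warn $_[0];
};

# Record fatal errors before the script dies.
$SIG{__DIE__} = sub {
    return if $^S;    # true inside eval, where the exception may be handled
    debug_log("DIE: $_[0]");
};
```

Note that these only catch Perl-level warnings and die calls. A process that is killed by a signal it does not handle will not pass through either handler.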

Ether
  • there's nothing else. I have a number of other scripts using sleep call, but only this one gets killed... – Alex Jul 15 '10 at 20:02
  • Isn't there a Perl module for advanced signal handling, which *can* tell you who sent the signal? – Zan Lynx Jul 15 '10 at 20:04
  • 2
    @Zan I don't think Unix makes that information available. Glancing through the GNU C Library docs on signal handling their handler just gets the signal number. http://www.gnu.org/s/libc/manual/html_node/Basic-Signal-Handling.html – Schwern Jul 15 '10 at 20:39
  • 1
    @Alex: I can guarantee you that there are other processes running on your system :) Perhaps not by you, but root always has several. – Ether Jul 15 '10 at 20:59
  • @Schwern: Look at the man page for sigaction. There's a big section on siginfo_t. One of the struct members is si_pid. You get this extra info with the SA_SIGINFO flag. – Zan Lynx Jul 15 '10 at 20:59
  • @Ether: oh yes, definitely, but I mean this is a normal system, nothing should kill the script. A script being always killed during its sleep() call and nothing else is strange, wouldn't you say? – Alex Jul 15 '10 at 21:26
  • @Alex: I would guardedly say yes (guardedly only because I know nothing about your system). Strange things can happen though; e.g. processes can mysteriously vanish if you run out of memory. – Ether Jul 15 '10 at 21:38
  • 1
    @Ether: you were right after all :-) there's something wacky on the system that kills almost all processes that have anything to do with Legato (the clustering software) - running scripts were killed, tail on system log was killed, tail on my log was killed - weird. Now I need to look for the way to do it without killing the system :-) – Alex Aug 01 '10 at 09:06
0

On Linux at least, you can see who sent a signal (if it's an external process that used kill(2)) by looking at the siginfo struct (particularly si_pid) passed to a signal handler. I don't know how to see that from Perl, however - but in your case you could strace (or similar on non-Linux platforms) your script and see it that way, e.g. strace -p <pid of your perl script>. You should see something like:

    --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=89165, si_uid=1000} ---

just before your untimely death.
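For the Perl side, one possibility is the POSIX module's sigaction() with the SA_SIGINFO flag, which hands the handler a hash reference of siginfo fields. This is only a sketch, assuming a reasonably recent Perl and a platform that fills in the pid and uid fields (not all do). The %last_signal variable is a name invented for this example, and I have not checked whether the Perl shipped with Red Hat AS3 supports it.

```perl
use strict;
use warnings;
use POSIX qw(sigaction SIGTERM SA_SIGINFO);

# Illustration-only storage for details of the most recent signal.
my %last_signal;

# With SA_SIGINFO the handler receives the signal name, a hash reference of
# siginfo fields, and the raw siginfo bytes. 'pid' and 'uid' may be missing
# on some platforms, so fall back to 'unknown'.
my $action = POSIX::SigAction->new(
    sub {
        my ($signame, $info) = @_;
        $info = {} unless ref $info eq 'HASH';
        %last_signal = (
            name => $signame,
            pid  => defined $info->{pid} ? $info->{pid} : 'unknown',
            uid  => defined $info->{uid} ? $info->{uid} : 'unknown',
        );
        print time, ": SIG$signame received from pid $last_signal{pid}",
            " (uid $last_signal{uid})\n";
    },
    POSIX::SigSet->new,    # block no additional signals while the handler runs
    SA_SIGINFO,
);

sigaction(SIGTERM, $action) or die "sigaction failed: $!";
```

Keep in mind that once this handler is installed, SIGTERM no longer terminates the script by default, and that SIGKILL cannot be caught at all, so strace remains the fallback for that case.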

(a few years late for the OP I know...)

user133831