I'm writing a Perl script to monitor a database with Nagios. I'm using the alarm function from the Time::HiRes module for the timeout.

use Time::HiRes qw[ time alarm ];
alarm $timeout;

Everything works fine. The thing is, I want to change the output message, because on timeout the script prints "Temporizador", and if I run

echo $?

it returns 142. I want to change the message and exit with code 3 so that Nagios can recognize the result.

I already tried 'eval', but it doesn't work.

  • `Temporizador` is output by your shell when one of its children is killed by the ALRM signal. `$?` is not the child's exit code here; it's the number of the signal that killed the child (14) ORed with 128. – ikegami Jun 15 '16 at 16:33
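
To make the arithmetic in the comment above concrete, here is a minimal sketch that maps a shell status such as 142 back to a signal name. The helper name `signal_name_from_shell_status` is hypothetical, and the mapping relies on the signal list of the platform's Perl build (signal 14 is ALRM on Linux).

```perl
use strict;
use warnings;
use Config;   # %Config holds the platform's signal names

# Hypothetical helper: turns a shell status above 128 into a signal name.
sub signal_name_from_shell_status {
    my ($status) = @_;
    return undef if $status <= 128;           # a normal exit code, not a signal
    my @names  = split ' ', $Config{sig_name}; # index in this list = signal number
    my $signum = $status - 128;                # the shell reports 128 + signal number
    return defined $names[$signum] ? "SIG$names[$signum]" : undef;
}

my $name = signal_name_from_shell_status(142);
print defined $name
    ? "142 means the child was killed by $name\n"
    : "142 is a plain exit code\n";
```

On a system where SIGALRM is signal 14, this should report SIGALRM for a status of 142.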

2 Answers

You should handle the ALRM signal. For example:

#!/usr/bin/env perl
use strict;
use warnings;
use Time::HiRes qw[ time alarm ];

$SIG{ALRM} = sub {print "Custom message\n"; exit 3};

alarm 2;
sleep 10; # this line represents the rest of your program, don't include it

This will output:

18:08:20-eballes@urth:~/$ ./test.pl 
Custom message
18:08:23-eballes@urth:~/$ echo $?
3

For an extended explanation about handling signals check this nice tutorial on perltricks.
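
Since the question mentions trying `eval`, here is a hedged sketch of the common `eval`/`die` timeout pattern built on the same ALRM handler. The helper name `run_with_timeout` is hypothetical, and the `select` call only stands in for slow work. As the comments on this answer point out, the handler may not run until a long C-level call (such as the guts of a DBI connect) has returned, so this pattern is not guaranteed to interrupt such calls.

```perl
use strict;
use warnings;

# Hypothetical helper: runs $code and gives up after $seconds.
# Returns 1 if $code finished, 0 if the alarm fired first.
sub run_with_timeout {
    my ($seconds, $code) = @_;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };  # newline keeps the message exact
        alarm $seconds;
        $code->();
        alarm 0;                                     # finished in time: cancel the alarm
        1;
    };
    alarm 0;                                         # make sure no alarm is left pending
    return 1 if $ok;
    die $@ if $@ ne "timeout\n";                     # re-throw unrelated errors
    return 0;
}

# Stand-in for slow work: wait up to 5 seconds, with a 1 second limit.
my $finished = run_with_timeout(1, sub { select(undef, undef, undef, 5) });
print $finished ? "OK - check finished\n" : "Custom message: check timed out\n";
```

In a real plugin the timeout branch would print the message and then call `exit` with the code Nagios expects.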

simbabque
LaintalAy
  • It is working. I have a question. Why sleep? I've checked if I put sleep 10 the alarm value can't be greater than 10. Why? – Adrian Blanco Jun 15 '16 at 16:20
  • This is just for the example. The `sleep` needs to be greater than the `alarm`. Otherwise the program ends and no `ALRM` signal would be received. – LaintalAy Jun 15 '16 at 16:24
  • @Adrian Blanco, `sleep` represents the rest of your program. Don't actually use `sleep`. – ikegami Jun 15 '16 at 16:37
  • @ikegami already noticed that, but if I don't write that line the timeout doesn't work and if I write it the rest of my program doesn't. ¿? – Adrian Blanco Jun 15 '16 at 17:45
  • @Adrian Blanco, of course it doesn't work with it. You tell the program to do nothing until it gets killed by the alarm. /// What do you mean by "the timeout doesn't work"? Also, what OS? – ikegami Jun 15 '16 at 17:48
  • @ikegami yes. I have a DSN connection test which should time out after the alarm value. It does if I put sleep, but if I don't, it takes longer than the alarm length. – Adrian Blanco Jun 15 '16 at 17:52
  • Linux 2.6.32-504.1.3.el6.x86_64 CentOS 6.6. I mean that the alarm doesn't kill the program after the $timeout. – Adrian Blanco Jun 15 '16 at 18:03
  • @Adrian Blanco, Is the function you want to interrupt written in C? That's not going to happen. You can't safely interrupt calls to C functions (e.g. regex matches, XS functions). The handler should still be called, but it will be called after the routine has exited. However, you make it sound like the handler isn't called at all. If so, either the program isn't running long enough to trigger the alarm, or something is clearing the alarm (e.g. by using `alarm` itself) or the alarm handler (unlikely). – ikegami Jun 15 '16 at 18:03
  • Just making sure it wasn't Windows, since we're talking about unix signals. – ikegami Jun 15 '16 at 18:05
  • @ikegami the function I want to interrupt is `$dbh = DBI->connect("dbi:ODBC:dsn=$dsn",$user,$pass)`, written in Perl. As mentioned in my first post, everything works fine if I only use `alarm $timeout` without `$SIG{ALRM}`. I'd like to change the return code. – Adrian Blanco Jun 15 '16 at 18:11
  • @Adrian Blanco, Yup, that's the problem. The guts of that is a C function. I'll post a solution shortly. – ikegami Jun 15 '16 at 18:12
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/114772/discussion-between-adrian-blanco-and-ikegami). – Adrian Blanco Jun 15 '16 at 18:25

The function that's taking the time is written in C, which precludes you from using a custom signal handler safely.

You don't appear worried about terminating your program forcefully, so I suggest you use alarm without a signal handler to kill your program if it takes too long to run, and use a wrapper to provide the correct response to Nagios.

Change

/path/to/program some args

to

/path/to/timeout_wrapper 30 /path/to/program some args

The following is timeout_wrapper:

#!/usr/bin/perl
use strict;
use warnings;

use POSIX       qw( WNOHANG );
use Time::HiRes qw( sleep time );

sub wait_for_child_to_complete {
   my ($pid, $timeout) = @_;
   my $wait_until = time + $timeout;
   while (time < $wait_until) {
      waitpid($pid, WNOHANG)
         and return $?;

      sleep(0.5);
   }

   return undef;
}

{
   my $timeout = shift(@ARGV);

   defined( my $pid = fork() )
      or exit(3);

   if (!$pid) {
      alarm($timeout);   # Optional. The parent will handle this anyway.
      exec(@ARGV)
         or exit(3);
   }

   my $timed_out = 0;
   my $rv = wait_for_child_to_complete($pid, $timeout);
   if (!defined($rv)) {
      $timed_out = 1;
      if (kill(ALRM => $pid)) {
         $rv = wait_for_child_to_complete($pid, 5);
         if (!defined($rv)) {
            kill(KILL => $pid)
         }
      }
   }

   exit(2) if $timed_out;
   exit(3) if $rv & 0x7F;  # Killed by some signal.
   exit($rv >> 8);         # Expect the exit code to comply with the spec.
}

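The wrapper uses the Nagios plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). Note that timeouts should actually return 2 (CRITICAL), which is what the wrapper does, rather than the 3 requested in the question.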
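
The last three lines of the wrapper rely on how Perl packs a child's wait status into `$?`. The following is a small illustrative sketch of that layout; the helper name `decode_wait_status` is hypothetical and not part of the wrapper. Note that Perl's `$?` differs from the shell's: a child killed by signal 14 gives 14 here, whereas the shell reports 142.

```perl
use strict;
use warnings;

# Hypothetical helper: splits a wait status (the value of $? after waitpid)
# into its parts.
sub decode_wait_status {
    my ($status) = @_;
    return {
        signal    => $status & 0x7F,            # signal that killed the child, 0 if none
        core_dump => ($status & 0x80) ? 1 : 0,  # whether a core dump was produced
        exit_code => $status >> 8,              # code the child passed to exit()
    };
}

my $killed = decode_wait_status(14);       # child killed by signal 14 (ALRM on Linux)
my $exited = decode_wait_status(3 << 8);   # child called exit(3)
printf "signal=%d exit=%d\n", $killed->{signal}, $killed->{exit_code};  # signal=14 exit=0
printf "signal=%d exit=%d\n", $exited->{signal}, $exited->{exit_code};  # signal=0 exit=3
```

This is why the wrapper tests `$rv & 0x7F` before trusting `$rv >> 8` as the program's own exit code.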

ikegami