1

This question is an extension to my previous question.

The only addition here is that now I would like the threads to timeout if the operation is not complete within the set timeout duration.

So after going through the perldoc and certain examples, I have drafted a working version and it works - but there is something off.

The timeout doesn't exactly happen in the set duration but sometimes takes double the amount of time.

I am not able to investigate this and I need you help on the same on how to investigate this further.

In this program, I have set the timeout value to 3 seconds via alarm function and have installed signal handler in the thread function.

My observations during the multiple runs are as follows:

  1. timeout never happens below the set value.
  2. timeout happens always above the set value and its value is never exactly the set value.

Please find the program output & the program itself for your review. Your comments and feedback are welcome.

Program output:

(20:51:59) $:little-stuff>  perl 10-20190530-batch-processing-using-threads-with-busy-pool-detection-2-with-threads-timeout.pl 12
29872: STARTING TASKS IN BATCHES
29872: MAIN: engine (e1) is RUNNING batch #1 tasks: (1 2)
29872: MAIN: engine (e2) is RUNNING batch #2 tasks: (3 4 5)
29872: MAIN: engine (e3) is RUNNING batch #3 tasks: (6 7)
29878: TASK_RUNNER: engine (e3) finished batch #3 task #7 in (1.27) seconds
29878: TASK_RUNNER: engine (e3) finished batch #3 task #6 in (2.12) seconds
29876: TASK_RUNNER: engine (e1) finished batch #1 task #1 in (2.97) seconds
29878: TASK_ORCHESTRATOR: engine (e3) finished batch #3 tasks in (2.00) seconds.
29872: REAPER: TASK_ORCHESTRATOR pid (29878) has finished with status (0).
29877: TASK_RUNNER: engine (e2) finished batch #2 task #4 in (3.65) seconds
29877: TASK_RUNNER: engine (e2) finished batch #2 task #3 in (4.62) seconds
29872: MAIN: engine (e3) is RUNNING batch #4 tasks: (8 9)
29876: TASK_RUNNER: engine (e1) finished batch #1 task #2 in (6.43) seconds
29876: TASK_ORCHESTRATOR: engine (e1) finished batch #1 tasks in (6.00) seconds.
29872: REAPER: TASK_ORCHESTRATOR pid (29876) has finished with status (0).
29872: MAIN: engine (e1) is RUNNING batch #5 tasks: (10 11)
29877: TASK_RUNNER: engine (e2), batch #2 task #5 has TIMED OUT in (6.00) seconds !!
29877: TASK_ORCHESTRATOR: engine (e2) finished batch #2 tasks in (6.00) seconds.
29872: REAPER: TASK_ORCHESTRATOR pid (29877) has finished with status (0).
29872: MAIN: engine (e2) is RUNNING batch #6 tasks: (12)
30000: TASK_RUNNER: engine (e3) finished batch #4 task #8 in (5.28) seconds
30059: TASK_RUNNER: engine (e1) finished batch #5 task #10 in (3.02) seconds
30059: TASK_RUNNER: engine (e1) finished batch #5 task #11 in (3.58) seconds
30059: TASK_ORCHESTRATOR: engine (e1) finished batch #5 tasks in (3.00) seconds.
29872: REAPER: TASK_ORCHESTRATOR pid (30059) has finished with status (0).
30000: TASK_RUNNER: engine (e3), batch #4 task #9 has TIMED OUT in (6.00) seconds !!
30000: TASK_ORCHESTRATOR: engine (e3) finished batch #4 tasks in (6.00) seconds.
29872: REAPER: TASK_ORCHESTRATOR pid (30000) has finished with status (0).
30129: TASK_RUNNER: engine (e2) finished batch #6 task #12 in (3.95) seconds
30129: TASK_ORCHESTRATOR: engine (e2) finished batch #6 tasks in (3.00) seconds.
29872: REAPER: TASK_ORCHESTRATOR pid (30129) has finished with status (0).
29872: ALL ORCHESTRATORS HAVE FINISHED
29872: FINISHED TASKS IN BATCHES

Program:

#!/usr/bin/env perl

use strict;
use warnings;
use Data::Dumper;
use POSIX ':sys_wait_h';
use Thread qw(async);

STDOUT->autoflush(1);

my $timeout = 3;
my @total_tasks = (1 .. shift || 10);
my $sleep_time = 6; 
my @engines = (qw/e1 e2 e3/);
my $sizes = { e1 => 2, e2 => 3, e3 => 2, };

my $proc_hash;
my $global_string = "engine";

### 
# source: https://duyanghao.github.io/ways_avoid_zombie_process/
#
  sub REAPER {
    local ($!, $?);
    while ( (my $reaped_pid = waitpid(-1, WNOHANG)) > 0 ) {
      if ( WIFEXITED($?) )
      {
        my $ret_code = WEXITSTATUS($?);
        print "$$: REAPER: TASK_ORCHESTRATOR pid ($reaped_pid) has finished with status ($ret_code).\n";
        my $engine_name = $proc_hash->{$reaped_pid};
        delete ($proc_hash->{$reaped_pid});
        delete ($proc_hash->{$engine_name});
      }
    }
  }
#
###

$SIG{CHLD} = \&REAPER;

sub random_sleep_time { return sprintf ("%.2f",(rand ($sleep_time) + 1)) }

sub task_runner {
  my @args = @_;

  STDOUT->autoflush(1);
  my ($batch_engine, $task) = ($args[0]->[0],$args[0]->[1]);
  my ($batch, $engine) = split (/_/,$batch_engine);

  my $start_time = time;
  #my $end_time = undef;
  #my $tot_time = undef;

  $SIG{ALRM} = sub {
    my $end_time = time;
    my $tot_time = sprintf ("%.2f",($end_time - $start_time));
    print "$$: TASK_RUNNER: $global_string ($engine), batch #$batch".
          " task #$task has TIMED OUT in ($tot_time) seconds !!\n";
    threads->exit(0);
  };

  my $task_time = random_sleep_time();
  sleep ($task_time);
  #sleep (random_sleep_time());
  #$end_time = time;
  #$tot_time = sprintf ("%.2f",($end_time - $start_time));
  #print "$$: TASK_RUNNER: $global_string ($engine) finished batch #$batch task #$task in ($tot_time) seconds\n";
  print "$$: TASK_RUNNER: $global_string ($engine) finished batch #$batch task #$task in ($task_time) seconds\n";
  threads->exit(0);
};

sub task_orchestrator {
  my ($batch_engine, @tasks) = @_;
  my ($batch, $engine) = split (/_/,$batch_engine);
  my $task_orch_pid = fork();
  die "Failed to fork task_orchestrator\n" if not defined $task_orch_pid;

  if ($task_orch_pid != 0) {
    $proc_hash->{$engine} = $task_orch_pid;
    $proc_hash->{$task_orch_pid} = $engine;
  }

  if ($task_orch_pid == 0) {
    my @tids;
    alarm ($timeout);
    STDOUT->autoflush(1);
    my $start_time = time;
    for (my $i=1 ; $i <= $#tasks ; $i++) { push (@tids,$i) }
    foreach my $task_number (0 .. $#tasks) {
      $tids [$task_number] = threads->create (
        \&task_runner,[$batch_engine,$tasks [$task_number]]
      );
    }

    $SIG{ALRM} = sub { 
      foreach my $t (@tids) {
        if ($t->is_running()) { $t->kill('ALRM') } 
      }
    };

    foreach my $tid (@tids) {$tid->join()}
    my $end_time = time;
    my $tot_time = sprintf ("%.2f",($end_time - $start_time));
    print "$$: TASK_ORCHESTRATOR: engine ($engine) finished batch #$batch tasks in ($tot_time) seconds.\n";
    exit (0);
  }
}

my $batch=1;
print "$$: STARTING TASKS IN BATCHES\n";
while (@total_tasks)
{
  foreach my $engine (@engines)
  {
    if (exists $proc_hash->{$engine})
    {
      sleep (1);
      next;
    }
    else
    {
      my @engine_tasks;
      my $engine_max_tasks = $sizes->{$engine};
      while ($engine_max_tasks-- != 0)
      {
        my $task = shift @total_tasks;
        push (@engine_tasks,$task) if $task;
      }
      if (@engine_tasks)
      {
        my $batch_engine = $batch.'_'.$engine;
        print "$$: MAIN: engine ($engine) is RUNNING batch #$batch tasks: (@engine_tasks)\n";
        task_orchestrator ("$batch_engine",@engine_tasks);
        $batch++;
      }
    }
  }
}

# All Work fine
#sleep (.2) while ((waitpid(-1, WNOHANG)) >= 0);
#sleep (.2) while ((waitpid(-1, WNOHANG)) != -1);
sleep (.2) while ((waitpid(-1, WNOHANG)) > -1);
# All Work fine

print "$$: ALL ORCHESTRATORS HAVE FINISHED\n";
print "$$: FINISHED TASKS IN BATCHES\n";
__END__
User9102d82
  • 1,172
  • 9
  • 19

1 Answers1

2

The timing of alarm() is not guaranteed to have sub-second accuracy. The standard Time::HiRes module offers ualarm() and a re-implementation of alarm() in terms of ualarm() which may be closer to what you want. Solving your problem may be as simple as adding

use Time::HiRes qw(alarm);

to the top of your program.

TFBW
  • 989
  • 7
  • 12
  • Thanks @TFBW. So do i write ualarm(3) instead of alarm(3) ? – User9102d82 May 30 '19 at 17:07
  • Their alarm is set at 3 (three), the integer, and this won't help at all. But their `sleep` _is_ fractional. – zdim May 30 '19 at 21:41
  • @User9102d82 Once you add the `use Time::HiRes qw(alarm sleep);` the program will use module's routines, which take fractional seconds. So there's nothing else to do, just add that use startement -- and you need it for `sleep`, not for `alarm`. (But once you are at it, why not include the alarm as well.) I don't know that that will fix it all though. – zdim May 30 '19 at 21:43
  • Tasks are not timing out at all. And even if they do timeout, timeout never occurs at the specified value. This seems to be not working. :( – User9102d82 May 31 '19 at 01:55
  • Sorry but tasks are not timing out at all. In fact it is very strange to understand what is going on as sometimes it times out and sometimes it doesn't. – User9102d82 May 31 '19 at 02:26
  • @User9102d82 Alright, thanks for feedback. I'll try, with simplified code (there's too much going on here, even as it's nicely organized). You do mix alarm and sleep (it's a no-no) but I can't tell whether that's it. One also normally doesn't do a timer quite like that but again I can't tell whether that's a problem. (Your code runs for me, and it doesn't time out.) – zdim May 31 '19 at 05:35
  • @zdim: I have been doing some reading and there is a lot of chatter coming up about perl threads and signals are a bad idea. I am now trying to hack my way through it but still to no avail. Could you please tell me more about mixing alarm with sleep ? – User9102d82 May 31 '19 at 06:01
  • @User9102d82 (1) Can't use signals on threads (a _process_ is signalled), so use threads' way to signal them around (2) sleep is implemented using alarm. mixing them can mess things up // The alarm is working in my simple fork+threads test, so far the issue being how to quit threads cleanly... – zdim May 31 '19 at 06:13
  • Okay. I am using 'threads' instead of 'Threads' which is seen in the program. Moreover, I guess we are outta luck as perldoc states that: Correspondingly, sending a signal to a thread does not disrupt the operation the thread is currently working on: The signal will be acted upon after the current operation @zdim:has completed. For instance, if the thread is stuck on an I/O call, sending it a signal will not cause the I/O call to be interrupted such that the signal is acted up immediately. It is not able to immediately do the timeout and hence the timeout is happening at different times. – User9102d82 May 31 '19 at 06:15
  • @User9102d82 (1) The threads' way to singal around, which you are using, is fine -- it is meant for that. I don't think that singlas are the problem. – zdim May 31 '19 at 06:17
  • @zdim: But what about the statement that .. "sending the signal doesn't disrupt the operation, signal will be acted upon after the operation has completed..", This is making me feel that if I want to timeout a certain operation - it won't happen as the signal will be acted upon later only. So how is that useful to us as users who want to perform timeout immediately which happens in C and/or C++. I am now checking Inline::C to see if there could be another work around or something, or may be develop a hacky way to keep track of each thread's start time and send cancel when timeout reached. – User9102d82 May 31 '19 at 06:22
  • @User9102d82 That refers to "safe" (deferred) singals, the way they have been done in Perl for many years. It normally means milliseconds, so that some I/O finishes; it won't lock up your program. As for threads, again, you're fine with what you are doing; see ["Thread Signalling" in threads](https://perldoc.perl.org/threads.html#THREAD-SIGNALLING). – zdim May 31 '19 at 06:28
  • @User9102d82 I'm having problems with getting my (more realistic) alarm+threads tests to work correctly; will be back at it tomorrow night (it's late at night here). – zdim May 31 '19 at 08:43
  • @zdim: no problem. Just a tip: instead of sleep (1) - a busy while (1) {;} is a better choice and I see timeouts happening properly. But still I am looking for issues. Will update again. Take care. – User9102d82 May 31 '19 at 09:00
  • @User9102d82 "_sending the signal doesn't disrupt the operation_" -- I mis-identified the doc page from which this is (and sent you a link to it! sorry) ... it seems that they're serious with it in this case ... – zdim Jun 01 '19 at 07:30