I'm trying to implement a multithreaded application based on a slightly altered boss/worker model. Basically the main thread creates several boss threads, which in turn spawn two worker threads each (possibly more). That's because the boss threads deal with one host or network device each, and the worker threads could take a while to complete their work.

I'm using Thread::Pool to realize this concept, and so far it works quite well; I also don't think my problem is related to Thread::Pool (see below). Very simplified pseudocode ahead:

use strict;
use warnings;

my $bosspool = create_bosspool();   # spawns all boss threads
my $taskpool = undef;               # created inside each boss thread
                                    # when that thread starts

# give device jobs to boss threads
while (1) {
  foreach my $device ( @devices ) {
    $bosspool->job($device);
  }

  sleep(1);
}

# This sub is called for jobs passed to the $bosspool
sub process_boss
{
  my $device = shift;

  foreach my $task ( @{ $device->{tasks} } ) {
    # process results as they become available
    process_result() while ( $taskpool->results );
    # give task jobs to task threads
    scalar $taskpool->job($device, $task);
    sleep(1); ### HACK ###
  }

  # process remaining results / wait for all tasks to finish
  process_result() while ( $taskpool->results || $taskpool->todo );

  # happy result processing
}

sub process_result
{
  my $result = $taskpool->result_any();

  # mangle $result
}

# This sub is called for jobs passed to the $taskpool of each boss thread
sub process_task
{
  # not so important stuff

  return $result;
}

By the way, the reason I'm not using the monitor() routine is that I have to wait for all jobs in the $taskpool to finish. Now, this code works wonderfully, unless you remove the ### HACK ### line. Without the sleep, $taskpool->todo() doesn't report the right number of open jobs if you add jobs or receive their results too "fast". For example, you add 4 jobs in total, but $taskpool->todo() returns only 2 afterwards (with no pending results). This leads to all sorts of interesting effects.

OK, so Thread::Pool->todo() is crap, let's try a workaround:

sub process_boss
{
  my $device = shift;

  my $todo = 0;

  foreach my $task ( @{ $device->{tasks} } ) {
    # process results as they become available
    while ( $taskpool->results ) {
      process_result();
      $todo--;
    }
    # give task jobs to task threads
    scalar $taskpool->job($device, $task);
    $todo++;
  }

  # process remaining results / wait for all tasks to finish
  while ( $todo ) {
    process_result();
    sleep(1); ### HACK ###
    $todo--;
  }
}

This also works fine, as long as I keep the ### HACK ### line. Without this line, the code reproduces the problems of Thread::Pool->todo(): $todo gets decremented not just by 1, but by 2 or even more.

I've tested this code with only one boss thread, so there was basically no multithreading involved (when it comes to this subroutine). $bosspool, $taskpool and especially $todo aren't :shared, no side effects possible, right? What's happening in this subroutine, which gets executed by only one boss thread, with no shared variables, semaphores, etc.?

opx
  • Can you create a simple complete piece of example code that demonstrates the problem? It is hard to evaluate what is going on here, because we can't see key parts of your program. –  Feb 27 '13 at 14:19
  • did you try with real data or did you build a test setup first? – didierc Feb 28 '13 at 12:28

1 Answer

I would suggest that the best way to implement a worker-thread model is with Thread::Queue. The hard part with a model like this is figuring out when a queue is complete, and whether items have been dequeued but are still pending processing.

With Thread::Queue, each worker can fetch items from the queue in a while loop. Once you call end() on the queue and it has been drained, dequeue returns undef, so the while loop terminates and the thread exits.
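
As a minimal sketch of that loop, assuming a Thread::Queue recent enough to provide end() (3.01 or later): the helper name run_doubling_workers and the doubling "work" are made up for illustration. The defined() test is an addition so that a false but valid item such as 0 does not stop a worker early.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical helper: start $nworkers threads that double every number
# taken from a shared work queue, then return all results sorted.
sub run_doubling_workers {
    my ( $nworkers, @items ) = @_;

    my $work_q = Thread::Queue->new();

    my @workers = map {
        threads->create(
            { context => 'list' },    # join() should return a list
            sub {
                my @done;
                # dequeue() blocks while the queue is empty. After end()
                # has been called and the queue is drained, it returns
                # undef, which terminates the loop.
                while ( defined( my $item = $work_q->dequeue() ) ) {
                    push @done, $item * 2;
                }
                return @done;
            }
        );
    } 1 .. $nworkers;

    $work_q->enqueue(@items);
    $work_q->end();    # no more items will be added

    # join() waits for each worker and collects what it returned.
    my @results = map { $_->join() } @workers;
    my @sorted  = sort { $a <=> $b } @results;
    return @sorted;
}

print join( ',', run_doubling_workers( 2, 0 .. 4 ) ), "\n";    # prints 0,2,4,6,8
```

Because every worker exits only after the queue is ended and empty, joining the workers doubles as "wait for all jobs to finish", with no counter and no sleep.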

So you don't always need multiple 'boss' threads, you can just use multiple different flavours of worker and input queues. I would question why you need a 'boss' thread model in that instance. It seems unnecessary.

With reference to: Perl daemonize with child daemons

#!/usr/bin/perl

use strict;
use warnings;
use threads;
use Thread::Queue;

my $nthreads = 4;

my @targets = qw ( device1 device2 device3 device4 );

my $task_one_q = Thread::Queue->new();
my $task_two_q = Thread::Queue->new();

my $results_q = Thread::Queue->new();

sub task_one_worker {
    while ( my $item = $task_one_q->dequeue ) {

        #do something with $item

        $results_q->enqueue("$item task_one complete");
    }
}

sub task_two_worker {
    while ( my $item = $task_two_q->dequeue ) {

        #do something with $item

        $results_q->enqueue("$item task_two complete");
    }
}

#start threads;

for ( 1 .. $nthreads ) {
    threads->create( \&task_one_worker );
    threads->create( \&task_two_worker );
}

foreach my $target (@targets) {
    $task_one_q->enqueue($target);
    $task_two_q->enqueue($target);
}

$task_one_q->end;
$task_two_q->end;

#Wait for threads to exit.

foreach my $thr ( threads->list() ) {
    $thr->join();
}

$results_q->end();

while ( my $item = $results_q->dequeue() ) {
    print $item, "\n";
}

You could do something similar with a boss thread if you wanted to: create a queue per boss and pass it by reference to the workers. I'm not sure that it's necessary, though.
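
A rough sketch of that per-boss variant, under some assumptions: each device is a hash with hypothetical name and tasks keys, the "work" is a placeholder string, and process_boss here is a stand-in for the question's routine rather than a drop-in replacement. Each boss owns a private task queue and result queue, so waiting for its own workers needs no todo() counter.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical boss routine: handles one device, owns private queues,
# and spawns its own workers.
sub process_boss {
    my ( $device, $nworkers ) = @_;

    my $task_q   = Thread::Queue->new();
    my $result_q = Thread::Queue->new();

    my @workers = map {
        threads->create(
            sub {
                while ( defined( my $task = $task_q->dequeue() ) ) {
                    # Placeholder for the real, possibly slow, work.
                    $result_q->enqueue("$device->{name}: $task done");
                }
            }
        );
    } 1 .. $nworkers;

    $task_q->enqueue( @{ $device->{tasks} } );
    $task_q->end();    # workers exit once the queue is drained

    $_->join() for @workers;    # all tasks are finished after this
    $result_q->end();

    my @results;
    while ( defined( my $r = $result_q->dequeue() ) ) {
        push @results, $r;
    }
    return @results;
}

# Made-up devices, one boss thread per device, two workers per boss.
my @devices = (
    { name => 'device1', tasks => [ 'ping', 'snmp' ] },
    { name => 'device2', tasks => [ 'ping', 'snmp', 'ssh' ] },
);

my @bosses = map {
    threads->create( { context => 'list' }, \&process_boss, $_, 2 );
} @devices;

for my $boss (@bosses) {
    my @lines = $boss->join();
    print "$_\n" for sort @lines;
}
```

This sketch collects results only after all workers have been joined. A boss that needs to process results as they arrive could instead dequeue from its result queue while the workers are still running.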

Sobrique