I have a Perl program that takes over 13 hours to run. I think it could benefit from multithreading, but I have never done this before and I'm at a loss as to how to begin.

Here is my situation: I have a directory of hundreds of text files. I loop through every file in the directory using a basic for loop and do some processing (text processing on the file itself, calling an outside program on the file, and compressing it). When that's complete, I move on to the next file. I continue this way, doing each file one after the other, in serial fashion. The files are completely independent of each other, and the process returns no values (other than success/failure codes), so this seems like a good candidate for multithreading.
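
In case it helps, the current serial loop is shaped roughly like this (a simplified sketch with placeholder names - `extprog`, `process_file`, and `compress_file` stand in for my real steps):

use strict;
use warnings;

# Hypothetical stand-ins for the real per-file work.
sub process_file  { my ($path) = @_; }   # text processing would go here
sub compress_file { my ($path) = @_; }   # compression would go here

my $dir = 'textfiles';   # placeholder directory name
opendir( my $dh, $dir ) or die "Can't open $dir: $!";
my @files = grep { -f "$dir/$_" } readdir($dh);
closedir($dh);

# One file at a time, in serial fashion.
for my $file (@files) {
    process_file("$dir/$file");
    system( 'extprog', "$dir/$file" ) == 0
        or warn "extprog failed on $file: $?";
    compress_file("$dir/$file");
}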

My questions:

  1. How do I rewrite my basic loop to take advantage of threads? There appear to be several modules for threading out there.
  2. How do I control how many threads are currently running? If I have N cores available, how do I limit the number of threads to N or N - n?
  3. Do I need to manage the thread count manually or will Perl do that for me?

Any advice would be much appreciated.

craigm
    Grab the list of files, then use a Parallel::ForkManager loop in which the processor is launched using `exec`. – ikegami Nov 18 '14 at 15:18
  • If your program is IO-bound (and it sounds like it might be), then multithreading is not going to speed up your program. It might actually slow it down! – AKHolland Nov 18 '14 at 15:18
  • @AKHolland, File compression is usually CPU bound – ikegami Nov 18 '14 at 15:19
  • @ikegami It depends, and is certainly worth doing some profiling before diving into rewriting his program. – AKHolland Nov 18 '14 at 15:21
  • @AKHolland, Profiling? You mean benchmarking. Hard to do accurately because of caching, but the following would give an idea: `time bash -c 'extprog file1; extprog file2'` vs `time bash -c 'extprog file1 & extprog file2'` – ikegami Nov 18 '14 at 15:28
  • What OS? Unless it's Windows, there is no point in trying to do it with threads instead of processes (e.g. Parallel::ForkManager) (and arguably even on Windows, there is little point) – ysth Nov 18 '14 at 15:30
  • Based on what I've seen when the program is running, I'm fairly confident the program is CPU bound. Oh and the OS is Windows (64-bit). – craigm Nov 18 '14 at 15:32
  • @ikegami No I mean profiling, for example NYTProf – AKHolland Nov 18 '14 at 16:20
  • http://stackoverflow.com/questions/26296206/perl-daemonize-with-child-daemons/26297240#26297240 – Sobrique Nov 18 '14 at 16:23
  • @AKHolland, Profiling is the wrong tool. It won't tell you how parallelizable something is. You need to benchmark to determine that. – ikegami Nov 18 '14 at 16:51
  • @ikegami Sure it will. I can look per-instruction to see where it's spending the most time. If it spends most of its time reading and writing to filehandles, there you go. If it spends most of its time in compression modules, that's another answer. – AKHolland Nov 18 '14 at 18:57
  • @AKHolland, Nope, you get the same answer in both cases: "Don't know". – ikegami Nov 18 '14 at 18:59
  • @ikegami You are wrong here I'm moving on. – AKHolland Nov 18 '14 at 19:14
  • @AKHolland, Why? Because you say so? You have yet to provide any explanation as to how profiling would help. Parallelization could help if it spends most of its time reading and writing to filehandles, or it might not. Parallelization could help if it spends most of its time in compression modules, or it might not. So if that's all that profiling tells you, how does it help determine whether parallelization would be useful? Do you have anything at all to support your claim? – ikegami Nov 18 '14 at 19:24

2 Answers


Since your threads are simply going to launch a process and wait for it to end, it's best to bypass the middlemen and just use processes. Unless you're on a Windows system, I'd recommend Parallel::ForkManager for your scenario.

use Parallel::ForkManager qw( );

use constant MAX_PROCESSES => ...;

my $pm = Parallel::ForkManager->new(MAX_PROCESSES);

my @qfns = ...;

for my $qfn (@qfns) {
   # In the parent, start() forks and returns the child's PID (true),
   # so the parent skips straight to the next file. In the child,
   # start() returns 0, so the child falls through and replaces
   # itself with the external program via exec.
   $pm->start and next;
   exec("extprog", $qfn)
      or die $!;
}

$pm->wait_all_children();
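
If the children need to do work in Perl (the text processing and compression steps) rather than just exec one external program, the same loop works with the usual start/finish pairing. Here's a sketch reusing the `$pm` and `@qfns` set up above; `process_file` is a hypothetical stand-in for the per-file work:

for my $qfn (@qfns) {
   $pm->start and next;     # parent moves on; child continues below

   # Everything from here runs in the child process.
   process_file($qfn);      # hypothetical: your text processing
   system("extprog", $qfn); # run the external program and wait
   # ...compress the file, etc...

   $pm->finish;             # child exits here
}

$pm->wait_all_children();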

If you wanted to avoid using needless intermediary threads on Windows, you'd have to use something akin to the following:

use constant MAX_PROCESSES => ...;

my @qfns = ...;

my %children;
for my $qfn (@qfns) {
   # Wait for a free slot before spawning another child.
   while (keys(%children) >= MAX_PROCESSES) {
      my $pid = wait();
      delete $children{$pid};
   }

   # Spawn asynchronously; returns the child's process identifier.
   my $pid = system(1, "extprog", $qfn);
   ++$children{$pid};
}

# Reap the remaining children.
while (keys(%children)) {
   my $pid = wait();
   delete $children{$pid};
}
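
This works because, on Windows, `system(1, LIST)` spawns the program asynchronously and returns a process identifier without waiting for it to finish (see perlport), so the parent can throttle itself with `wait()` instead of going through fork's thread-based emulation.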
ikegami
  • Thanks much. It is a Windows OS, unfortunately, since the external program I'm calling is Windows-based. I can't say I completely understand your comment about Parallel::ForkManager and its performance on Windows, but it sounds like it might still be an option for my situation. I will give it a whirl. Many thanks... – craigm Nov 18 '14 at 15:42
  • Threads when using Windows? Btw, is there some significant difference between forks and Parallel::ForkManager? – mpapec Nov 18 '14 at 16:10
  • I've had some decent performance improvements on Windows using Parallel::ForkManager. Especially with a bulk copying program I wrote. I highly recommend it. The biggest problem with Windows is... Windows (apologies, I'm a *nix groupie) – thonnor Nov 18 '14 at 16:12
  • Parallel::ForkManager isn't significantly different, but it does iron out a few of the gotchas (like cascading forks). – Sobrique Nov 18 '14 at 16:22
  • @Sobrique, you might be interested in the update to my answer. – ikegami Nov 18 '14 at 17:11

Someone's given you a forking example. Forks aren't native on Windows, so I'd tend to prefer threading.

For the sake of completeness - here's a rough idea of how threading works (and IMO this is one of the better approaches, rather than respawning threads).

#!/usr/bin/perl

use strict;
use warnings;

use threads;

use Thread::Queue;

my $nthreads = 5;

my $process_q = Thread::Queue->new();
my $failed_q  = Thread::Queue->new();

#This is a subroutine, but one that runs 'as a thread'.
#When it starts, it inherits the program state 'as is'. E.g.
#the variable declarations above all apply - but changes to
#values within the program are 'thread local' unless the
#variable is defined as 'shared'.
#Behind the scenes, Thread::Queue objects are 'shared' arrays.

sub worker {
    #NB - this will sit in a loop indefinitely, until you close the queue
    #using $process_q->end.
    #We do this once we've queued all the things we want to process,
    #and then the sub completes and exits neatly.
    #However, if you _don't_ end it, this will sit waiting forever.
    #(The defined() guards against a valid-but-falsy queue entry.)
    while ( defined( my $server = $process_q->dequeue() ) ) {
        chomp($server);
        print threads->self()->tid() . ": pinging $server\n";
        my $result = `/bin/ping -c 1 $server`;
        if ($?) { $failed_q->enqueue($server) }
        print $result;
    }
}

#insert tasks into thread queue.
open( my $input_fh, "<", "server_list" ) or die $!;
$process_q->enqueue(<$input_fh>);
close($input_fh);

#we 'end' process_q - when we do, no more items may be inserted,
#and 'dequeue' returns undef when the queue is emptied.
#this means our worker threads (in their 'while' loop) will then exit.
$process_q->end();

#start some threads
for ( 1 .. $nthreads ) {
    threads->create( \&worker );
}

#Wait for threads to all finish processing.
foreach my $thr ( threads->list() ) {
    $thr->join();
}

#collate results. ('synchronise' operation)
while ( defined( my $server = $failed_q->dequeue_nb() ) ) {
    print "$server failed to ping\n";
}
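
Adapted to the question's actual task, the worker just swaps the ping for the per-file work - a sketch, where `extprog` stands in for the OP's external program:

sub worker {
    while ( defined( my $file = $process_q->dequeue() ) ) {
        chomp($file);
        print threads->self()->tid() . ": processing $file\n";
        system( "extprog", $file );    # runs and waits for exit
        if ($?) { $failed_q->enqueue($file) }
    }
}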

If you need to move complicated data structures around, I'd recommend having a look at Storable - specifically `freeze` and `thaw`. These will let you shuffle around objects, hashes, arrays, etc. easily in queues.
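
For example, a minimal sketch of passing a nested structure through one of the queues above (the task structure is invented for illustration):

use Storable qw(freeze thaw);

# Producer side: serialise a nested structure to a byte string.
my $task = { server => 'web01', retries => 3 };
$process_q->enqueue( freeze($task) );

# Consumer side (in a worker): rebuild the structure.
my $frozen    = $process_q->dequeue();
my $task_copy = thaw($frozen);
print $task_copy->{server}, "\n";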

Note though - for any parallel processing option, you get good CPU utilisation, but you don't get more disk IO - that's often a limiting factor.

Sobrique
  • `:shared` should perform better than Storable. – mpapec Nov 18 '14 at 16:34
  • It probably would, but I find it gets a bit unpleasant when it comes to nested hashes and objects. – Sobrique Nov 18 '14 at 16:42
  • huh, `fork` creates a thread on Windows, so your logic makes very little sense. Avoiding the repeated creation of threads is the performance saving to which I was alluding, but it should be pretty light in the OP's program. – ikegami Nov 18 '14 at 16:47
  • @mpapec, Thread::Queue `share`s the value, but that's no good for blessed variables. That's when you use Storable. If you want to use Storable, use Thread::Queue::Any instead of Thread::Queue, as it uses Storable to serialize queued values. – ikegami Nov 18 '14 at 16:57
  • Haven't run into Thread::Queue::Any before. Useful to know, thanks. – Sobrique Nov 18 '14 at 17:00
  • @ikegami is there a way to share a socket? – mpapec Nov 18 '14 at 17:02
  • @mpapec, Between threads in a process: You could transfer the file descriptor number and reopen it in the other thread. (The tricky part is making sure the sender doesn't close it before the receiver reopens it.) Between processes: Aside from parent-to-child inheritance, some Unixes have a system call that can send a file handle from one process to another (`sendmsg`? can't remember). I don't know if Windows has something similar. – ikegami Nov 18 '14 at 17:07
  • @ikegami can you recommend cpan module? – mpapec Nov 18 '14 at 17:11
  • @mpapec, To share a socket between threads? I don't see how a CPAN module would help, since you'd normally want to integrate using your existing channel. We're just talking about `fileno()` and `open '+>&='`. Like I mentioned, the tricky part is keeping the handle alive in the sender long enough. – ikegami Nov 18 '14 at 17:12
  • @ikegami if I want to pass socket to worker, does main thread need to close it before worker can use it? – mpapec Nov 18 '14 at 17:16
  • @mpapec, No. You can have two Perl handles use the same system handle, and you can have two system handles that are dups of each other (e.g. a child's STDOUT is often a dup of its parent's). In both cases, both handles are usable, subject to the collisions you'd expect if both try to use it at the same time. – ikegami Nov 18 '14 at 17:23
  • @Sobrique me too. `;)` – mpapec Nov 18 '14 at 18:34