I'd like to invoke multiple Perl instances/scripts from one Perl script. Please see the simple script below, which illustrates the problem nicely:

my @filenames = ("file1.xml","file2.xml","file3.xml","file4.xml");
foreach my $file (@filenames)
{   
    #Scripts which parses the XML file
    system("perl parse.pl $file");

    #Go-On don't wait till parse.pl has finished

}

As I'm on a quad-core CPU and the parsing of a single file takes a while, I want to split the Job. Could someone point me in a good direction?

Thanks and best, Tim

Tim Geig

1 Answer

There are many ways to take advantage of multiple cores for an implicitly parallel workload like this one.

The most obvious is to append an ampersand to the command in your system call. The shell then runs it in the background, and system returns straight away.

my @filenames = ("file1.xml","file2.xml","file3.xml","file4.xml");
foreach my $file (@filenames)
{   
    #Scripts which parses the XML file
    system("perl parse.pl $file &");

    #Go-On don't wait till parse.pl has finished

}

That's pretty simplistic, but should do the trick. The downside of this approach is it doesn't scale too well - if you had a long list of files (say, 1000?) then they'd all kick off at once, and you may drain system resources and cause problems by doing it.

So if you want a more controlled approach, you can use either forking or threading. Forking uses the fork() system call, which starts a duplicate of the current process.

use Parallel::ForkManager;
my $manager = Parallel::ForkManager -> new ( 4 ); #number of CPUs
my @filenames = ("file1.xml","file2.xml","file3.xml","file4.xml");
foreach my $file (@filenames)
{   
    #Scripts which parses the XML file
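    $manager -> start and next;    # parent: move on to the next file
    # child: replace this process with the parser
    exec("perl", "parse.pl", $file) or die "exec: $!";
    $manager -> finish;    # not reached: exec replaces the child, or die ends it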

    #Go-On don't wait till parse.pl has finished

}

# and if you want to wait:
$manager -> wait_all_children(); 
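
If installing a module is not an option, roughly the same throttling can be sketched with core fork, exec and wait. This is a minimal sketch, not how Parallel::ForkManager is implemented: run_limited, its parameters and the returned hash are hypothetical names made up for illustration, and it assumes a Unix-like system.

```perl
use strict;
use warnings;

# Run one command per file, keeping at most $max_workers children alive.
# $cmd is an array ref holding the command prefix; the file name is
# appended as the last argument. Returns a hash ref: file name => exit code.
sub run_limited {
    my ($max_workers, $cmd, @files) = @_;
    my %running;    # pid => file name
    my %status;     # file name => exit code

    # Reap one finished child and record its exit code.
    my $reap = sub {
        my $pid = wait();
        if ($pid == -1) {    # no children left to wait for
            %running = ();
            return;
        }
        my $file = delete $running{$pid};
        $status{$file} = $? >> 8 if defined $file;
    };

    for my $file (@files) {
        # Block until a worker slot frees up.
        $reap->() while keys(%running) >= $max_workers;

        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: replace this process with the parser (no shell involved).
            exec(@$cmd, $file) or die "exec: $!";
        }
        $running{$pid} = $file;
    }

    # Wait for the remaining children.
    $reap->() while %running;
    return \%status;
}
```

With the files from the question, a call might look like run_limited(4, [ "perl", "parse.pl" ], @filenames). Each child's exit code then ends up in the returned hash, which the ampersand approach cannot give you.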

And if you wanted to do something that involved capturing output and post-processing it, I'd suggest thinking in terms of threads and Thread::Queue. But that's unnecessary if there's no synchronisation required.
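
For the capture-and-post-process case, a rough sketch with threads and Thread::Queue could look like the following. It assumes a Perl built with thread support, a Thread::Queue recent enough to provide end() (3.01 or later), and a Unix-like system for the list-form piped open. parse_with_threads and its parameters are hypothetical names, not part of any module.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Process @files with $workers threads. Each thread runs @$cmd plus one
# file name, captures its output, and hands it back on a result queue.
# Returns a hash ref: file name => captured output.
sub parse_with_threads {
    my ($workers, $cmd, @files) = @_;

    my $work    = Thread::Queue->new(@files);
    my $results = Thread::Queue->new();
    $work->end();    # nothing more will be queued; dequeue returns undef when empty

    my @threads;
    for (1 .. $workers) {
        push @threads, threads->create(sub {
            while (defined(my $file = $work->dequeue())) {
                # List-form piped open: runs the command without a shell.
                open(my $fh, '-|', @$cmd, $file) or die "open: $!";
                my $output = do { local $/; <$fh> };
                close($fh);
                $results->enqueue([ $file, $output // '' ]);
            }
        });
    }

    $_->join() for @threads;
    $results->end();

    # Post-processing happens here, in the main thread.
    my %output_for;
    while (defined(my $item = $results->dequeue())) {
        my ($file, $output) = @$item;
        $output_for{$file} = $output;
    }
    return \%output_for;
}
```

The work queue hands each file to exactly one thread, and the result queue gathers the output in one place, so that is all the synchronisation the main thread needs.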

(If you're thinking that might be useful, I'll offer: Perl daemonize with child daemons)

Edit: Amended based on comments. Ikegami correctly points out:

system("perl parse.pl $file"); $manager->finish; is wasteful (three processes per worker). Use: exec("perl", "parse.pl", $file) or die "exec: $!"; (one process per worker).

Sobrique
    `system("perl parse.pl $file"); $manager->finish;` is wasteful (three processes per worker). Use: `exec("perl", "parse.pl", $file) or die "exec: $!";` (one process per worker). – ikegami Mar 02 '15 at 12:54