
I am trying to learn Perl 6 and parallelism/concurrency at the same time.

For a simple learning exercise, I have a folder of 550 '.htm' files, and I want the total number of lines across all of them. So far, I have this:

use v6;

my $start_time = now;
my $exception;
my $total_lines = 0;

my @files = "c:/testdir".IO.dir(test => / '.' htm $/);
for @files -> $file {
    $total_lines += $file.lines.elems;
    CATCH {
        default { $exception = $_; } #some of the files error out for malformed utf-8
    }
}
say $total_lines;
say now - $start_time;

That gives a sum of 577,449 in approximately 3 seconds.

How would I rewrite that to take advantage of Perl 6's parallelism features? I realize the time saved won't be much, but it will work as a proof of concept.

edited by Scimon Proctor, asked by Herby
  • … something like `my $total_lines = [+] @files.race.map(*.lines(:enc).elems)`, compared to the one without `.race`? – Christoph Dec 28 '15 at 17:50
  • Great. With .race it took approximately 2 seconds on average. Without .race, it takes 2.6 seconds on average. – Herby Dec 28 '15 at 18:05
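The timing comparison in these comments can be reproduced with a self-contained sketch like the one below, assuming a recent Rakudo. Instead of the question's c:/testdir, it writes 200 hypothetical .htm files (3 lines each) to a temporary folder under `$*TMPDIR`, then times a serial `map` against `.race.map`. Both totals should agree, because `.race` may hand back the per-file counts in any order and addition does not care about order. On files this small the parallel run may well be slower, since thread overhead dominates.

```perl6
use v6;

# Hypothetical demo data standing in for c:/testdir: 200 .htm files with
# 3 lines each, plus one .txt file that the name filter skips.
my $dir = $*TMPDIR.child("htm-race-demo-$*PID");
$dir.mkdir;
$dir.child("$_.htm").spurt("line\n" x 3) for 1..200;
$dir.child("skip.txt").spurt("not counted\n");

my @files = $dir.dir(test => / '.' htm $/);

# Serial: one file after another.
my $t0 = now;
my $serial = [+] @files.map(*.lines.elems);
my $serial_time = now - $t0;

# Parallel: .race hands batches of files to worker threads and may
# produce the counts in any order, which is fine for a sum.
$t0 = now;
my $parallel = [+] @files.race.map(*.lines.elems);
my $parallel_time = now - $t0;

say "serial:   $serial lines in $serial_time s";
say "parallel: $parallel lines in $parallel_time s";

# Remove the demo files again (best effort).
try { .unlink for $dir.dir; $dir.rmdir };
```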

2 Answers


Here is Christoph's suggestion implemented. The count is slightly higher than in my original post because I can now read the files with malformed UTF-8 by decoding them as Latin-1 (`:enc<latin1>`).

use v6;
my $start_time = now;

my @files = "c:/iforms/live".IO.dir(test => / '.' htm $/);
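# .race splits the files into batches that run on several threads; results may
# come back in any order, which doesn't matter for a sum.
my $total_lines = [+] @files.race.map(*.lines(:enc<latin1>).elems);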

say $total_lines;

say now - $start_time;
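This sketch makes two details of the one-liner above explicit. It assumes a recent Rakudo. Decoding as Latin-1 cannot fail, because every byte maps to some character. UTF-8 never uses the byte 0x0A inside a multi-byte sequence, so files that are valid UTF-8 keep the same line count. `.race` also accepts `:batch` (how many items a worker takes at a time, 64 by default) and `:degree` (how many workers run at once) named arguments. The values 8 and 4 below are arbitrary examples, not tuned ones. The two demo files are hypothetical; one contains the byte 0xE9 followed by a newline, which is invalid UTF-8.

```perl6
use v6;

# Hypothetical demo data: two .htm files, one of which contains the byte
# 0xE9 followed by a newline, which is not valid UTF-8.
my $dir = $*TMPDIR.child("htm-latin1-demo-$*PID");
$dir.mkdir;
$dir.child("good.htm").spurt("<p>one</p>\n<p>two</p>\n");
# bytes: "caf", 0xE9, newline, "x", newline
$dir.child("bad.htm").spurt(Buf.new(0x63, 0x61, 0x66, 0xE9, 0x0A, 0x78, 0x0A));

my @files = $dir.dir(test => / '.' htm $/);

# Default UTF-8 decoding: bad.htm throws, and try/// 0 counts it as zero.
my $utf8_total = [+] @files.map({ (try .lines.elems) // 0 });

# Latin-1 maps every byte to a character, so no file fails to decode.
# :batch(8) and :degree(4) are example values: each worker takes up to
# 8 files at a time, and at most 4 workers run at once.
my $latin1_total = [+] @files.race(:batch(8), :degree(4)).map(*.lines(:enc<latin1>).elems);

say "utf8 (bad file counted as 0): $utf8_total";
say "latin1:                       $latin1_total";

# Remove the demo files again (best effort).
try { .unlink for $dir.dir; $dir.rmdir };
```

Counting a file that fails to decode as zero matches what the question's CATCH version effectively did, since the `+=` never ran for those files.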
edited by Christopher Bottoms, answered by Herby

use v6;
my $start_time = now;
my $exception;
my @lines;
my $total_lines = 0;

my @files = "c:/testdir".IO.dir(test => / '.' htm $/);
await do for @files -> $file {
    start {
        @lines.push( $file.lines.elems );
        CATCH {
            default { $exception = $_; } #some of the files error out for malformed utf-8
        }
    }
}
$total_lines = [+] @lines;
say $total_lines;
say now - $start_time;
edited by Community
  • Thanks. Weirdly, when I run this code multiple times I get different values for $total_lines. I ran it 5 times and got counts of: 575967, 575367, 570325, 574797, 576222. Any ideas what is causing that? – Herby Dec 28 '15 at 18:47
  • Not yet. I adapted that from http://blogs.perl.org/users/pawel_bbkr_pabian/2015/09/asynchronous-parallel-and-dead-my-perl-6-daily-bread.html, and I'm still learning how Perl 6 works. I don't know whether you need some form of locking around that $total_lines update. Maybe make the block return the number of lines and sum the totals outside of the await, map/reduce style. – Mike Chamberlain Dec 28 '15 at 19:02
  • OK, I tweaked it a little to capture the elems in an array, which is summed outside of the parallel code. – Mike Chamberlain Dec 28 '15 at 19:19
  • Ran the updated code. Seeing $total_lines of: 573705, 577449, 575867, 577449, 575407. I'm with you, trying to piece it together from that blogs.perl.org post you linked. – Herby Dec 28 '15 at 19:23
  • Which version of Perl 6 are you using? Mine gives a stable answer on MoarVM 2015.12 (granted, I'm running it over a limited number of files). – Mike Chamberlain Dec 28 '15 at 19:25
  • Windows 7 running perl6 version 2015.09 built on MoarVM version 2015.09. Copying/pasting your code directly into the script. – Herby Dec 28 '15 at 19:27
  • Upgrade to 2015.12; that's the latest official release of Perl 6. – Mike Chamberlain Dec 28 '15 at 19:48
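A likely cause of the varying totals, whatever the Rakudo version, is that every `start` block pushes onto the same `@lines` array. Perl 6 arrays are not safe to modify from several threads at once, so concurrent pushes may lose elements. The map/reduce idea from the comments avoids shared state: each `start` block returns its own count, `await` collects the results, and the sum is taken afterwards on one thread. The sketch below assumes a recent Rakudo and uses 100 hypothetical .htm files in a temporary folder in place of c:/testdir. Treating files that fail to decode as zero lines (`try ... // 0`) is an assumption that mirrors the question's CATCH.

```perl6
use v6;

# Hypothetical demo data standing in for c:/testdir: 100 .htm files,
# each with 5 lines.
my $dir = $*TMPDIR.child("htm-promise-demo-$*PID");
$dir.mkdir;
$dir.child("$_.htm").spurt("line\n" x 5) for 1..100;

my @files = $dir.dir(test => / '.' htm $/);

# One Promise per file. Each start block *returns* its count instead of
# pushing onto a shared array, so no container is written by two threads.
my @promises = @files.map: -> $file {
    start {
        # try yields Nil if decoding fails (e.g. malformed UTF-8);
        # // 0 then counts that file as zero lines.
        (try $file.lines.elems) // 0
    }
};

# await returns the result of every Promise; the sum runs on this thread only.
my $total_lines = [+] await @promises;
say $total_lines;

# Remove the demo files again (best effort).
try { .unlink for $dir.dir; $dir.rmdir };
```

With one Promise per file, 550 files means 550 tasks queued on the thread pool. The `.race.map` one-liner in the other answer does the same job with batching built in.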