
The issue I have can be reproduced by running the following code under Strawberry Perl 5.12.3.0 on Windows XP.

    #!/usr/bin/perl -w

    use strict;
    use warnings;
    use Win32::Unicode::File;
    use Encode;

    my $fname = shift @ARGV;

    my $fh = Win32::Unicode::File->new;
    if ($fh->open('<', $fname)){
      while (my $line = $fh->readline()){}
      $fh->close;
    }else{
      print "Couldn't open file: $!\n";
    }

All this code does is read the file line by line via readline, yet it keeps eating memory until Strawberry Perl aborts with an Out of memory error. I am using a really big file, but since this code is stream-based, that shouldn't matter. Am I missing something here, or is there a leak somewhere in Strawberry Perl? I tested the exact same code in ActivePerl and there it works fine, i.e., it doesn't eat memory.

Update: Replacing Win32::Unicode::File with a plain open and the diamond operator works, on my distribution at least. See the following code.

    use strict;
    use warnings;

    my $fname = shift @ARGV;

    if (open(my $fh, '<', $fname)){
      while (my $line = <$fh>){}
      close $fh;
    }else{
      print "Couldn't open file: $!\n";
    }

So that would suggest the problem lies with the Win32::Unicode::File module, right?

Dr. Mike
  • Does the file have line breaks? Attempting to read a line could end up reading the entire file. – ikegami Jan 03 '12 at 10:56
  • @ikegami Yes, the file has line breaks, and each line is no more than 255 characters long. – Dr. Mike Jan 03 '12 at 12:34
  • I would suggest putting a print statement for each line read instead of an empty block, so you can see that lines are really being read; just print "." would do. Also, make sure you really do have $line and not @line, which would read the entire file. – Bill Ruppert Jan 03 '12 at 12:54
  • @BillRuppert I have done that, and the file is read and parsed properly. The code above is just to show the core issue. – Dr. Mike Jan 03 '12 at 13:53
  • I can reproduce the problem on Citrus Perl 5.12.3 with a 600+ MB file. It even crashes when using the regular diamond operator instead of Win32::Unicode::File. – Stamm Jan 04 '12 at 10:21
  • @Stamm I didn't try that, but then at least that would clear the Win32::Unicode::File module of the memory leak. Any ideas as to why this leak occurs? I'm at my wits' end. – Dr. Mike Jan 04 '12 at 11:30
  • @Stamm I just tried a plain open with the diamond operator (the code in the update above), and it didn't give me any problems. – Dr. Mike Jan 04 '12 at 12:12
  • Sorry, I messed up my earlier tests. Indeed, it works as expected with the diamond operator, and it is 50 times faster, too. So Win32::Unicode::File is the culprit. – Stamm Jan 04 '12 at 13:39

2 Answers


Maybe $/ (a.k.a. $INPUT_RECORD_SEPARATOR) is not a newline? Or $[ (the index of the first array element and of the first character in a (sub)string) is not 0?

Both of those variables are used by the module during read and readline.
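To rule that out, a quick sanity check on those two globals (nothing module-specific, just printing their current values) could look like this:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# $/ defaults to "\n"; $[ should be 0 unless some legacy code changed it.
print 'record separator is ',
      (defined $/ && $/ eq "\n" ? 'the default newline' : 'NOT the default'), "\n";
print 'first index ($[) is ', $[, "\n";
```

If either value is off, some code loaded earlier has changed it, and localizing it (e.g. `local $/ = "\n";`) around the read loop would be the fix.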

BTW: It's so damn slow because it makes three function calls to read each line one character at a time, calls Encode::decode on every character read, and then appends the result to the line buffer that readline returns to your code. Yuck!
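If you don't actually need the module's Unicode *file name* support (i.e. only the file *content* is Unicode), a plain open with a PerlIO encoding layer lets Perl buffer and decode in bulk rather than per character. A minimal sketch, assuming UTF-8 content (swap the layer, e.g. to UTF-16LE, to match your file):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $fname = shift @ARGV;

# Decode through a PerlIO layer: buffered, decoded block by block,
# instead of one Encode::decode call per character.
open my $fh, '<:encoding(UTF-8)', $fname
    or die "Couldn't open file: $!\n";
while (my $line = <$fh>) {
    # process $line here
}
close $fh;
```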

Chris H

A little unorthodox I guess, but I'm going to answer my own question. I have replaced the Win32::Unicode::File package with Path::Class::Unicode for reading the Unicode file. This works fine (i.e. no runaway memory use), so the problem appears to be in the Win32::Unicode::File package and is most likely a bug. I have contacted the author of the package and he's looking into it. Please let me know if you want me to supply the code. It's pretty straightforward.
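For reference, the replacement looks roughly like the sketch below. Treat it as a sketch rather than exact code: `ufile()` and the `open` call are used as described in the Path::Class::Unicode documentation, and the processing body is a placeholder.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Path::Class::Unicode;    # exports ufile()

my $fname = shift @ARGV;

# ufile() builds a file object that handles Unicode paths on Windows;
# open() returns an IO::File-style handle we can read with the diamond operator.
my $file = ufile($fname);
my $fh   = $file->open('<') or die "Couldn't open file: $!\n";
while (my $line = <$fh>) {
    # process $line here
}
$fh->close;
```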

Dr. Mike