1

I have a text file composed of fixed-length records, all on one line with no line breaks in between. What's the best way to process it in Perl? Thanks!

brian d foy

3 Answers

8

First, let's open the file and make sure it's in binary mode:

open my $fh, '<', 'file.name' or die "Cannot open file.name: $!";
binmode $fh;

Now, set the input record separator ($/) to a reference to the length of your records (let's assume 120 bytes per record), so that each read returns a fixed number of bytes instead of a line:

local $/ = \120;

Now, let's read the records:

while (my $record = <$fh>) {

To get the individual fields out of each record, write an unpack template that matches your record layout:

  my @elements = unpack("......", $record);

Now you can process @elements and close the while loop:

  ...
}

Whole "program":

open my $fh, '<', 'file.name' or die "Cannot open file.name: $!";
binmode $fh;
local $/ = \120;
while (my $record = <$fh>) {
  my @elements = unpack("......", $record);
  ...
}
close $fh;
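
To make the steps above concrete, here is a minimal sketch with a made-up layout. The 20-byte record length, the field widths in the unpack template, and the sample data are assumptions for illustration only; an in-memory filehandle stands in for the real file so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical layout: 10-byte name, 3-byte age, 7-byte city = 20 bytes per record.
my $RECORD_LEN = 20;

# Sample data: two records back to back, with no line breaks between them.
my $data = sprintf('%-10s%-3s%-7s', 'Alice', '30', 'Paris')
         . sprintf('%-10s%-3s%-7s', 'Bob',   '41', 'Oslo');

# An in-memory filehandle stands in for the real file here.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
binmode $fh;

my @rows;
{
    # A reference to an integer makes <$fh> return that many bytes per read.
    local $/ = \$RECORD_LEN;
    while (my $record = <$fh>) {
        die "Short record at end of input" if length($record) < $RECORD_LEN;
        # 'A' fields are space-padded text; trailing spaces are stripped.
        my ($name, $age, $city) = unpack('A10 A3 A7', $record);
        push @rows, [ $name, $age, $city ];
    }
}
close $fh;

print join('|', @$_), "\n" for @rows;
```

Wrapping the `local $/` in a bare block keeps the changed record separator from leaking into other reads later in the program.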
  • why transparent? Perhaps you mean that $/ = \number is less known. That's true. But on the other hand it is very handy as you use the filehandle just like always. –  Aug 06 '09 at 17:41
  • sysread is more transparent because you know you are not reading a line but a fixed number of bytes. When you aren't processing lines, acting like you are makes the problem harder. A lot of binary formats don't have a consistent byte-length for the objects through the format, so you often read different number of bytes for each bit. – brian d foy Aug 07 '09 at 03:30
  • @brian d foy: sure. and if we would be dealing with variable-length records, I would write it in another way. but since this is clearly fixed-length, usage of $/ and standard <> seems easier. at least for me. –  Aug 07 '09 at 06:38
5

Use the read FILEHANDLE,SCALAR,LENGTH function to read one block at a time into a buffer...

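use constant LEN => 60;
# $fh is assumed to be an already-open filehandle in binary mode
while (!eof $fh) {
    my $len = read $fh, my $buf, LEN;
    die "read error: $!" unless defined $len;
    die "short read" if $len < LEN;
    # processing...
}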

... and process the buffer using regular expressions, unpack, or however you like.
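
As a runnable variation on the loop above, the following sketch drives the loop from the return value of read instead of eof, and keeps a trailing partial record instead of dying. The 8-byte record length and the sample data are assumptions, and an in-memory filehandle is used so the example is self-contained.

```perl
use strict;
use warnings;

use constant LEN => 8;    # hypothetical record length

# Sample data: two full 8-byte records followed by 3 stray bytes.
my $data = 'AAAAAAAABBBBBBBBCCC';

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
binmode $fh;

my @records;
my $leftover;
while (1) {
    my $len = read $fh, my $buf, LEN;
    die "Read error: $!" unless defined $len;   # undef signals an I/O error
    last if $len == 0;                          # 0 signals end of file
    if ($len < LEN) {                           # incomplete record at the end
        $leftover = $buf;
        last;
    }
    push @records, $buf;
}
close $fh;

print "record: $_\n" for @records;
print "leftover bytes: ", length($leftover), "\n" if defined $leftover;
```

Whether a short trailing record should be fatal or merely reported depends on the data; this sketch only records it.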

hillu
2

unpack() may be of use here. You can specify the list of characters (using 'c', 'C' or 'W') and it'll unpack automatically into a list. See the pack documentation for the options to use.
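
A short sketch of what such templates can look like. The record layout (a 4-byte code followed by a 2-byte count) and the sample data are invented for illustration; the 'a' and 'A' string formats are shown alongside 'C' because fixed-width text fields are a common case.

```perl
use strict;
use warnings;

# Hypothetical data: three 6-byte records, each a 4-byte code plus a 2-byte count.
my $line = 'ABCD01EFGH02IJ  03';

# Split the whole line into 6-byte chunks in one call; 'a6' keeps the bytes verbatim.
my @records = unpack('(a6)*', $line);

# Split each record into fields; 'A' strips trailing spaces, 'a' would keep them.
my @fields = map { [ unpack('A4 A2', $_) ] } @records;

# 'C*' returns the numeric value of every byte instead of strings.
my @bytes = unpack('C*', 'AB');

printf "%s => %s\n", @$_ for @fields;
```

For a small file, the grouped template `(a6)*` can replace the read loop entirely, at the cost of holding the whole line in memory.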

brian d foy
Brian Agnew