I have a text file that's composed of fixed-length records, all on one line with no line breaks in between. What's the best way to process it in Perl? Thanks!
3 Answers
8
First, let's open the file and make sure it's in binary mode:
open my $fh, '<', 'file.name' or die "Cannot open file.name: $!";
binmode $fh;
Now set the input record separator $/ to a reference to the length of your records (let's assume 120 bytes per record):
local $/ = \120;
Now, let's read the records:
while (my $record = <$fh>) {
And now, to get data out of each record, write an unpack template that matches your field layout:
my @elements = unpack("......", $record);
Now you can process @elements and close the while loop:
...
}
Whole "program":
open my $fh, '<', 'file.name' or die "Cannot open file.name: $!";
binmode $fh;
local $/ = \120;
while (my $record = <$fh>) {
    my @elements = unpack("......", $record);
    ...
}
close $fh;
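To make the unpack step concrete, here is a minimal sketch of the same approach on made-up data. The 20-byte layout (a 10-byte name, a 5-byte code, a 5-byte quantity), the template 'A10 A5 A5', and the sample records are all assumptions for illustration; your real record length and template will differ.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical layout (an assumption): name(10) code(5) quantity(5) = 20 bytes.
my $record_len = 20;
my $template   = 'A10 A5 A5';

# Write two sample records to a temporary file, with no line breaks.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
for my $row (['widget', 'A1', 42], ['gadget', 'B22', 7]) {
    print {$out} sprintf('%-10s%-5s%5d', @$row);
}
close $out or die "Cannot close $path: $!";

open my $fh, '<', $path or die "Cannot open $path: $!";
binmode $fh;

my @rows;
{
    local $/ = \$record_len;    # <$fh> now returns up to 20 bytes per read
    while (my $record = <$fh>) {
        die "short record\n" if length($record) < $record_len;
        # 'A' fields come back with trailing spaces stripped.
        my ($name, $code, $qty) = unpack $template, $record;
        push @rows, { name => $name, code => $code, qty => $qty + 0 };
    }
}
close $fh;

printf "%s %s %d\n", @{$_}{qw(name code qty)} for @rows;
```

Putting local $/ inside a bare block keeps the fixed-length setting from leaking into other code that reads lines normally.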
-
why transparent? Perhaps you mean that $/ = \number is less known. That's true. But on the other hand it is very handy as you use the filehandle just like always. – Aug 06 '09 at 17:41
-
sysread is more transparent because you know you are not reading a line but a fixed number of bytes. When you aren't processing lines, acting like you are makes the problem harder. A lot of binary formats don't have a consistent byte-length for the objects through the format, so you often read different number of bytes for each bit. – brian d foy Aug 07 '09 at 03:30
-
@brian d foy: Sure. And if we were dealing with variable-length records, I would write it another way. But since this is clearly fixed-length, using $/ and the standard <> seems easier, at least for me. – Aug 07 '09 at 06:38
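For comparison, the sysread style mentioned in these comments might look like the sketch below. read_record is a hypothetical helper name, and the 4-byte records are made-up sample data; sysread may return fewer bytes than requested, so the helper loops until it has a full record.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# read_record is a hypothetical helper: it returns exactly $len bytes,
# undef at a clean end of file, and dies on errors or a truncated record.
sub read_record {
    my ($fh, $len) = @_;
    my $buf = '';
    while (length($buf) < $len) {
        # Append at the current end of $buf until the record is complete.
        my $got = sysread $fh, $buf, $len - length($buf), length($buf);
        die "sysread failed: $!" unless defined $got;
        last if $got == 0;    # end of file
    }
    return if length($buf) == 0;
    die "truncated record\n" if length($buf) < $len;
    return $buf;
}

# Sample file: three 4-byte records, no separators (made-up data).
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} 'AAAABBBBCCCC';
close $out or die "Cannot close $path: $!";

open my $fh, '<', $path or die "Cannot open $path: $!";
binmode $fh;
my @records;
while (defined(my $rec = read_record($fh, 4))) {
    push @records, $rec;
}
close $fh;
print join(',', @records), "\n";
```

Because the length is passed on every call, the same helper would also work for formats where the record size changes from one read to the next.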
5
Use the read FILEHANDLE,SCALAR,LENGTH function to read a block at a time into a buffer...
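use constant LEN => 60;
my $buf;    # buffer filled by read(); $fh is assumed to be an already opened filehandle
while (!eof $fh) {
    my $len = read $fh, $buf, LEN;
    die "read failed: $!" unless defined $len;
    die "short read" if $len < LEN;
    # processing...
}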
... and process the buffer using regular expressions, unpack, or however you like.
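As a rough illustration of the regular-expression option, the sketch below reads 12-byte records from an in-memory filehandle and pulls out fields by position. The record layout (an 8-digit date followed by a 4-digit amount) and the sample data are assumptions made up for this example.

```perl
use strict;
use warnings;

use constant LEN => 12;    # hypothetical record length for this sketch

# Sample data: two 12-byte records (8-digit date + 4-digit amount), no line breaks.
my $data = '200908060150' . '200908070075';
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
binmode $fh;

my @parsed;
my $buf;
while (!eof $fh) {
    my $len = read $fh, $buf, LEN;
    die "read failed: $!" unless defined $len;
    die "short read\n" if $len < LEN;
    # Fixed-width fields captured by position with a regular expression.
    my ($year, $month, $day, $amount) =
        $buf =~ /\A(\d{4})(\d{2})(\d{2})(\d{4})\z/
        or die "malformed record: $buf\n";
    push @parsed, "$year-$month-$day:" . ($amount + 0);
}
close $fh;
print "$_\n" for @parsed;
```

A regex like this also validates each record as it goes, which unpack alone does not do.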

hillu
2
unpack() may be of use here. You can specify the list of characters (using 'c', 'C' or 'W') and it'll unpack automatically into a list. See the pack documentation for the options to use.
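A small sketch of what those templates do, using a made-up 4-byte string; the '(A4)*' line is an extra illustration (not from the answer above) of splitting text into fixed-width fields.

```perl
use strict;
use warnings;

my $record = "AB\x00\xFF";    # made-up sample bytes

# 'C*' unpacks every byte as an unsigned value (0..255).
my @unsigned = unpack 'C*', $record;

# 'c*' unpacks every byte as a signed value (-128..127).
my @signed = unpack 'c*', $record;

# '(A4)*' splits a string into 4-character fields with trailing spaces stripped.
my @fields = unpack '(A4)*', 'ab  cdefgh  ';

print "unsigned: @unsigned\n";
print "signed:   @signed\n";
print "fields:   ", join('|', @fields), "\n";
```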

brian d foy

Brian Agnew