Although this is pretty basic, I can't find a similar question, so please link to one if you know of an existing question/solution on SO.
I have a .txt file that is about 2MB and about 16,000 lines long. Each record is 160 characters long, with a blocking factor of 10. It is an older fixed-width format that almost looks like a tab-delimited file, but the fields are separated by single characters/whitespace rather than tabs.
First, I glob a directory for .txt files. There is never more than one file in the directory at a time, so this attempt may be inefficient in itself.
# List-context assignment takes the first match; glob in scalar context acts as an iterator.
my ($txt_file) = glob "/some/cheese/dir/*.txt";
Then I open the file with this line:
open(F, '<', $txt_file) || die("Could not open $txt_file: $!");
As per the data dictionary for this file, I'm parsing each "field" out of each line using Perl's substr() function within a while loop.
while ($line = <F>)
{
    $nom_stat   = substr($line,0,1);
    $lname      = substr($line,1,15);
    $fname      = substr($line,16,15);
    $mname      = substr($line,31,1);
    $address    = substr($line,32,30);
    $city       = substr($line,62,20);
    $st         = substr($line,82,2);
    $zip        = substr($line,84,5);
    $lnum       = substr($line,93,9);
    $cl_rank    = substr($line,108,4);
    $ceeb       = substr($line,112,6);
    $county     = substr($line,118,2);
    $sex        = substr($line,120,1);
    $grant_type = substr($line,121,1);
    $int_major  = substr($line,122,3);
    $acad_idx   = substr($line,125,3);
    $gpa        = substr($line,128,5);
    $hs_cl_size = substr($line,135,4);
}
This approach takes a lot of time to process each line and I'm wondering if there is a more efficient way of getting each field out of each line of the file.
Can anyone suggest a more efficient/preferred method?
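For reference, one alternative I'm aware of for fixed-width records is Perl's built-in unpack(), which extracts all fields in a single call instead of one substr() call per field. Below is a minimal sketch, not something I have benchmarked. The template is derived from the substr() offsets above (with x4, x6 and x2 skipping the unused gaps), parse_record is a hypothetical helper name, and the sample record is made up. Note that the A format trims trailing spaces, which substr() does not; a would keep them.

```perl
use strict;
use warnings;

# Field widths taken from the substr() offsets in the loop above.
# 'A<n>' reads n characters and trims trailing spaces; 'x<n>' skips n characters.
my $template = 'A1 A15 A15 A1 A30 A20 A2 A5 x4 A9 x6 A4 A6 A2 A1 A1 A3 A3 A5 x2 A4';

my @names = qw(nom_stat lname fname mname address city st zip lnum cl_rank
               ceeb county sex grant_type int_major acad_idx gpa hs_cl_size);

# Hypothetical helper: returns a hash reference keyed by field name.
sub parse_record {
    my ($line) = @_;
    my %rec;
    @rec{@names} = unpack $template, $line;   # one call fills every field
    return \%rec;
}

# Made-up sample record, padded with spaces to the full 160 characters.
my $sample = sprintf '%-160s',
    sprintf('%-1s%-15s%-15s%-1s%-30s%-20s%-2s%-5s',
            'A', 'SMITH', 'JOHN', 'Q', '1 MAIN ST', 'SPRINGFIELD', 'IL', '62701');

my $rec = parse_record($sample);
print "$rec->{lname}, $rec->{fname} ($rec->{st} $rec->{zip})\n";
```

In the real loop this would be called once per line read from the filehandle. Lines shorter than the template expects would need a length check first, since skipping past the end of the string with x is an error.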