I have a bash script that splits up a huge input file -- at the moment it is 400 MB; later the script should split a 4 GB file.
The core of this processing is the following awk script:
INPUTFILE="FA.txt"
awk -F $'\t' 'BEGIN{
count = 1;
vcount = 1;
hcount = 1;
tmp = 0;
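# preload the split definitions from FA.txt:
# $1 = output name stem (FAv/FAh), $2 = horizontal break size (BK), $3 = vertical break record (vBreak)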
while ((getline < "'"$INPUTFILE"'") > 0)
{
FAv[count] = $1;
FAh[count] = FAv[count];
BK[count] = $2;
vBreak[count] = $3;
count++;
}
close("'"$INPUTFILE"'");
}
{
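# for every record of Data.txt: derive the two output file names currently in use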
str1 = sprintf("%s%s%s",FAv[vcount],"v",".txt");
str2 = sprintf("%s%s%s",FAh[hcount],"h",".txt");
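# once NR passes the next vertical break, close the current v-file and move on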
if (NR > (vBreak[vcount+1]-1))
{
close(str1);
vcount ++;
}
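# once column 22 has grown past the current break size, close the current h-file and move on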
if (($22-tmp) > BK[hcount])
{
close(str2);
tmp = BK[hcount];
hcount++;
}
printf "...\n",(many columns) >> str1;
printf "...\n",(many columns) >> str2;
}' Data.txt
Data.txt is a very big tab-separated table with about 40 columns and approximately 2.6 million lines; the file the script should handle later on will have about 30 million lines. The input file I am using right now should produce about 300 output files; the one the script is meant to process later should create about 4000.
The lines close(str1); and close(str2); don't change the error message I get, which is:
awk: (filename)h.txt makes too many open files
Input record number 157762, file Data.txt
source line number 7
awk: (filename)h.txt makes too many open files
Input record number 157762, file Data.txt
source line number 10
The source line numbers refer to the equivalent lines in the snippet above; in my actual script they are at different positions.
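For what it's worth, the limit itself is easy to hit on purpose: a trivial one-liner that writes one line to many distinct files without ever closing them (file names invented, just a sketch) should, as far as I understand, run into the same per-process limit:

seq 1 1000 | awk '{ f = "out" $1 ".txt"; print "x" >> f }'

What I don't understand is why the close() calls in my script above don't prevent it.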
The file "FA.txt" which is used to generate splitting conditions is 3KB big and has 155 lines and 3 columns so this shouldn't make any problem for awk at all. I am afraid I cant really give out dummy data as the data comes from a company I am working for.
I don't see where the problem in the code is; any help would be greatly appreciated.