awk command to split an 8 GB file into multiple files by row count, with a new filename and the header in each file
I have an 8 GB file with 26 column headers. I have to split it into multiple files of 400000 rows (4 lakh) each, including the header, which means every output file should start with the header row.
I have tried multiple commands. They produce the split files I want, but with one small and weird problem.
After the header on line 1, the header is inserted again every 50000 lines. For example, after running the command below I got FileName_28062021_1.txt. If I open this file, I can see the header on lines 1, 50001, 100001, 150001, and so on. I am not sure how to resolve this.
Original command tried:
awk '
NR==1{header=$0; count=1; print header > "FileName_28062021_" count ".txt"; next }
!( (NR-1) % 399999){count++; print header > "FileName_28062021_" count ".txt";}
{print $0 > "FileName_28062021_" count ".txt"}
' FileName_28062021-SourceFile.txt
SERVERIF:/data1/tempCheckAWK $ wc -l FileName_28062021-NonSplit.txt
46646575 FileName_28062021-NonSplit.txt
Second awk command tried:
SERVERIF:/data1/tempCheckAWK $ vi tempAWK.sh
awk '
    NR==1 { header = $0 }
    (NR % 400000) == 1 {
        close(out)
        out = "FileName_28062021_" (++count) ".txt"
        print header > out
    }
    NR>1 { print > out }
' FileName_28062021-NonSplit.txt
SERVERIF:/data1/tempCheckAWK $ sh tempAWK.sh
SERVERIF:/data1/tempCheckAWK $ ls -ltr
Jun 10 13:43 FileName_28062021-NonSplit.txt
Jun 28 23:56 tempAWK.sh
Jun 28 23:59 FileName_28062021_1.txt
Jun 28 23:59 FileName_28062021_2.txt
....
SERVERIF:/data1/tempCheckAWK $ wc -l FileName_28062021_1.txt
400000 FileName_28062021_1.txt
SERVERIF:/data1/tempCheckAWK $ grep "Transactions Id" FileName_28062021_1.txt
Transactions Id|Transaction Date|Investment Id|External Code
Transactions Id|Transaction Date|Investment Id|External Code
Transactions Id|Transaction Date|Investment Id|External Code
Transactions Id|Transaction Date|Investment Id|External Code
Transactions Id|Transaction Date|Investment Id|External Code
Transactions Id|Transaction Date|Investment Id|External Code
Transactions Id|Transaction Date|Investment Id|External Code
Transactions Id|Transaction Date|Investment Id|External Code
I have tried other solutions provided on Stack Overflow, still with no luck: the header keeps repeating after every 50000th line.
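Neither command above writes the header more than once per output file, so one possible explanation is that the repeated header lines are already in the source file, for instance if it was assembled from 50000-row exports. This has not been confirmed. The sketch below shows a check for that, followed by a split that skips any later line identical to the header. The names `sample.txt`, `sample_part_`, `rows` and `prefix` are hypothetical and used only for illustration. For the real file, `rows` would be 400000 and the input would be FileName_28062021-NonSplit.txt.

```shell
# Hypothetical small sample standing in for the real source file:
# a header on line 1 that is repeated again in the middle of the data.
printf '%s\n' 'Id|Date' '1|a' '2|b' 'Id|Date' '3|c' '4|d' '5|e' > sample.txt

# Step 1: count lines identical to line 1.
# A result greater than 1 means the repeats are in the input itself.
awk 'NR == 1 { h = $0 } $0 == h { n++ } END { print n }' sample.txt

# Step 2: split into files of at most `rows` lines each (header included),
# dropping any repeated header line found in the input.
awk -v rows=3 -v prefix="sample_part_" '
    NR == 1      { header = $0; next }     # remember the header, do not treat it as data
    $0 == header { next }                  # skip repeated headers (assumed to be exact copies)
    (data++ % (rows - 1)) == 0 {           # every rows-1 data lines, start a new file
        if (out != "") close(out)          # close the previous file to limit open handles
        out = prefix (++count) ".txt"
        print header > out
    }
    { print > out }
' sample.txt
```

Note that the `$0 == header` test only removes lines that match the header exactly, so a data row that merely contains the text "Transactions Id" would be kept.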