I am writing a script that runs in a loop: in each iteration, different scripts pull variables from external files, and the last step compiles them. I am trying to maximize the speed of this loop, so I am looking for the best programs for the job.
The rate-limiting step right now is searching a file that has 2 columns and 4.5 million lines, separated by a space. Column 1 is a key and column 2 is the value I am extracting.
The two programs I am evaluating are awk and grep. Below are the two commands and their run times for finding the last value in the file:
time awk -v a=15 'BEGIN{B=10000000}$1==a{print $2;B=NR}NR>B{exit}' infile
T
real 0m2.255s
user 0m2.237s
sys 0m0.018s
time grep "^15 " infile |cut -d " " -f 2
T
real 0m0.164s
user 0m0.127s
sys 0m0.037s
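Neither command stops as early as it could: the grep pipeline always reads the whole file, and the awk command exits one line after the match. Below is a minimal sketch of early-exit variants. It assumes each key occurs only once in the file (the question does not say so) and that lines are `key value` separated by one space. The function names `lookup_grep` and `lookup_awk` are hypothetical. `grep -m 1` (supported by GNU and BSD grep) stops reading after the first matching line, and `LC_ALL=C` makes grep compare plain bytes, which is often faster than matching in a UTF-8 locale.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: print the value (column 2) for key $1 in file $2.
# Assumes "key value" lines separated by a single space and unique keys.

lookup_grep() {
    # -m 1: stop after the first matching line.
    # LC_ALL=C: byte-wise matching, no multibyte locale handling.
    LC_ALL=C grep -m 1 "^$1 " "$2" | cut -d ' ' -f 2
}

lookup_awk() {
    # Print the value and exit immediately, instead of on the next line.
    awk -v a="$1" '$1 == a { print $2; exit }' "$2"
}
```

For example, `time lookup_grep 15 infile`. Because the key is spliced into a regular expression in `lookup_grep`, keys containing characters such as `.` or `*` would need escaping.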
This brings me to my question: how does grep search? I understand that awk works line by line and field by field, which is why it takes longer as the file gets longer and I have to search further into it.
Clearly grep does not search line by line, or if it does, it does so in a very different way from awk, given the roughly 14x difference in real time (2.255 s vs 0.164 s).
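One way to test the "field by field" part of this explanation is to give awk a lookup that never references `$1` or `$2` before it finds a match, and time it against the awk command above. This is a rough sketch: `lookup_line` is a hypothetical name, and the file format is assumed to be `key value` separated by one space. Whether it is any faster depends on the awk implementation, since some implementations split a line into fields only when a field is actually referenced.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value for key $1 in file $2 by matching the
# whole line ($0) against "^key ", so no field is referenced before a match.
# Assumes "key value" lines separated by a single space; the key becomes part
# of a regex, so it should not contain regex metacharacters.
lookup_line() {
    awk -v p="^$1 " '
        $0 ~ p {
            sub(/^[^ ]* /, "")   # strip "key " from the matched line
            print                # what remains is the value
            exit                 # stop reading the file
        }' "$2"
}
```

Usage: `time lookup_line 15 infile`, then compare with the timing of the original awk command.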
(I have noticed that awk runs faster than grep on short files. I have yet to find where they diverge, but at those sizes the difference matters much less.)
I'd like to understand this so I can make good decisions about which program to use in the future.
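To locate the point where the two diverge, here is a rough benchmark sketch, assuming bash (for the `time` keyword and `TIMEFORMAT`). It builds synthetic `key value` files of increasing size (the sizes and generated data are arbitrary choices, not the real input) and times the two commands from the question looking up the last key. `bench` is a hypothetical name. Timings vary with the machine, the awk and grep implementations, the locale, and whether the file is already cached, so each size is worth running several times.

```bash
#!/usr/bin/env bash
# Rough crossover benchmark; not a rigorous measurement.
TIMEFORMAT='%R'   # bash: make `time` print only elapsed real seconds

# bench N: build an N-line file "1 v1" ... "N vN", then time a lookup of
# key N with the awk command and the grep | cut pipeline from the question.
bench() {
    local n=$1 f t_awk t_grep
    f=$(mktemp)
    awk -v n="$n" 'BEGIN { for (i = 1; i <= n; i++) print i, "v" i }' > "$f"

    # { time ...; } 2>&1 captures the timing that bash writes to stderr.
    t_awk=$( { time awk -v a="$n" 'BEGIN{B=10000000}$1==a{print $2;B=NR}NR>B{exit}' "$f" > /dev/null; } 2>&1 )
    t_grep=$( { time grep "^$n " "$f" | cut -d " " -f 2 > /dev/null; } 2>&1 )

    printf '%10d lines  awk %ss  grep %ss\n' "$n" "$t_awk" "$t_grep"
    rm -f "$f"
}

for n in 1000 10000 100000 1000000; do
    bench "$n"
done
```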