
I'm trying to write a nested for loop that runs an awk command over a .txt file for every combination of two sets of integers. The loops should do the following:

  1. For lines where the length (column 4) is at least a given value i and the percent identity (column 3) is at least a given value j, print column 2
  2. Count the number of unique lines
  3. Append the count to a file

I've created this:

lens=(1000 2000 3000 5000)
percent=(80 90)
for i in "${lens[@]}"
do
    for j in "${percent[@]}"
    do
        echo "Length is $i and percent is $j"
        echo "Where length =>$i and % ID is >=$j, number of matches is: " >> output.txt
        awk '{if ($4>=$i && $3>=$j) print $2}' input.txt | uniq | wc -l >> output.txt
        awk '{if ($4>=1000 && $3>=80) print $2}' input.txt | uniq | wc -l >> output.txt
    done
done

For some reason, using the variables i and j causes the output to always be 0, even when it shouldn't be. For example, the second awk command (with the values hard-coded) returns the correct value, even though the two lines should be equivalent during the first iteration of the loop. Here is the beginning of the output file:

Where length =>1000 and % ID is >=80, number of matches is: 
0
775

The sanity check `echo "Length is $i and percent is $j"` prints the expected output, `Length is 1000 and percent is 80`, and so does the second echo (`"Where length =>$i..."`), so I'm really stumped. Why would the array variables cause problems in awk?
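Barmar's comment below names the cause. The following is a minimal illustration, assuming bash and any POSIX awk: inside single quotes the shell passes `$i` through literally, so awk sees its own variable `i`, which was never set and therefore evaluates to 0. In awk, `$0` is the whole line, so `$4>=$i` compares column 4 against the entire line instead of against 1000.

```shell
#!/usr/bin/env bash
i=1000

# Single quotes: the shell leaves $i alone, so the text is passed on unchanged.
echo 'awk program: $4>=$i'          # prints: awk program: $4>=$i

# Double quotes: the shell substitutes the value first.
echo "awk program: \$4>=$i"         # prints: awk program: $4>=1000

# Inside awk, the variable i was never set, so it is 0 and $i means $0 (the whole line).
echo 'a b c' | awk '{ print $i }'   # prints: a b c
```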

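EDIT: As usual, the answer was painfully simple and came down to a few single quotes. The working code is below; note that the shell variables $i and $j are now surrounded by '', which ends the single-quoted awk program just long enough for the shell to substitute their values: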

for i in "${lens[@]}"
do
    for j in "${percent[@]}"
    do
        echo "Length is $i and percent is $j"
        echo "Where length =>$i and % ID is >=$j, number of matches is: " >> output.txt
        awk '{if ($4>='$i' && $3>='$j') print $2}' input.txt | uniq | wc -l >> output.txt
    done
done
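Splicing the quotes works, but it pastes raw shell text into the awk program. A commonly recommended alternative is awk's `-v` option, which assigns a shell value to an awk variable before the program runs. The sketch below also swaps `uniq` for `sort -u`: `uniq` only collapses *adjacent* duplicate lines, so unless input.txt is already grouped by column 2 it can over-count. It assumes the column layout implied by the question ($2 = ID, $3 = % identity, $4 = length); the function name `count_matches` and the awk variables `len` and `pct` are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# count_matches LEN PCT FILE  (hypothetical helper)
# Prints how many distinct column-2 values have $4 >= LEN and $3 >= PCT.
count_matches() {
    # -v copies the shell values into the awk variables len and pct,
    # so no shell text is spliced into the awk program itself.
    awk -v len="$1" -v pct="$2" '$4 >= len && $3 >= pct { print $2 }' "$3" \
        | sort -u | wc -l   # sort -u drops every duplicate, not only adjacent ones
}

lens=(1000 2000 3000 5000)
percent=(80 90)

# Only run the loop when the question's input file is present.
if [[ -f input.txt ]]; then
    for i in "${lens[@]}"; do
        for j in "${percent[@]}"; do
            echo "Where length >=$i and % ID is >=$j, number of matches is: " >> output.txt
            count_matches "$i" "$j" input.txt >> output.txt
        done
    done
fi
```

With unsorted input, the `uniq` and `sort -u` versions can disagree: for rows whose IDs come out as A, B, A, `uniq | wc -l` reports 3 while `sort -u | wc -l` reports 2.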
clinaaeus
  • Variables aren't expanded inside single quotes, so `awk` isn't getting the `$i` and `$j` values. – Barmar Jul 29 '21 at 04:12
  • Mind, it's a _very_ bad idea to call awk in a tight inner loop like this. Awk is much faster than bash once it's running, but starting up a new copy over and over takes a lot of time -- it's generally better to loop over your values _in awk itself_, since awk (and especially gawk) has enough file I/O to do everything needed (reading multiple input files and writing to multiple output files from just one invocation), not to mention native array support. – Charles Duffy Jul 29 '21 at 05:08
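Charles Duffy's suggestion can be sketched as a single awk run that checks every threshold pair on each line, instead of starting one awk process per pair. This is a minimal sketch, assuming bash and any POSIX awk and the same column layout as the question; the helper name `count_all`, the space-separated threshold strings passed via `-v`, and the `seen` array (which plays the role of `sort -u`) are illustrative choices, not something from the original post.

```shell
#!/usr/bin/env bash
# count_all LENS PCTS FILE  (hypothetical helper)
# One awk pass that, for every (length, percent) pair, counts the distinct
# column-2 values with $4 >= length and $3 >= percent.
# Column layout assumed from the question: $2 = ID, $3 = % identity, $4 = length.
count_all() {
    awk -v lens="$1" -v pcts="$2" '
        BEGIN {
            nl = split(lens, L, " ")   # thresholds arrive as space-separated strings
            np = split(pcts, P, " ")
        }
        {
            for (a = 1; a <= nl; a++)
                for (b = 1; b <= np; b++)
                    # +0 forces a numeric comparison; seen[] counts each ID once per pair
                    if ($4 >= L[a] + 0 && $3 >= P[b] + 0 && !seen[a, b, $2]++)
                        count[a, b]++
        }
        END {
            for (a = 1; a <= nl; a++)
                for (b = 1; b <= np; b++)
                    printf "Where length >=%s and %% ID is >=%s, number of matches is: %d\n",
                           L[a], P[b], count[a, b]
        }' "$3"
}
```

Usage, with the question's thresholds: `count_all "1000 2000 3000 5000" "80 90" input.txt > output.txt`. The file is read once regardless of how many threshold pairs there are, at the cost of keeping one `seen` entry per matching (pair, ID) combination in memory.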
