I am attempting to develop a nested for loop which will run an awk command with two sets of integers over a .txt file. The loop(s) should do the following:
- For lines whose length (column 4) is at least i and whose percentage (column 3) is at least j, print column 2
- Count the number of unique lines
- Output count to file
I've created this:
lens=(1000 2000 3000 5000)
percent=(80 90)
for i in ${lens[@]}
do
    for j in ${percent[@]}
    do
        echo "Length is $i and percent is $j"
        echo "Where length =>$i and % ID is >=$j, number of matches is: " >> output.txt
        awk '{if ($4>=$i && $3>=$j) print $2}' input.txt | uniq | wc -l >> output.txt
        awk '{if ($4>=1000 && $3>=80) print $2}' input.txt | uniq | wc -l >> output.txt
    done
done
For some reason, using the variables i and j causes my output to always be 0, even when it shouldn't be. For example, the second awk command returns the correct value, even though the two commands should ostensibly be equivalent during the first iteration of the loop. See the beginning of the output file:
Where length =>1000 and % ID is >=80, number of matches is:
0
775
The sense-check echo "Length is $i and percent is $j" prints normal output: Length is 1000 and percent is 80. The same goes for the second echo "Where length =>$i...", so I'm really stumped. Why would the presence of the array variables cause problems in awk?
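One quick way to check what awk is actually handed is to print the program text instead of running it. This is only a minimal sketch: the values 1000 and 80 are hard-coded for illustration and the sample line is invented.

```shell
i=1000
j=80

# Inside single quotes the shell does not expand $i or $j,
# so this is the literal program text that awk is handed.
prog='{if ($4>=$i && $3>=$j) print $2}'
printf '%s\n' "$prog"

# Within awk, i is an unset awk variable, so $i refers to field 0, i.e. the whole line.
# The sample line below is made up for illustration.
same=$(printf 'q1 hitA 95 2000\n' | awk '{print ($i == $0)}')
printf 'awk $i equals $0: %s\n' "$same"
```

If that reading is right, the test compares column 4 against the whole line rather than against 1000, which would account for the count of 0.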
EDIT: Well, as usual, the answer was painfully simple and came down to a few ''. The proper code is below; note the shell variables $i and $j have been surrounded with '':
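for i in ${lens[@]}
do
    for j in ${percent[@]}
    do
        echo "Length is $i and percent is $j"
        echo "Where length =>$i and % ID is >=$j, number of matches is: " >> output.txt
        # The single quotes are closed before each shell variable and reopened after it,
        # so the shell substitutes the value before awk sees the program.
        awk '{if ($4>='$i' && $3>='$j') print $2}' input.txt | uniq | wc -l >> output.txt
    done
done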
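An alternative that avoids splicing shell variables into the program text is awk's -v option, which copies a shell value into an awk variable. The following is a sketch under assumptions, not a drop-in replacement: the function name count_hits and the awk variable names len and pct are my own, the sample input is invented, and the thresholds are written inline rather than taken from the arrays. It also swaps uniq for sort -u, on the assumption that input.txt may not be sorted by column 2, since uniq only collapses adjacent duplicates.

```shell
# Made-up sample data in a temporary file: query, subject, % ID, length.
input=$(mktemp)
printf '%s\n' 'q1 hitA 95 2500' 'q2 hitB 85 1200' 'q3 hitA 92 3100' 'q4 hitC 70 5000' > "$input"

# count_hits LEN PCT FILE (hypothetical helper):
# count the distinct column-2 values on lines that pass both thresholds.
count_hits() {
    # -v passes the shell values in as awk variables, so the program stays fully single-quoted.
    # tr strips the space padding that some wc implementations add.
    awk -v len="$1" -v pct="$2" '$4 >= len && $3 >= pct {print $2}' "$3" | sort -u | wc -l | tr -d ' '
}

# Same thresholds as the lens and percent arrays above, written inline.
for i in 1000 2000 3000 5000; do
    for j in 80 90; do
        echo "Where length >=$i and % ID is >=$j, number of matches is: $(count_hits "$i" "$j" "$input")"
    done
done

rm -f "$input"
```

With -v the awk program never changes between iterations, and the values arrive as data rather than as program text, so nothing in them can be misread as awk code.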