I am using a column in one file to look up values in another file. The second file is very large and I would like to find all the values in a single pass with awk. I have tried doing this with an associative array, but am stumped how to get the out put i want. I want to take F1, use $2 to look up values in F2, and get the output I show below, which is $0 from F1 as the header, followed by $10 from F2 sorted and counted for each unique string ( ie pipped through sort | uniq -c).
F1
+ID=dnaK.p01 12121 TTGGGCAGTTGAAACCAGACGTTTCGCCCCTATTACAGAC[T]CACAACCACATGATGACCG
F2
solid309_20110930_FRAG_BC_bcSample12273_1541_657_F3 0 NC_012759 12121 42 35M * 0 0 ACACAACCACATGATGACCGAATATATAGTGGCTC BBBBBBA@BBBAB@?B@BBBB<5BBBAA@:>>&B7
solid309_20110930_FRAG_BC_bcSample12295_323_1714_F3 0 NC_012759 12121 42 35M * 0 0 ACACAACCACATGATGACCGAATATATAGTGGAGA BB@@A@@A@@@?@<=?@@=><6*7=?9993>4&7,
solid309_20110930_FRAG_BC_bcSample12325_1148_609_F3 0 NC_012759 12121 42 35M * 0 0 ACACAACCACATGATGACCGAATATATAGTGGAGA BBBB@B@?@B@@A@??BBBA@<.<==:6:1>9(<-
solid309_20110930_FRAG_BC_bcSample11796_1531_1170_F3 0 NC_012759 12122 42 35M * 0 0 CACAACCACATGATGACCGAATATATAGTGGAGCA '&&+&&)&')&0(.,',(.3+&&&+,&&&&&&&&&
solid309_20110930_FRAG_BC_bcSample12110_1166_1149_F3 0 NC_012759 12122 42 35M * 0 0 CACAACCACATGATGACCGAATATATAGTGGAGAC -(:18)538;,9277*'8:<)&,0-+)//3&'1+'
solid309_20110930_FRAG_BC_bcSample183_686_962_F3 0 NC_012759 12123 42 35M * 0 0 ACAACCACATGATGACCGAATATATAGTGGAGTGC BB?BBBB;BBBBBB@ABB;@7AA@@A@*>?+B8@9
I am doing this with the following script
for line in `awk '{if ($1~"-") print ($2-34);else print $2}' $1`
do
awk -v l=$line '{if ($1~"-") l=l+34;if ($2==l) print }' $1 >> f2
awk -v l=$line '{if ($4==l) print $10}' URA2.sam | sort | uniq -c |awk '{if ($1>15) print}'>> f2
done
Which requires multiple passes with awk for each line in. I was thinking I could use an associative array made from F1 to do this with one pass. F2 is sorted by $4. I used the following script to try to get the output I wanted.
awk 'FNR==NR{a[$2]=$0;next}$4 in a{print $10}' f1 f2 | sort | uniq -c