I have a fasta file with a sequences (file with text) like:
file.fasta
>seq_1
AGCTAATACTTGTCCACGTTGTACTTCTTCACGAGAAACACCACGTAATAAAGCACCGAT
GTTATCTCCAGCTTCAGCGTAATCTAATAATTTACGGAACATTTCTACACCTGTAACTGT
AGTTTTAGCTGGCTCTTCAGTTAAACCGATGATTTCAACTTCTTCACCAACTTTAACTTG
TCCACGCTCAACACGTCCAGTTGCAACTGTACCACGACCAGTGATTGAGAATACGTCCTC
AACTGGCATCATGAATGGTTTGTCAGAATCACGTTCTGGAGTTGGGATGTACTCATCAAC
TGCGTTCATTAATTCCATGATTTTTTCTTCGTACTCTTCAACGCCTTCTAATGCTTTTAA
AGCAGATCCAGCGATTACAGGTACATCGTCACCAGGGAAGTCATATTCAGATAATAAGTC
ACGAACTTCC
>seq_2
AGCTAATACTTGTCCACGTTGTACTTCTTCACGAGAAACACCACGTAATAAAGCACCGAT
GTTATCTCCAGCTTCAGCGTAATCTAATAATTTACGGAACATTTCTACACCTGTAACTGT
AGTTTTAGATGGCTCTTCAGTTAAACCGATGATTTCAACTTCTTCACCAACTTTAACTTG
TCCACGCTCAACACGTCCAGTTGCAACTGTACCACGACCAGTGATTGAGAATACGTCCTC
AACTGGCATCATGAATGGTTTGTCAGAATCACGTTCTGGAGTTGGGATGTACTCATCAAC
TGCGTTCATTAATTCCATGATTTTATCTTCGTACTCTTCAACGCCTTCTAATGCTTTTAA
AGCAGATCCAGCGATTACAGGTACATCGTCACCAGGGAAGTCATATTCAGATAATAAGTC
ACGAACTTCC
>seq_3
AGCTAATACTTGTCCACGTTGTACTTCTTCACGAGAAACACCACGTAATAAAGCACCGAT
GTTATCTCCAGCTTCAGCGTAATCTAATAATTTACGGAACATTTCTACACCTGTAACTGT
AGTTTTAGATGGCTCTTCAGTTAAACCGATGATTTCAACTTCTTCACCAACTTTAACTTG
TCCACGCTCAACACGTCCAGTTGCAACTGTACCACGACCAGTGATTGAGAATACGTCCTC
AACTGGCATCATGAATGGTTTGTCAGAATCACGTTCTGGAGTTGGGATGTACTCATCAAC
TGCATTCATTAATTCCATGATTTTATCTTCGTACTCTTCAACGCCTTCTAATGCTTTTAA
AGCAGATCCAGCGATTACAGGTACATCGTCACCAGGGAAGTCATATTCAGATAATAAGTC
ACGAACTTCC
............
>seq_n
AGCAGATCCAGCGATTACAGGTACATCGTCACCAGGGAAGTCATATTCAGATAATAAGTC
..............
So I want to calculate the average length of the strings avoiding the lines with >seq_
, my code to obtain the length of each line is:
array_length=$(awk '/^>/ {print n $0; n="\n"}; !/^>/ {printf "%s", $0} END {print ""}' My_file.fasta | awk '!/^>/ {print length(), $0}' | sort -n| awk '{print $1}')
until here everything is ok, I got the fist column that correspond to the length of each string:
echo "$array_length"
203
207
222
231
232
243
255
258
261
268
279
291
307
316
.....
161581
208146
242398
259601
288468
301866
427209
531340
557978
840257
well the length in the array could be variable, in this case I just show part of them.
my problem is that I want to calculate the average of the $array_length (sum of all numbers/length of the array)
A second question is how to take the fist element of the array and the last one; in order to do that, I just add a tail -1
and head -n 1
to the end of the code
awk '/^>/ {print n $0; n="\n"}; !/^>/ {printf "%s", $0} END {print ""}' My_file.fasta | awk '!/^>/ {print length(), $0}' | sort -n| awk '{print $1}' | tail -1
awk '/^>/ {print n $0; n="\n"}; !/^>/ {printf "%s", $0} END {print ""}' My_file.fasta | awk '!/^>/ {print length(), $0}' | sort -n| awk '{print $1}' | head -n 1
I know that, with a file I do it like
cat file.txt | tail -1
cat file.txt | head -n 1
But I dont want to use the same code twice to obtain the $small_one
(203) and $big_one
(840257), I just want to take the fist and last element of the variable $array_length
like the one that I show here, how can I do it?