I am new to Python and am trying to figure out how to read a FASTA file containing multiple sequences and then create a new FASTA file containing the reverse complement of each sequence. The file looks something like this:
>homo_sapiens
ACGTCAGTACGTACGTCATGACGTACGTACTGACTGACTGACTGACGTACTGACTGACTGACGTACGTACGTACGTACGTACGTACTG
>Canis_lupus
CAGTCATGCATGCATGCAGTCATGACGTCAGTCAGTACTGCATGCATGCATGCATGCATGACTGCAGTACTGACGTACTGACGTCATGCATGCAGTCATG
>Pan_troglodytus
CATGCATACTGCATGCATGCATCATGCATGCATGCATGCATGCATGCATCATGACTGCAGTCATGCAGTCAGTCATGCATGCATCAT
I am trying to learn how to use for and while loops, so a solution that uses one of them would be preferred.
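Since the goal is to practice loops, here is a minimal sketch of one way the complementing step could use an explicit for loop with a dictionary lookup, and the reversal a while loop. It assumes uppercase A/C/G/T input only, and the names `COMPLEMENT`, `complement`, `reverse` and `reverse_complement` are made up for this illustration.

```python
# Maps each base to its Watson-Crick partner.
# Assumption: input contains only uppercase A, C, G and T.
COMPLEMENT = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}


def complement(seq):
    """Complement a sequence base by base using a for loop."""
    result = ''
    for base in seq:
        result += COMPLEMENT[base]  # raises KeyError for any other character
    return result


def reverse(seq):
    """Reverse a string using a while loop (same result as seq[::-1])."""
    result = ''
    i = len(seq) - 1
    while i >= 0:
        result += seq[i]
        i -= 1
    return result


def reverse_complement(seq):
    return reverse(complement(seq))


print(reverse_complement('ACGTCAGT'))  # prints ACTGACGT
```

Building a string with `+=` inside a loop is fine for sequences this short; for long sequences the usual idiom is to collect pieces in a list and call `''.join()` once at the end.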
So far I have managed to do it, rather inelegantly, as follows:
import re

file1 = open('/path/to/file', 'r')
for line in file1:
    if line[0] == '>':
        print(line.strip())  # to capture the title line
    else:
        # Swap T<->A and G<->C, using P and R as temporary placeholders
        seq = line.strip()
        line = re.sub(r'T', r'P', seq)
        seq = line
        line = re.sub(r'A', r'T', seq)
        seq = line
        line = re.sub(r'G', r'R', seq)
        seq = line
        line = re.sub(r'C', r'G', seq)
        seq = line
        line = re.sub(r'P', r'A', seq)
        seq = line
        line = re.sub(r'R', r'C', seq)
        print(line[::-1])  # reverse the complemented sequence
file1.close()
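The chain of `re.sub` calls with placeholder letters can likely be collapsed into a single translation table, which swaps every base in one pass so no placeholders are needed. This is a sketch assuming Python 3, where `str.maketrans` exists (Python 2 provides `string.maketrans` instead); also mapping lowercase bases is an added assumption, not something the question asks for.

```python
# Character-to-character table: A<->T and C<->G, upper- and lowercase.
# Characters not in the table (e.g. N) are left unchanged by translate().
COMPLEMENT_TABLE = str.maketrans('ACGTacgt', 'TGCAtgca')


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT_TABLE)[::-1]


print(reverse_complement('ACGTCAGT'))  # prints ACTGACGT
```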
This works, but I know there must be a better way to handle that last part than chaining `re.sub` calls. Any better solutions?
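The code above prints to the screen rather than creating a new FASTA file. A minimal sketch of the full task, assuming Python 3, might read records with a for loop over the input, join sequences that span several lines (reversing line by line would scramble such records), and write each reverse complement to an output handle. The function names and the `width=60` line-wrapping parameter are hypothetical choices for this example.

```python
import io

# A<->T and C<->G, upper- and lowercase; other characters pass through.
COMPLEMENT_TABLE = str.maketrans('ACGTacgt', 'TGCAtgca')


def fasta_records(lines):
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith('>'):
            if header is not None:
                yield header, ''.join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, ''.join(chunks)


def write_reverse_complements(in_handle, out_handle, width=60):
    """Write each record's reverse complement, wrapped at `width` characters."""
    for header, seq in fasta_records(in_handle):
        rc = seq.translate(COMPLEMENT_TABLE)[::-1]
        out_handle.write(header + '\n')
        for start in range(0, len(rc), width):
            out_handle.write(rc[start:start + width] + '\n')


# In-memory demo; the second record's sequence is split over two lines.
demo_in = io.StringIO('>homo_sapiens\nACGTCAGT\n>Canis_lupus\nCAGT\nCATG\n')
demo_out = io.StringIO()
write_reverse_complements(demo_in, demo_out)
print(demo_out.getvalue(), end='')
```

With real files, the same function could be called as `with open('/path/to/file') as fin, open('reverse_complements.fasta', 'w') as fout: write_reverse_complements(fin, fout)`, where the output filename is only a placeholder.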