0

I have been trying to reverse complement a fasta DNA sequence. Here is my code:

fastafile=open('sequence (3).fasta','r')
entries=[]
reverse=""
sequence=['A','T','G','C','N']
for line in fastafile:
    if not line.startswith('>'):
        line = line.split()
        entries.append(line)
print entries
for index in range(0,len(entries[::-1])):
    if index !=sequence:
        print "this is not a valid nucleotide"
        break
    else:
        if index=='A':
            reverse+='T'
        elif index=='T':
            reverse+='A'
        elif index=='C':
            reverse+='G'
        elif index=='G':
            reverse+ 'C'
        elif index=='N':
            reverse+='N'
print reverse

And each time I get the output, this is not a valid nucleotide, even when my print of entries show that it has the items in sequence. Here is a sample of the output when I print enteries;

[['GCTCCCCTGAGGTTCGGCACCCACACTCCCTTCCCAGGAGCTCGCGATGCAAGAGCCACAGTCAGAGCTC'], ['AATATCGACCCCCCTCTGAGCCAGGAGACATTTTCAGAATTGTGGAACCTGCTTCCTGAAAACAATGTTC'], ['TGTCTTCGGAGCTGTGCCCAGCAGTGGATGAGCTGCTGCTCCCAGAGAGCGTCGTGAACTGGCTAGACGA']

How can I override this problem? I just want to add that I only started seriously programming with python about 2 months ago so I am still learning and improving. Thank you!

Carles Mitjans
  • 4,786
  • 3
  • 19
  • 38
B_bunny
  • 35
  • 1
  • 6

1 Answers1

1

Your loop statement is:

for index in range(0,len(entries[::-1])):

This will iterate over the length of entries, that is, 0, 1, 2, 3, ..., len(entries).

When you do if index != sequence you are actually comparing an integer with a list, say if 3 != ['A', 'C', 'T', 'G']. I assume you can see that makes no sense. What you probably want to do is see if the nucleotide in the sequence is a valid nucleotide, ergo it is in the sequence list. You can do it like this:

if entries[::-1][index] in sequence # Will be true if the nucleotide at entries[::-1][index] is inside sequence

Let me say two things:

  • First one, you don't have to range to len(entries[::-1]), it is the same as len(entries)

  • Secondly, and more important, there is an actual module build specially for bioinformatics. It is called Biopython. It has special objects and functions. For example, your problem could be resolved as follows:

-

from Bio.Seq import Seq

dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
print dna.reverse_complement()

Output: CTATCGGGCACCCTTTCAGCGGCCCATTACAATGGCCAT

Carles Mitjans
  • 4,786
  • 3
  • 19
  • 38
  • Thank you for your assistance. I am not as familiar with majority of the biopython modules but seq requires I input a string of nucleotide sequence where as in my case I want to use a fasta sequence from a file as an input. – B_bunny Feb 20 '17 at 04:33