It seems like you have taken some code and tried to use it without understanding what it does. If you read the linked question, you'll notice that the poster there had a dictionary of amino acid code strings separated by |. The call to split was there to extract the second part of each code string, e.g. from "F|Phe" you want to get "Phe", and that's why that poster needed the split. You don't have those sorts of strings, so you shouldn't be using that part of the code.
I will second joaquin's recommendation to use BioPython, as it's clearly the right tool for the job, but for learning purposes: the first thing you need to know is that you have four tasks to accomplish:
- Compute the reverse complement of the DNA base sequence
- Break the reverse complementary sequence into groups of 3 bases
- Convert each group into an amino acid code
- Put the amino acid codes together into a string
The code in the linked answer doesn't handle the first step. For that you can use the translate method of Python string objects. First you use str.maketrans to produce a translation table that maps each base to its complement,
basecomplement = str.maketrans({'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'})
and then you can write a function to produce the reverse complement,
def reverse_complement(seq):
    return seq.translate(basecomplement)[::-1]
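As a quick sanity check, here is a minimal runnable sketch of that step on a short made-up sequence. The input "ATGC" is only an illustration; it assumes the sequence contains nothing but uppercase A, C, G and T.

```python
# Translation table mapping each base to its complement.
basecomplement = str.maketrans({'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'})

def reverse_complement(seq):
    # Complement every base first, then reverse the whole string.
    return seq.translate(basecomplement)[::-1]

# "ATGC" complements to "TACG", which reversed is "GCAT".
print(reverse_complement("ATGC"))  # GCAT
```

Characters that are not in the table (lowercase letters, N, and so on) pass through translate unchanged, so you may want to uppercase or validate your input first.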
The translate function in joaquin's answer on the other question (not the string method used above) implements steps 2 and 3. It can actually be done more compactly using the grouper recipe from the itertools documentation. First you will need a dictionary mapping base triplets to amino acids,
amino_acids = {'TAT': 'Tyr', ...}
and you can then use this to convert any sequence of bases,
amino_acids[''.join(a)] for a in zip(*([iter(rseq)]*3))
By way of explanation, zip(*([iter(rseq)]*3)) groups the characters three at a time. But it does so as tuples, not strings, e.g. for 'TATATA' you'd get ('T', 'A', 'T'), ('A', 'T', 'A'), so you need to join each tuple to make a string. That's what ''.join(a) does. Then you look up the string in the amino acid table, which is done by amino_acids[...]. Note that zip stops at the last complete group, so one or two leftover bases at the end are silently dropped.
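If the zip trick looks opaque, this small sketch shows the same grouping spelled out. The helper name codons is hypothetical, picked just for this illustration; the point is that all three arguments to zip are the same iterator, so each tuple consumes three consecutive characters.

```python
def codons(seq):
    """Yield successive non-overlapping 3-base strings from seq."""
    it = iter(seq)
    # zip pulls from the one shared iterator three times per tuple,
    # which is what zip(*([iter(seq)]*3)) does in a single expression.
    for triple in zip(it, it, it):
        yield ''.join(triple)

# The trailing 'G' has no partners, so it is dropped.
print(list(codons('TATATAG')))  # ['TAT', 'ATA']
```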
Finally you need to join all the resulting amino acid codes together, which can be done by an outer ''.join(...). So you could define a function like this:
def to_amino_acids(seq):
    # use the seq argument here, not a global like rseq
    return ''.join(amino_acids[''.join(a)] for a in zip(*([iter(seq)]*3)))
Note that you don't need .split('|') unless your amino_acids dictionary contains multiple representations separated by |.
Finally, to do this for the three different possible ways of converting the bases to amino acids, i.e. the three frames, you would use something akin to the final loop in joaquin's answer,
rseq = reverse_complement(seq)
for frame in range(3):
    # print the frame number
    print('+', frame+1, end=' ')
    # translate the base sequence to amino acids and print it
    print(to_amino_acids(rseq[frame:]))
Note that this loop runs three times, to print the three different frames. There's no point in having a loop if you were just going to have it run once.
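To tie the pieces together, here is a rough end-to-end sketch you can run as-is. Several things in it are my own assumptions rather than anything from joaquin's answer: the codon table is deliberately partial (a real one needs all 64 triplets), the '???' placeholder for codons missing from the table is a choice I made so the example doesn't raise KeyError, and the helper reverse_frames is a hypothetical name.

```python
# Partial codon table, for illustration only.
amino_acids = {
    'TAT': 'Tyr', 'ATA': 'Ile', 'ATG': 'Met',
    'AAA': 'Lys', 'TTT': 'Phe', 'CAT': 'His',
}

basecomplement = str.maketrans({'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'})

def reverse_complement(seq):
    # Step 1: complement each base, then reverse.
    return seq.translate(basecomplement)[::-1]

def to_amino_acids(seq, unknown='???'):
    # Steps 2-4: group into triplets, look each one up, join the codes.
    # Codons missing from the (partial) table become `unknown`.
    it = iter(seq)
    return ''.join(amino_acids.get(''.join(c), unknown)
                   for c in zip(it, it, it))

def reverse_frames(seq):
    # One translation per reading frame of the reverse complement.
    rseq = reverse_complement(seq)
    return [to_amino_acids(rseq[frame:]) for frame in range(3)]

for number, protein in enumerate(reverse_frames('ATGAAATAT'), start=1):
    print(number, protein)
```

With a complete table you could drop the fallback and go back to plain amino_acids[...] lookups, so that an unexpected codon fails loudly instead of being hidden.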