Python - Rosalind Open Reading Frame Problem

Question

I'm working on the Open Reading Frame exercise on Rosalind, and my results differ from the expected output given in the example. The exercise description can be found here.

I have this code:

gencode = {"GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
           "TGT": "C", "TGC": "C",
           "GAT": "D", "GAC": "D",
           "GAA": "E", "GAG": "E",
           "TTT": "F", "TTC": "F",
           "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
           "CAT": "H", "CAC": "H",
           "ATA": "I", "ATT": "I", "ATC": "I",
           "AAA": "K", "AAG": "K",
           "TTA": "L", "TTG": "L", "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",
           "ATG": "M",
           "AAT": "N", "AAC": "N",
           "CCT": "P", "CCC": "P", "CCA": "P", "CCG": "P",
           "CAA": "Q", "CAG": "Q",
           "CGT": "R", "CGC": "R", "CGA": "R", "CGG": "R", "AGA": "R", "AGG": "R",
           "TCT": "S", "TCC": "S", "TCA": "S", "TCG": "S", "AGT": "S", "AGC": "S",
           "ACT": "T", "ACC": "T", "ACA": "T", "ACG": "T",
           "GTT": "V", "GTC": "V", "GTA": "V", "GTG": "V",
           "TGG": "W",
           "TAT": "Y", "TAC": "Y", 
           "TAA": "_", "TAG": "_", "TGA": "_"}

seq = 'AGCCATGTAGCTAACTCAGGTTACATGGGGATGACCCCGCGACTTGGATTAGAGTCTCTTTTGGAATAAGCCTGAATGATCCGAGTAGCATCTCAG'
rev_seq = seq[::-1]


def get_orf_proteins(seq):
    proteins=[]
    for i in range(len(seq)-2):
        if gencode[seq[i:i+3]] == 'M':
            print(i)
            prot = ''
            k = i
            while gencode[seq[k:k+3]] != '_' and k < len(seq)-3:
              prot += gencode[seq[k:k+3]]
              k += 3
            proteins.append(prot)
    return(list(set(proteins)))


print(get_orf_proteins(seq))
print(get_orf_proteins(rev_seq))

Which returns the following protein sequences:

['MGMTPRLGLESLLE', 'MTPRLGLESLLE', 'M', 'MIRVAS']
['MY', 'MSLVSPNKVFSEIRFSAPVGVHWTQSMY']

Am I missing something, or is the example solution incorrect?

score 0 · Answer 1 · answered Mar 02 '21 at 08:35

The reverse complement of a DNA string is not simply the DNA string reversed: each base must also be replaced by its complement (A with T, C with G).
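
A minimal sketch of the difference, assuming the sequence contains only the uppercase bases A, C, G and T (the function name `reverse_complement` is illustrative, not from the question):

```python
def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    # Swap each base for its pairing partner (A<->T, C<->G) ...
    complement = dna.translate(str.maketrans("ACGT", "TGCA"))
    # ... then reverse, so the result reads 5'->3' on the opposite strand.
    return complement[::-1]


print(reverse_complement("ATGC"))  # GCAT, whereas "ATGC"[::-1] is "CGTA"
```

In the question's code, `rev_seq = seq[::-1]` would then become `rev_seq = reverse_complement(seq)`.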

Pallie

Oh you are right, thanks. But I still get the 'MIRVAS' sequence from the first sequence, which is not listed in the example solution, even if I corrected the reverse complement. – Benedek Dankó Mar 02 '21 at 15:15
Putting the input sequence into https://web.expasy.org/translate/ seems to suggest that the Rosalind example output does not match the example input. – Pallie Mar 03 '21 at 09:54
Then maybe they put a wrong example output. Anyway, thanks for the help. – Benedek Dankó Mar 04 '21 at 10:03
@Pallie Rosalind is asking that a stop codon terminate the string. – Nosey Apr 14 '22 at 11:43
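
To illustrate the last comment: a candidate only counts as an ORF if its reading frame actually reaches a stop codon. Below is a rough sketch of that rule for a single strand. The names `CODON_TABLE` and `orf_proteins` are my own, and the table is simply the standard genetic code with `_` for stop, built compactly so the example stays self-contained.

```python
from itertools import product

# Standard genetic code in TCAG order, with "_" marking stop codons
# (the same mapping as the gencode dict in the question).
AMINO_ACIDS = "FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def orf_proteins(strand):
    """Translate every ATG-initiated frame that ends in a stop codon."""
    proteins = set()
    for start in range(len(strand) - 2):
        if strand[start:start + 3] != "ATG":
            continue
        protein = []
        # Walk codon by codon; only complete codons are ever looked up.
        for k in range(start, len(strand) - 2, 3):
            amino = CODON_TABLE[strand[k:k + 3]]
            if amino == "_":
                # Stop codon reached: this frame is a valid ORF.
                proteins.add("".join(protein))
                break
            protein.append(amino)
        # If the loop runs off the end without a stop, nothing is added.
    return proteins


seq = ("AGCCATGTAGCTAACTCAGGTTACATGGGGATGACCCCGCGACTTGGATTAGAGTCTCTT"
       "TTGGAATAAGCCTGAATGATCCGAGTAGCATCTCAG")
print(sorted(orf_proteins(seq)))
# ['M', 'MGMTPRLGLESLLE', 'MTPRLGLESLLE']
```

With this rule `MIRVAS` disappears, because its frame runs to the end of the sequence without hitting a stop codon. The same function would still need to be applied to the reverse complement to cover the other strand.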
