Write a function named one_frame that takes one argument seq and performs the tasks specified below. The argument seq is to be a string that contains information for the bases of a DNA sequence.

  • a → The function searches the given DNA string from left to right in multiples of three nucleotides (in a single reading frame).
  • b → When it hits a start codon ATG it calls get_orf on the slice of the string beginning at that start codon.
  • c → The ORF returned by get_orf is added to a list of ORFs.
  • d → The function skips ahead in the DNA string to the point right after the ORF that we just found and starts looking for the next ORF.
  • e → Steps a through d are repeated until we have traversed the entire DNA string.

The function should return a list of all ORFs it has found.

def one_frame(seq):
    start_codon = 'ATG'
    list_of_codons = []
    y = 0
    while y < len(seq):
        subORF = seq[y:y + 3]
        if start_codon in subORF:
            list_of_codons.append(get_orf(seq))
            return list_of_codons
        else:
            y += 3

one_frame('ATGAGATGAACCATGGGGTAA')
  1. The one_frame at the very bottom is a test case. It is supposed to be equal to ['ATGAGA', 'ATGGGG'], however my code only returns the first item in the list.
  2. How could I fix my function to also return the other part of that list?
Mr. Polywhirl
  • Do you really want `get_orf(seq)`? Step b indicates that it should be `get_orf(subORF)` – Barmar Oct 05 '22 at 17:46
  • @Barmar -- because `get_orf` will consume a variable-length chunk of the string, and he needs to skip past that before searching again. So, he should be doing `y += len(get_orf(seq[y:]))`. – Tim Roberts Oct 05 '22 at 17:51

4 Answers

You have a number of problems in this code, as identified in the comments. I think this does what you are actually supposed to do:

def one_frame(seq):
    start_codon = 'ATG'
    list_of_codons = []
    y = 0
    while y < len(seq):
        if seq[y:y+3] == start_codon:
            orf = get_orf(seq[y:])
            list_of_codons.append(orf)
            y += len(orf)
        else:
            y += 3
    return list_of_codons

one_frame('ATGAGATGAACCATGGGGTAA')
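
Since get_orf is not shown in the question, the loop above cannot be run on its own. Here is a minimal self-contained sketch. The get_orf below is a hypothetical stand-in: it assumes the real function returns everything from the start codon up to, but not including, the first in-frame stop codon (TAA, TAG, or TGA), which is consistent with the expected output given in the question.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def get_orf(seq):
    """Hypothetical stand-in for the asker's get_orf (an assumption).

    Returns seq up to, but not including, the first in-frame stop codon.
    """
    for i in range(0, len(seq), 3):
        if seq[i:i + 3] in STOP_CODONS:
            return seq[:i]
    return seq  # assumption: with no stop codon, the ORF runs to the end

def one_frame(seq):
    orfs = []
    y = 0
    while y < len(seq):
        if seq[y:y + 3] == "ATG":
            orf = get_orf(seq[y:])  # slice starting at the start codon
            orfs.append(orf)
            y += len(orf)           # skip past the ORF just found
        else:
            y += 3                  # move to the next codon in this frame
    return orfs

print(one_frame("ATGAGATGAACCATGGGGTAA"))  # ['ATGAGA', 'ATGGGG']
```

If your actual get_orf includes the stop codon in its return value, the skip still works, because y advances by whatever length was returned.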
Tim Roberts

You have several problems:

  1. You have return list_of_codons inside the loop. So you return as soon as you find the first match and only return that one. Put that at the end of the function, not inside the loop.

  2. You have y += 3 in the else: block. So you won't increment y when you find a matching codon, and once the return is moved out of the loop you'll be stuck in an infinite loop.

  3. You need to call get_orf() on the slice of the string starting at y, not the whole string (task b).

  4. Task d says you have to skip to the point after the ORF that was returned in task b, not just continue at the next codon.

def one_frame(seq):
    start_codon = 'ATG'
    list_of_orfs = []
    y = 0
    while y < len(seq):
        subORF = seq[y:y + 3]
        if start_codon == subORF:
            orf = get_orf(seq[y:])
            list_of_orfs.append(orf)
            y += len(orf)
        else:
            y += 3

    return list_of_orfs

one_frame('ATGAGATGAACCATGGGGTAA')
Barmar

Try splitting seq into codons instead:

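def one_frame(seq):
    shift = 3
    # Split the sequence into consecutive three-base codons.
    codons = [seq[i:i+shift] for i in range(0, len(seq), shift)]

    start_codon = "ATG"
    orf_list = []
    next_pos = 0  # index in seq just past the last ORF found
    for n, codon in enumerate(codons):
        pos = n * shift
        if pos >= next_pos and codon == start_codon:
            # Pass the rest of the sequence, not just the codon (step b).
            orf = get_orf(seq[pos:])
            orf_list.append(orf)
            next_pos = pos + len(orf)  # skip past this ORF (step d)

    return orf_list


seq = 'ATGAGATGAACCATGGGGTAA'
one_frame(seq)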
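
To see what the codon split produces for the test sequence, here is a small standalone illustration. The name start_positions is introduced only for this example and is not part of the answer.

```python
seq = "ATGAGATGAACCATGGGGTAA"

# Slice the sequence into consecutive three-base codons.
codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
print(codons)
# ['ATG', 'AGA', 'TGA', 'ACC', 'ATG', 'GGG', 'TAA']

# Map each start codon back to its index in the original string.
start_positions = [n * 3 for n, codon in enumerate(codons) if codon == "ATG"]
print(start_positions)  # [0, 12]
```

Keeping the index alongside each codon matters because get_orf needs the slice of seq that begins at the start codon, not the three-letter codon by itself.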
JustLearning

Slightly different approach but as I know nothing about DNA sequencing this may not make sense. Here goes anyway:

def one_frame(seq):
    start_codon = 'ATG'
    list_of_codons = []
    offset = 0

    while (i := seq[offset:].find(start_codon)) >= 0:
        offset += i
        list_of_codons.append(get_orf(seq[offset:]))
        offset += len(list_of_codons[-1])
    
    return list_of_codons

This way, find() searches from the beginning of the sequence at first, and afterwards only from the end of the previously found ORF.
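
One caveat worth checking against step a of the task: str.find() matches ATG at any position, not only on codon boundaries. The sketch below contrasts it with a frame-aware search; find_in_frame is a hypothetical helper written for this illustration, not something from the question.

```python
def find_in_frame(seq, codon="ATG", start=0):
    """Return the index of the first `codon` found at a multiple-of-three
    offset from `start`, or -1 if there is none (hypothetical helper)."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] == codon:
            return i
    return -1

seq = "CATGAAATGA"
print(seq.find("ATG"))     # 1, which is not on a codon boundary
print(find_in_frame(seq))  # 6, the first ATG in reading frame 0
```

For the test sequence in the question, both searches happen to give the same start positions, so the difference only shows up on inputs with an out-of-frame ATG.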

DarkKnight