I have to write a function that prints the longest palindromic substring of a piece of DNA. I have already written a function that checks whether a piece of DNA is itself a palindrome; see below.

def make_complement_strand(DNA):
    # Map each base to its pairing partner: A<->T, C<->G
    complement = []
    rules_for_complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    for letter in DNA:
        complement.append(rules_for_complement[letter])
    return complement

def is_this_a_palindrome(DNA):
    # Compare as lists, because make_complement_strand returns a list
    DNA = list(DNA)
    if DNA != make_complement_strand(DNA)[::-1]:
        print("false")
        return False
    else:
        print("true")
        return True

is_this_a_palindrome("GGGCCC") 

But now: how do I write a function that prints the longest palindromic substring of a DNA string?

The meaning of palindrome in the context of genetics is slightly different from the definition used for words and sentences. A double helix is formed by two paired strands of nucleotides that run in opposite directions in the 5’-to-3’ sense, and the nucleotides always pair in the same way: Adenine (A) with Thymine (T) in DNA (or with Uracil (U) in RNA), and Cytosine (C) with Guanine (G). A (single-stranded) nucleotide sequence is therefore said to be a palindrome if it is equal to its reverse complement. For example, the DNA sequence ACCTAGGT is palindromic because its nucleotide-by-nucleotide complement is TGGATCCA, and reversing the order of the nucleotides in the complement gives the original sequence.
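
As a minimal sketch of this definition, the check can be written directly on strings. The names `reverse_complement` and `is_reverse_palindrome` are illustrative only (they are not the functions from the post), and the sketch assumes the input contains only the uppercase letters A, C, G and T.

```python
# Translation table mapping each base to its complement.
# Assumption: input uses only uppercase A, C, G, T; other characters pass through unchanged.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]

def is_reverse_palindrome(dna):
    """Return True if dna equals its own reverse complement."""
    return dna == reverse_complement(dna)

print(reverse_complement("ACCTAGGT"))     # ACCTAGGT
print(is_reverse_palindrome("ACCTAGGT"))  # True
```

Working on strings throughout avoids having to convert between strings and lists of characters before comparing.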

Lynn
  • Please explain your definition of a palindrome more clearly. By the way, are you setting your `DNA` parameter to an empty list at the beginning of your `is_this_a_palindrome()` function? – muyustan Apr 25 '20 at 13:36
  • You only need to check half the string against the back half reversed – stark Apr 25 '20 at 13:46
  • @muyustan, I added a definition! Is it more clear now? – Lynn Apr 25 '20 at 14:26
  • @stark, I understand what you mean, but I am a python newbie and don't know how to do that. – Lynn Apr 25 '20 at 14:32
  • much better, now, anybody can approach the problem without having information on DNA sequences :) – muyustan Apr 25 '20 at 14:36
  • by the way, your `is_palindrome()` function returns false for "GGGCCC" – muyustan Apr 25 '20 at 14:40
  • probably because you input `DNA` as a *string* but `make_complement` function returns a list of characters. You should try to `str.join()` the list – muyustan Apr 25 '20 at 14:42
  • @muyustan, you are right, now it returns true – Lynn Apr 25 '20 at 14:47

1 Answer

This should be a decent starting point for finding the longest palindromic substring.

def make_complement_strand(DNA):
    complement = []
    rules_for_complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    for letter in DNA:
        complement.append(rules_for_complement[letter])
    return complement

def is_this_a_palindrome(DNA):
    DNA = list(DNA)
    if DNA != make_complement_strand(DNA)[::-1]:
        #print("false")
        return False
    else:
        #print("true")
        return True


def longest_palindrome_ss(org_dna, palindrome_func):
    '''
    Naive implementation.

    We use two indices: i marks the start of the current substring
    and j runs from i+1 to the end. i is incremented on every outer loop.

    Uses the palindrome function provided by the caller.

    Further improvements:
    1. Start with the longest substring instead of the shortest, i.e. start with i=0 and j=last_i and decrement.
    '''
    longest_palin = ""
    last_i = len(org_dna)
    for i in range(last_i):
        for j in range(i + 1, last_i):
            # Slice is inclusive of index j, so the shortest candidate has length 2
            current_substring = org_dna[i:j + 1]
            if palindrome_func(current_substring):
                # Keep only a strictly longer match, so ties go to the leftmost one
                if len(current_substring) > len(longest_palin):
                    longest_palin = current_substring
    print(org_dna, longest_palin)
    return longest_palin


longest_palindrome_ss("GGGCCC", is_this_a_palindrome)
longest_palindrome_ss("GAGCTT", is_this_a_palindrome)
longest_palindrome_ss("GGAATTCGA", is_this_a_palindrome)

Here is the output of a sample run:

mahorir@mahorir-Vostro-3446:~/Desktop$ python3 dna_paln.py 
GGGCCC GGGCCC
GAGCTT AGCT
GGAATTCGA GAATTC
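
The improvement mentioned in the docstring (trying the longest candidates first) could look roughly like the sketch below. It is not the code that produced the output above: the names `is_reverse_palindrome` and `longest_palindrome_longest_first` are hypothetical, and the sketch assumes the input contains only uppercase A, C, G and T (any other character raises a `KeyError`). It also relies on the fact that no base is its own complement, so a reverse-complement palindrome must have even length, and it compares only the first half against the mirrored second half.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def is_reverse_palindrome(dna):
    """Check dna against its reverse complement, comparing only half the string."""
    n = len(dna)
    if n % 2:
        # Odd length: the middle base would have to be its own complement.
        return False
    return all(COMPLEMENT[dna[k]] == dna[n - 1 - k] for k in range(n // 2))

def longest_palindrome_longest_first(dna):
    """Return the leftmost longest reverse-complement palindrome, or "" if none."""
    n = len(dna)
    # Try even lengths from the largest possible down to 2.
    for length in range(n - (n % 2), 1, -2):
        for start in range(n - length + 1):
            candidate = dna[start:start + length]
            if is_reverse_palindrome(candidate):
                return candidate  # first hit at this length is the longest
    return ""

print(longest_palindrome_longest_first("GGGCCC"))     # GGGCCC
print(longest_palindrome_longest_first("GAGCTT"))     # AGCT
print(longest_palindrome_longest_first("GGAATTCGA"))  # GAATTC
```

Because the search stops at the first match, inputs that contain a long palindrome finish early; the worst case is still roughly cubic in the length of the string.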
mahoriR