Using loops, how can I write a function in Python to find the longest chain of proteins, regardless of order? The function should return a substring that consists only of the characters 'A', 'C', 'G', and 'T' when they are mixed up with other characters. For example, for the sequence 'ACCGXXCXXGTTACTGGGCXTTGT', it returns 'GTTACTGGGC'.
Just a spelling correction. – Phani Oct 01 '15 at 20:08
Clarifying the concepts: Usually A, C, T, G are called bases or nucleotides, not proteins. Some DNA sequences translate to amino-acid chains; these are proteins. DNA sequencing is the process of finding out what string of nucleotides a DNA sequence consists of. – asjo Oct 01 '15 at 20:15
The molecules are not under discussion at this point, because I am trying to come up with code that can sort them. – HandyFrank Oct 02 '15 at 06:53
1 Answer
If the data is provided as a string, you could simply split it on the character 'X' to get a list.
```python
startstring = 'ACCGXXCXXGTTACTGGGCXTTGT'
array = startstring.split('X')
```
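For reference, this is what the split produces for the example string. Consecutive 'X' characters leave empty strings in the list; they are harmless here because they have length 0 and can never be the longest piece.

```python
startstring = 'ACCGXXCXXGTTACTGGGCXTTGT'
array = startstring.split('X')
# Each 'XX' pair leaves an empty string between the two separators
print(array)  # ['ACCG', '', 'C', '', 'GTTACTGGGC', 'TTGT']
```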
Then looping over the list and checking the length of each element gives you the right result:
```python
# Initialize placeholders for comparison
temp_max_string = ''
temp_max_length = 0
# Loop over each string in the list
for substring in array:
    # Check if the current substring is longer than the longest found so far
    if len(substring) > temp_max_length:
        # Replace the placeholders if it is longer
        temp_max_length = len(substring)
        temp_max_string = substring
print(temp_max_string)  # or 'print temp_max_string' if you are using Python 2
```
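Since the question asks for a function built with loops, here is a minimal sketch that wraps the same idea in one. It assumes the input is an uppercase string and, unlike `split('X')`, treats any character other than 'A', 'C', 'G', 'T' as a separator. The name `longest_acgt_run` is hypothetical. Instead of building up substrings, it tracks the start and length of the best run and slices once at the end.

```python
def longest_acgt_run(sequence):
    """Return the longest run of consecutive 'A', 'C', 'G', 'T' characters.

    Any other character ends the current run. On a tie, the earliest
    longest run is returned; '' is returned if there is no run at all.
    """
    valid = set('ACGT')
    best_start, best_len = 0, 0
    run_start = 0  # index where the current run began
    for i, char in enumerate(sequence):
        if char not in valid:
            # A non-ACGT character breaks the run; the next run starts after it
            run_start = i + 1
        elif i - run_start + 1 > best_len:
            # Strictly longer than the best so far (ties keep the earlier run)
            best_start, best_len = run_start, i - run_start + 1
    return sequence[best_start:best_start + best_len]


print(longest_acgt_run('ACCGXXCXXGTTACTGGGCXTTGT'))  # GTTACTGGGC
```

Because it only compares with `>`, an equally long later run never replaces an earlier one.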
You could also use Python's built-ins to get the result more concisely:
Sorting by descending length (list.sort()):
```python
startstring = 'ACCGXXCXXGTTACTGGGCXTTGT'
array = startstring.split('X')
array.sort(key=len, reverse=True)
print(array[0])  # print the longest, since we sorted by descending length
print(len(array[0]))  # gives you the length of the longest substring
```
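On ties: Python's sort is stable (also with `reverse=True`), so after sorting, `array[0]` is the earliest of several equally long pieces; `max(array, key=len)` likewise returns the first maximal item it encounters. If you want every tied piece instead, a sketch like this collects them (the input string is a made-up example with a tie):

```python
startstring = 'ACGTXTTGAXCC'  # made-up example: 'ACGT' and 'TTGA' tie
array = startstring.split('X')

# Stable sort: equally long pieces keep their original relative order
array.sort(key=len, reverse=True)
print(array[0])  # ACGT

# Keep every piece whose length equals the maximum
longest_len = len(array[0])
tied = [piece for piece in array if len(piece) == longest_len]
print(tied)  # ['ACGT', 'TTGA']
```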
Only get the longest substring (max()):
```python
startstring = 'ACCGXXCXXGTTACTGGGCXTTGT'
array = startstring.split('X')
longest = max(array, key=len)
print(longest)  # gives the longest substring
print(len(longest))  # gives you the length of the longest substring
```
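Splitting on 'X' assumes 'X' is the only character that can interrupt a run. If other characters may appear (for example 'N'), one alternative, swapping in a regular expression rather than `split()`, is to let `re.findall` collect every maximal run of A, C, G, T directly and then pick the longest with `max`. A sketch, assuming Python 3.4+ for the `default` argument of `max`:

```python
import re

startstring = 'ACCGXXCXXGTTACTGGGCXTTGT'
# Every maximal run of A/C/G/T; any other character acts as a separator
runs = re.findall(r'[ACGT]+', startstring)
# default='' avoids a ValueError when there is no A/C/G/T run at all
longest = max(runs, key=len, default='')
print(runs)     # ['ACCG', 'C', 'GTTACTGGGC', 'TTGT']
print(longest)  # GTTACTGGGC
```

Unlike `split('X')`, `findall` never yields empty strings, since `+` requires at least one matching character.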

MSeifert