
I am trying to solve the question from here http://rosalind.info/problems/cons/

My script fills the counter lists and outputs a consensus string of equal length. I don't think there are any math or index errors, but I have run into a wall. My code:

with open('C:/users/steph/downloads/rosalind_cons (3).txt') as f:
    seqs = f.read().splitlines()

#remove all objects that are not sequences of interest
for s in seqs:
    if s[0] == '>':
        seqs.remove(s)

n = range(len(seqs[0])+1)

#lists to store counts for each nucleotide
A, C, G, T = [0 for i in n], [0 for i in n], [0 for i in n], [0 for i in n]

#see what nucleotide is at each index and augment the 
#same index of the respective list
def counter(Q):
    for q in Q:
        for k in range(len(q)):
            if q[k] == 'A':
                A[k] += 1
            elif q[k] == 'C':
                C[k] += 1
            elif q[k] == 'G':
                G[k] += 1
            elif q[k] == 'T':
                T[k] += 1
counter(seqs)

#find the max of all the counter lists at every index 
#and add the respective nucleotide to the consensus sequence
def consensus(a,t,c,g):
        consensus = ''
        for k in range(len(a)):
            if (a[k] > t[k]) and (a[k]>c[k]) and (a[k]>g[k]):
                consensus = consensus+"A"
            elif (t[k] > a[k]) and (t[k]>c[k]) and (t[k]>g[k]):
                consensus = consensus+ 'T'
            elif (c[k] > t[k]) and (c[k]>a[k]) and (c[k]>g[k]):
                consensus = consensus+ 'C'
            elif (g[k] > t[k]) and (g[k]>c[k]) and (g[k]>a[k]):
                consensus = consensus+ 'G'
            #ensure a nucleotide is added to consensus sequence
            #when more than one index has the max value
            else:
                if max(a[k],c[k],t[k],g[k]) in a:
                    consensus = consensus + 'A'
                elif max(a[k],c[k],t[k],g[k]) in c:
                    consensus = consensus + 'C'
                elif max(a[k],c[k],t[k],g[k]) in t:
                    consensus = consensus + 'T'
                elif max(a[k],c[k],t[k],g[k]) in g:
                    consensus = consensus + 'G'
        print(consensus)
        #debugging, ignore this --> print('len(consensus)',len(consensus))
consensus(A,T,C,G)

#debugging, ignore this --> print('len(A)',len(A))

print('A: ',*A, sep=' ')
print('C: ',*C, sep=' ')
print('G: ',*G, sep=' ')
print('T: ',*T, sep=' ')

Thank you for your time

Sank Finatra

1 Answer

  • There is a mistake in the following line:

    n = range(len(seqs[0])+1)

It makes every counter list one element too long, so the consensus string ends with an extra A and each of the four count lists ends with an extra 0. Remove the +1 and it should work.

  • In addition, your output has two spaces after each label; remove the space after : in your print statements.
  • If you fix those two lines, it will work for the example but will fail for sequences that span more than one line (as in the real dataset).
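The first two points can be checked on a toy input. This is only a minimal sketch: the two sequences are made up for illustration, and `profile_row` is a hypothetical helper that is not part of the question's code.

```python
# Toy data (made up for illustration): two sequences of length 4.
seqs = ["ATGC", "ATGA"]

# Sizing the counters with len(seqs[0]) + 1 leaves one slot that is never
# incremented, so every count list ends with an extra 0.
too_long = [0 for _ in range(len(seqs[0]) + 1)]
right_size = [0 for _ in range(len(seqs[0]))]

def profile_row(label, counts):
    """Format one profile line such as 'A: 1 0 2', using single spaces only."""
    return label + ": " + " ".join(str(c) for c in counts)

print(len(too_long), len(right_size))
print(profile_row("A", [1, 0, 2]))
```

With the question's own print calls, dropping the trailing space inside the label, as in `print('A:', *A)`, should have the same effect, because `print` already separates its arguments with a single space.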

To handle multi-line sequences, try merging the lines of each record with something like the snippet below:

new_seqs = list()
for s in seqs:
    if s.startswith('>'):
        new_seqs.append('')
    else:
        new_seqs[-1]+=s
seqs = new_seqs

and try it again.
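Putting the pieces together, the whole task could be sketched roughly as follows. The function names `read_fasta`, `profile` and `consensus` and the toy input are assumptions made for illustration, not the asker's code or a reference solution. One difference from the question's code is worth noting: ties are resolved by comparing the counts at position k only, whereas `max(...) in a` tests membership in the whole list and may therefore pick a base that is not maximal at that position.

```python
def read_fasta(lines):
    """Merge FASTA lines into one string per record (headers start with '>')."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            records.append("")  # a header starts a new record
        elif records:
            records[-1] += line  # sequence lines extend the current record
    return records

def profile(seqs):
    """Count A, C, G, T per column; assumes all sequences have the same length."""
    length = len(seqs[0])
    counts = {base: [0] * length for base in "ACGT"}
    for seq in seqs:
        for k, base in enumerate(seq):
            if base in counts:
                counts[base][k] += 1
    return counts

def consensus(counts):
    """Most frequent base per column; ties go to the first of A, C, G, T."""
    length = len(counts["A"])
    return "".join(max("ACGT", key=lambda b: counts[b][k]) for k in range(length))

# Made-up input in which every record spans two lines.
lines = [">seq1", "ATG", "C", ">seq2", "ATG", "A", ">seq3", "TTG", "A"]
seqs = read_fasta(lines)
counts = profile(seqs)
print(consensus(counts))
for base in "ACGT":
    print(base + ":", *counts[base])
```

Keeping the counts in a dictionary keyed by base avoids four parallel lists and the long if/elif chains, but the four-list layout from the question works just as well once the list length and the tie handling are fixed.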

Maximilian Peters
  • These are good suggestions but unfortunately I still get an incorrect answer. After browsing the Rosalind community's ideas I think the problem is some error formatting the output or hidden newlines. – Sank Finatra Nov 10 '16 at 20:56
  • @SankFinatra: your formatting is fine, no hidden newlines, I updated the answer accordingly. – Maximilian Peters Nov 10 '16 at 23:00
  • @Maximilian Peters: I now see I was building my 'seqs' list incorrectly. I implemented your suggested changes but am still getting incorrect answers for some reason – Sank Finatra Nov 11 '16 at 21:24
  • That's weird. I compared the results of your modified code with those of a working solution and they were identical. Make sure there are no additional spaces or copy-and-paste artifacts; Rosalind is quite picky about those things. – Maximilian Peters Nov 11 '16 at 21:31