
This is the problem: http://rosalind.info/problems/cons/

def file_read(fname):
    with open(fname, "r") as myfile:
        global data
        data = myfile.readlines()
        print(data)
        i = 0
        while i < len(data):
            data[i] = data[i].replace("\n", "")
            if ">" in data[i]:
                data.remove(data[i])
            else:
                i += 1
file_read('rosalind_cons.txt')
res = ["".join(el) for el in zip(*data)]
print(res)
a_str = ""
c_str = ""
g_str = ""
t_str = ""
for x in range(0, len(res)):
    a_str += (str(res[x].count("A"))) + " "
for x in range(0, len(res)):
    c_str += (str(res[x].count("C"))) + " "
for x in range(0, len(res)):
    g_str += (str(res[x].count("G"))) + " "
for x in range(0, len(res)):
    t_str += (str(res[x].count("T"))) + " "
a_str_nospace = a_str.replace(" ", "")
c_str_nospace = c_str.replace(" ", "")
g_str_nospace = g_str.replace(" ", "")
t_str_nospace = t_str.replace(" ", "")
consensus_string = ""
for x in range(0, len(a_str_nospace)):
    if max(a_str_nospace[x], c_str_nospace[x], g_str_nospace[x], t_str_nospace[x]) in a_str_nospace[x]:
        consensus_string += "A"
    elif max(a_str_nospace[x], c_str_nospace[x], g_str_nospace[x], t_str_nospace[x]) in c_str_nospace[x]:
        consensus_string += "C"
    elif max(a_str_nospace[x], c_str_nospace[x], g_str_nospace[x], t_str_nospace[x]) in g_str_nospace[x]:
        consensus_string += "G"
    elif max(a_str_nospace[x], c_str_nospace[x], g_str_nospace[x], t_str_nospace[x]) in t_str_nospace[x]:
        consensus_string += "T"

print(consensus_string)
print("A: " + a_str)
print("C: " + c_str)
print("G: " + g_str)
print("T: " + t_str)

What's wrong with my code? It works for the sample dataset, but not for the larger datasets. I don't know what's wrong; I suspect the file-reading part isn't correct (maybe?).

EDIT: There are some print calls in there, but I don't copy their output into the answer box, so they don't affect the result.
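About the file-reading suspicion: the loop in `file_read` keeps every non-header line as its own entry, so it assumes each sequence sits on exactly one line. If a downloaded dataset wraps a sequence over several lines, `zip(*data)` would treat the pieces as separate strings (and stop at the shortest one), which gives wrong columns. Below is a minimal sketch of a reader that joins wrapped lines. `parse_fasta` and `read_fasta` are hypothetical helper names, and whether a particular dataset is wrapped is something to check in the file itself.

```python
def parse_fasta(lines):
    """Return the sequences from an iterable of FASTA lines as a list of strings.

    Consecutive non-header lines are joined, so a sequence wrapped over
    several lines comes back as a single string.
    """
    sequences = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            sequences.append("")  # a header line starts a new record
        else:
            sequences[-1] += line  # add this chunk to the current record
    return sequences


def read_fasta(fname):
    """Hypothetical replacement for file_read that returns the list instead of setting a global."""
    with open(fname) as handle:
        return parse_fasta(handle)
```

With this, `data = read_fasta('rosalind_cons.txt')` would take the place of the `file_read(...)` call, and the rest of the script can use `data` unchanged.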

sickleox2

1 Answer


Nice to see a fellow Rosalind user! I discovered the site when I studied bioinformatics and stumbled upon it again last month.

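To answer your question: you're building each profile row as a string of digits, and the consensus loop then reads that string back one character at a time after stripping the spaces. That works fine as long as every count is below 10. In a larger dataset, a count of 10 or more takes up two characters, so from that column on `a_str_nospace[x]` (and its C, G and T counterparts) no longer lines up with column `x`, and the `max()` comparisons pick the wrong base. Try building a list of integers first and only convert them to a string in the final step.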
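A minimal sketch of that suggestion, assuming the sequences are already in a list of equal-length strings (for example `data` from the question). The counts stay integers until the final formatting step, so a count of 10 or more is compared as a number and never shifts later columns. `profile_and_consensus` and `format_profile` are illustrative names, not part of any library.

```python
BASES = "ACGT"


def profile_and_consensus(sequences):
    """Return (profile, consensus) for a list of equal-length DNA strings.

    profile maps each base to a list of integer counts, one per position.
    """
    columns = list(zip(*sequences))  # one tuple of characters per position
    profile = {base: [column.count(base) for column in columns] for base in BASES}
    # Pick the base with the highest count at each position; on a tie,
    # max() keeps the first base in "ACGT" order.
    consensus = "".join(
        max(BASES, key=lambda base: profile[base][i]) for i in range(len(columns))
    )
    return profile, consensus


def format_profile(profile):
    """Convert the integer counts to text only at the very end."""
    return "\n".join(
        f"{base}: " + " ".join(str(n) for n in profile[base]) for base in BASES
    )


if __name__ == "__main__":
    profile, consensus = profile_and_consensus(["AAC", "AGC", "ATC"])
    print(consensus)               # AAC
    print(format_profile(profile))  # A: 3 1 0 / C: 0 0 3 / G: 0 1 0 / T: 0 1 0
```

In the question's script, this would replace everything from the `res = ...` line onward: call `profile_and_consensus(data)`, then print the consensus and `format_profile(profile)`.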

Christoph