The sample file looks like this (all on one line, wrapped for legibility):
['>1\n', 'TCCGGGGGTATC\n', '>2\n', 'TCCGTGGGTATC\n',
'>3\n', 'TCCGTGGGTATC\n', '>4\n', 'TCCGGGGGTATC\n',
'>5\n', 'TCCGTGGGTATC\n', '>6\n', 'TCCGTGGGTATC\n',
'>7\n', 'TCCGTGGGTATC\n', '>8\n', 'TCCGGGGGTATC\n','\n',
'$$$\n', '\n',
'>B1\n', 'ATCGGGGGTATT\n', '>B2\n', 'TT-GTGGGAATC\n',
'>3\n', 'TTCGTGGGAATC\n', '>B4\n', 'TT-GTGGGTATC\n',
'>B5\n', 'TTCGTGGGTATT\n', '>B6\n','TTCGGGGGTATC\n',
'>B7\n', 'TT-GTGGGTATC\n', '>B8\n', 'TTCGGGGGAATC\n',
'>B9\n', 'TTCGGGGGTATC\n','>B10\n', 'TTCGGGGGTATC\n',
'>B42\n', 'TT-GTGGGTATC\n']
The $$$
separates the two sets. I need to use .strip
function and remove the \n
and all the "headers".
I need to make a list of lists (as below) and replace "-" with Z (again, all on one line; wrapped here for legibility):
[['TCCGGGGGTATC','TCCGTGGGTATC','TCCGTGGGTATC', 'TCCGGGGGTATC',
'TCCGTGGGTATC',CGTGGGTATC','TCCGTGGGTATC', 'TCCGGGGGTATC'],
['ATCGGGGGTATT', 'TT-GTGGGAATC','TTCGTGGGAATC', 'TT-GTGGGTATC',
'TTCGTGGGTATT', 'TTCGGGGGTATC','TT-GTGGGTATC', 'TTCGGGGGAATC',
'TTCGGGGGTATC', 'TTCGGGGGTATC','TT-GTGGGTATC]]