I have multiple datasets in Phylip format (example below) that I would like to convert to Fasta format (example below) using this Python code:
for j in range(1, 10):
    inFile = open('/path/to/input_sequence/seqfile_00' +str(j) + '.txt', 'r')
    outFile = open('/path/to/output_sequence/Fasta/seqfile_00' + str(j) +'.txt', 'w')
    inLines = inFile.readlines()
    inFile.close()
    outLines = inLines[1:17]
    for line in outLines:
        if line.startswith('\n'):
            line = line.replace('\n','')
        outFile.write(line.replace(' ',' \n').replace('sequence', '>sequence'))
    outFile.close()
This is what my Phylip input files (input_sequence) look like:
8 1500\n
\n
sequence1 CTGTCCTTG...\n
\n
sequence2 CTGTCGTTG...\n
\n
sequence3 CTGCGTATG...\n
\n
sequence4 CTATGCCTG...\n
\n
sequence5 AGGTGTAAG...\n
\n
sequence6 AGGTGTAAG...\n
\n
sequence7 AAATTCAAA...\n
\n
sequence8 AAGTCCAAA...\n
\n
And this is what I would like my output files (in Fasta format) to look like:
>sequence1 \n
CTGTCCTTGG...\n
>sequence2 \n
CTGTCGTTGG...\n
>sequence3 \n
CTGCGTATGG...\n
>sequence4 \n
CTATGCCTGG...\n
>sequence5 \n
AGGTGTAAGG...\n
>sequence6 \n
AGGTGTAAGA...\n
>sequence7 \n
AAATTCAAAG...\n
>sequence8 \n
AAGTCCAAAA...\n
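The intended transformation (drop the header line, skip blank lines, turn each "name sequence" row into a ">name" line followed by the sequence) can also be sketched without relying on a literal space as the separator. This is only a sketch, assuming Python 3 and a sequential (non-interleaved) Phylip layout. `phylip_to_fasta` is a hypothetical helper name, not part of the script above, and unlike the desired output it does not keep the trailing space after the name.

```python
def phylip_to_fasta(lines):
    """Convert the lines of a sequential Phylip file into a list of Fasta lines.

    Hypothetical helper for illustration; assumes one 'name sequence' row per taxon.
    """
    fasta = []
    # Skip the first line, which holds the sequence count and length ("8 1500").
    for line in lines[1:]:
        # split() with no argument splits on any run of whitespace,
        # so tabs and stray carriage returns are handled like spaces.
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line or a row without a sequence
        name = fields[0]
        sequence = "".join(fields[1:])
        fasta.append(">" + name + "\n")
        fasta.append(sequence + "\n")
    return fasta


# Small made-up input: the second row uses a tab and a Windows line ending.
example = ["2 9\n", "\n", "sequence1 CTGTCCTTG\n", "\n", "sequence2\tCTGTCGTTG\r\n", "\n"]
print("".join(phylip_to_fasta(example)), end="")
```

Whether this helps depends on what the invisible character in the failing files actually is; if it is not a whitespace character, `split()` will not treat it as a separator.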
When I run the above code, I get the correct output for j = 1, but for the following values of j (2 through 9) I get this output:
\n
>sequence1 *red inverted question mark*CTGTCCTTGG...\n
>sequence2 *red inverted question mark*CTGTCGTTGG...\n
>sequence3 *red inverted question mark*CTGCGTATGG...\n
>sequence4 *red inverted question mark*CTATGCCTGG...\n
>sequence5 *red inverted question mark*AGGTGTAAGG...\n
>sequence6 *red inverted question mark*AGGTGTAAGA...\n
>sequence7 *red inverted question mark*AAATTCAAAG...\n
>sequence8 *red inverted question mark*AAGTCCAAAA...\n
(... is the continued sequence, and the red inverted question mark is what I see when I turn on "Show Invisibles" in TextWrangler.)
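One way to find out what the red inverted question mark really is would be to print the `repr()` of the first few lines of a working file and a failing file and compare them. The snippet below is a minimal sketch assuming Python 3. `show_invisibles` is a hypothetical helper name, and the sample lines are made up to show what a tab and a carriage return would look like; they are not taken from my files.

```python
def show_invisibles(lines, limit=5):
    """Return repr() strings for the first few non-blank lines.

    repr() spells out characters that an editor hides, for example
    a tab as \\t or a carriage return as \\r. Each line is cut to its
    first 30 characters so that long sequences stay readable.
    """
    shown = []
    for line in lines:
        if line.strip() == "":
            continue  # skip blank lines
        shown.append(repr(line[:30]))
        if len(shown) == limit:
            break
    return shown


# Made-up sample: a header line, a blank line, and a row with a tab and "\r\n".
sample = ["8 1500\n", "\n", "sequence1\tCTGTCCTTG\r\n"]
for text in show_invisibles(sample):
    print(text)
```

With a real file, the list passed in would be the result of `readlines()` on seqfile_001.txt and on seqfile_002.txt, so that the separator between the name and the sequence can be compared between the two.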
I guess the general question, and the reason I am confused, is how the code can work fine for j = 1 but not for the rest of the numbers. How can I solve this issue?
Thanks in advance!