I have written this code:
import sys

# Read the FASTA file, skipping '>' header lines, and join the sequence lines
string = ''
with open(sys.argv[1], 'r') as file:
    for line in file:
        if not line.startswith(">"):
            string = string + line.strip()
#print(list(string))

# input() returns a string, so convert it to int before using it as a step size
w = int(input("Please enter window size: "))
test = [string[i:i+w] for i in range(0, len(string), w)]
seq = int(input("Please enter the number of sequences you wish to read: "))
#print(test[0:seq])
It generates a list that looks like this:
['TAAAACACCC', 'TCAATTCAAG', 'GGTTTTTGAG', 'CGAGCTTTTT', 'ACTCAAAGAA', 'TCCAAGATAG', 'CGTTTAAAAA', 'TTTAGGGGTG', 'TTAGGCTCAG', 'CATAGAGTTT']
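The list comprehension in the code steps through the string in strides of `w`, so every element has length `w` except possibly the last one. A small sketch of the same slicing wrapped in a function (the name `split_into_windows` is hypothetical, not from the question) makes that edge case visible:

```python
def split_into_windows(sequence, w):
    """Split `sequence` into consecutive, non-overlapping chunks of length w.

    The final chunk is shorter than w when len(sequence) is not a multiple of w.
    """
    return [sequence[i:i + w] for i in range(0, len(sequence), w)]


windows = split_into_windows("TAAAACACCCTCAATTCAAGGG", 10)
print(windows)  # ['TAAAACACCC', 'TCAATTCAAG', 'GG']
```

A short last segment like `'GG'` will affect any per-segment percentage, so it may be worth deciding whether to keep or drop it.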
The next step is to count the occurrences of the letters GC (or CG) in each element of the list. Is there a way to loop through the list so that the output file looks like this:
Segment 1- The %GC is <the calculated number>
Segment 2- The %GC is <the calculated number>
Segment 3- The %GC is <the calculated number>
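One way to get numbered lines is the built-in `enumerate`, which yields `(index, item)` pairs; passing `start=1` makes the count begin at 1 instead of 0. This is a minimal sketch that assumes "%GC" means the percentage of bases in a segment that are G or C (the usual definition of GC content). If you instead want to count the two-letter substrings "GC" and "CG", the hypothetical helper `gc_percent` would need to change.

```python
def gc_percent(segment):
    """Return the percentage of characters in `segment` that are G or C.

    Assumes GC content = (count of G + count of C) / segment length * 100.
    """
    if not segment:
        return 0.0
    segment = segment.upper()
    gc = segment.count("G") + segment.count("C")
    return 100.0 * gc / len(segment)


test = ['TAAAACACCC', 'TCAATTCAAG', 'GGTTTTTGAG']

# enumerate(..., start=1) numbers the segments 1, 2, 3, ...
for number, segment in enumerate(test, start=1):
    print(f"Segment {number}- The %GC is {gc_percent(segment):.1f}")
# Segment 1- The %GC is 40.0
# Segment 2- The %GC is 30.0
# Segment 3- The %GC is 40.0
```

The segment number comes from `enumerate` itself, so there is no separate counter to keep in sync with the list.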
Since the file is very large, the number of segments (each individual element of the list, like 'TAAGATATA') will be huge, and I do not know how to get the segment number (1, 2, 3, ...) into the output file. Also, since I am new to Python (and programming), I am not very good at using functions.
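Because the input is large, one option is to avoid building the whole `string` and `test` list in memory and instead produce windows one at a time with a generator, writing each numbered line to the output file as it goes. The sketch below assumes the same FASTA handling as the original code (skip `>` lines, join everything else) and the same G+C definition of %GC; the function names `read_windows`, `gc_percent`, and `write_gc_report` and the output path are hypothetical.

```python
def read_windows(path, w):
    """Yield consecutive windows of `w` bases from a FASTA file.

    As in the original code, lines starting with '>' are skipped and all
    sequence lines are joined; only a small buffer is held in memory.
    """
    buffer = ""
    with open(path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                continue
            buffer += line.strip()
            # Emit full windows as soon as enough bases have accumulated
            while len(buffer) >= w:
                yield buffer[:w]
                buffer = buffer[w:]
    if buffer:
        yield buffer  # final, shorter window


def gc_percent(segment):
    """Percentage of G/C characters (assumed definition of %GC)."""
    segment = segment.upper()
    if not segment:
        return 0.0
    return 100.0 * (segment.count("G") + segment.count("C")) / len(segment)


def write_gc_report(fasta_path, out_path, w):
    """Write one 'Segment N- The %GC is X' line per window to out_path."""
    with open(out_path, "w") as out:
        for number, segment in enumerate(read_windows(fasta_path, w), start=1):
            out.write(f"Segment {number}- The %GC is {gc_percent(segment):.2f}\n")
```

After reading `w` as in the question, a call such as `write_gc_report(sys.argv[1], "gc_output.txt", w)` would produce the requested format. The generator keeps memory use roughly constant regardless of file size, at the cost of not having the full list available for slicing like `test[0:seq]`.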