I have an input file with containing a list of strings.
I am iterating through every fourth line starting on line two.
From each of these lines I make a new string from the first and last 6 characters and put this in an output file only if that new string is unique.
The code I wrote to do this works, but I am working with very large deep sequencing files, and has been running for a day and has not made much progress. So I'm looking for any suggestions to make this much faster if possible. Thanks.
def method():
target = open(output_file, 'w')
with open(input_file, 'r') as f:
lineCharsList = []
for line in f:
#Make string from first and last 6 characters of a line
lineChars = line[0:6]+line[145:151]
if not (lineChars in lineCharsList):
lineCharsList.append(lineChars)
target.write(lineChars + '\n') #If string is unique, write to output file
for skip in range(3): #Used to step through four lines at a time
try:
check = line #Check for additional lines in file
next(f)
except StopIteration:
break
target.close()