The fact that you never see jumbled text on the same line, or new lines in the middle of a line, is a clue that you actually don't need to synchronize appending to the file. The problem is that you use print to write to a single file handle. I suspect print is actually doing two operations on the file handle in one call, and those operations are racing between the threads. Basically, print is doing something like:
file_handle.write('whatever_text_you_pass_it')
file_handle.write(os.linesep)
and because different threads are doing this simultaneously on the same file handle, sometimes one thread will get in its first write, then the other thread will get in its first write, and then you'll get two carriage returns in a row, or really any permutation of these.
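To see what that interleaving looks like, here is a minimal sketch that mimics the assumed two-step behaviour of print (message, then line ending, as two separate write calls) from several threads sharing one handle. The exact internals of print may differ, but the two-step pattern is enough to reproduce the stray blank lines and jammed-together messages:

#!/usr/bin/env python
# minimal sketch: mimic print's assumed two-step write (text, then line
# ending) from multiple threads sharing one handle
import os
import threading

def fake_print(handle, text):
    handle.write(text)        # step 1: the message
    handle.write(os.linesep)  # step 2: the line ending, which another thread can sneak in front of

def worker(handle, label, count):
    for _ in range(count):
        fake_print(handle, 'hello from %s' % label)

with open('race_demo.txt', 'w') as out:
    threads = [threading.Thread(target=worker, args=(out, 't%d' % i, 100000)) for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
# race_demo.txt may well contain occasional blank lines or two messages on
# one line where the two steps interleaved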
The simplest way to get around this is to stop using print and just use write directly. Try something like this:

output.write(f + os.linesep)
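If you'd rather keep a print-like call site, a tiny wrapper that concatenates the line ending before calling write keeps each line in a single call; write_line below is just a hypothetical helper name:

import os

def write_line(handle, text):
    # hypothetical helper: build the whole line up front so it reaches the
    # handle in one write() call instead of two
    handle.write(text + os.linesep)

# usage: write_line(output, f) wherever the print call used to be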
This still seems dangerous to me. I'm not sure what guarantees you can expect with all the threads using the same file handle object and contending for its internal buffer. Personally, I'd sidestep the whole issue and just have every thread get its own file handle. Also note that this works when the writes are line-buffered, so each flush to the file ends on an os.linesep; to force line buffering, pass 1 as the third argument of open. You can test it out like this:
#!/usr/bin/env python
import os
import sys
import threading


def hello(file_name, message, count):
    with open(file_name, 'a', 1) as f:
        for i in range(0, count):
            f.write(message + os.linesep)


if __name__ == '__main__':
    #start a file
    with open('some.txt', 'w') as f:
        f.write('this is the beginning' + os.linesep)

    #make 10 threads write a million lines to the same file at the same time
    threads = []
    for i in range(0, 10):
        threads.append(threading.Thread(target=hello, args=('some.txt', 'hey im thread %d' % i, 1000000)))
        threads[-1].start()
    for t in threads:
        t.join()

    #check what the heck the file had
    uniq_lines = set()
    with open('some.txt', 'r') as f:
        for l in f:
            uniq_lines.add(l)
    for u in uniq_lines:
        sys.stdout.write(u)
The output looks like this:
hey im thread 6
hey im thread 7
hey im thread 9
hey im thread 8
hey im thread 3
this is the beginning
hey im thread 5
hey im thread 4
hey im thread 1
hey im thread 0
hey im thread 2
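If you want a stricter check than eyeballing the unique lines, you can also count every line and make sure each one is intact; a minimal sketch, assuming the same some.txt produced by the script above:

#!/usr/bin/env python
# count every line in some.txt and verify each one is either the header or
# an intact 'hey im thread N' message, i.e. nothing got interleaved
import re

expected = re.compile(r'^(this is the beginning|hey im thread \d)$')
total = 0
garbled = 0
with open('some.txt', 'r') as f:
    for line in f:
        total += 1
        if not expected.match(line.rstrip('\r\n')):
            garbled += 1
print('total lines: %d, garbled lines: %d' % (total, garbled))
# with 10 threads x 1,000,000 lines plus the header, total should be
# 10000001 and garbled should be 0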