I have a list dataframe_chunk
which contains chunks of a very large pandas dataframe.I would like to write every single chunk into a different csv, and to do so in parallel. However, I see the files being written sequentially and I'm not sure why this is the case. Here's the code:
import concurrent.futures as cfu
def write_chunk_to_file(chunk, fpath):
chunk.to_csv(fpath, sep=',', header=False, index=False)
pool = cfu.ThreadPoolExecutor(N_CORES)
futures = []
for i in range(N_CORES):
fpath = '/path_to_files_'+str(i)+'.csv'
futures.append(pool.submit( write_chunk_to_file(dataframe_chunk[i], fpath)))
for f in cfu.as_completed(futures):
print("finished at ",time.time())
Any clues?