I am trying to write a large amount of data to a numpy memmap, and I want to speed it up using multiprocessing. Here is a minimal example of what I'm trying to do.
import numpy as np

# input and output must be separate files; opening the same file twice
# with mode='w+' would truncate it and destroy the source data
unProcessedData = np.memmap('input.memmap', dtype=np.uint16, mode='r', shape=(2500000, 512))
numpyMemmap = np.memmap('file.memmap', dtype=np.uint16, mode='w+', shape=(2500000, 512))

for i in range(2500000):
    numpyMemmap[i] = unProcessedData[i] + 1
I've been trying to find the best method to parallelize this. I've heard you have to be careful, since spawning an additional process carries more overhead than spawning a new thread, or than just staying single-threaded and using an event loop to trigger actions.
Since I'm writing to a memmap backed by disk, I've heard that multithreading may be better than multiprocessing here, but I can't find a definitive answer.
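Here is a rough sketch of what I imagine the threaded version would look like; this is only my guess, with 'input.memmap' standing in for wherever the source data actually lives and the thread count chosen arbitrarily. My understanding is that NumPy releases the GIL during the bulk arithmetic, so threads writing to disjoint slices could overlap:

import numpy as np
from concurrent.futures import ThreadPoolExecutor

def add_one(src, dst, start, stop):
    # each thread writes only to its own disjoint row range, so no locking is needed
    dst[start:stop] = src[start:stop] + 1

n_rows = 2500000
src = np.memmap('input.memmap', dtype=np.uint16, mode='r', shape=(n_rows, 512))
dst = np.memmap('file.memmap', dtype=np.uint16, mode='w+', shape=(n_rows, 512))

n_threads = 8  # arbitrary; I'd tune this
bounds = np.linspace(0, n_rows, n_threads + 1, dtype=int)

with ThreadPoolExecutor(max_workers=n_threads) as pool:
    for i in range(n_threads):
        pool.submit(add_one, src, dst, bounds[i], bounds[i + 1])
# the with-block waits for all threads to finish before we flush to disk
dst.flush()

Is something like this the right direction, or does the threading overhead still dominate?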
It seems that joblib may be particularly well suited for this:
https://joblib.readthedocs.io/en/latest/auto_examples/parallel_memmap.html
But I can't find an exact example for what I'm trying to do.
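For reference, here is my best guess at what the joblib version would look like, adapted from that page; 'input.memmap' and the number of chunks are placeholders, and I'm not sure this is the intended usage. As I understand it, joblib's Parallel can hand np.memmap instances to worker processes without copying the underlying data:

import numpy as np
from joblib import Parallel, delayed

def process_chunk(src, dst, start, stop):
    # workers write to disjoint row ranges, so they never contend
    dst[start:stop] = src[start:stop] + 1

n_rows = 2500000
src = np.memmap('input.memmap', dtype=np.uint16, mode='r', shape=(n_rows, 512))
dst = np.memmap('file.memmap', dtype=np.uint16, mode='w+', shape=(n_rows, 512))

n_jobs = 8  # arbitrary; I'd tune this
bounds = np.linspace(0, n_rows, n_jobs + 1, dtype=int)

Parallel(n_jobs=n_jobs)(
    delayed(process_chunk)(src, dst, bounds[i], bounds[i + 1])
    for i in range(n_jobs)
)
dst.flush()

Is this the right approach for writing to a memmap in parallel, or should I stick with threads (or even a single vectorized pass)?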