
I am trying to write a large amount of data to a numpy memmap and want to speed this up using multiprocessing. Here is a minimal example of what I'm trying to do.

import numpy as np

# The two memmaps need distinct files; opening the same file twice with mode='w+' would just truncate it.
unProcessedData = np.memmap('unprocessed.memmap', dtype=np.uint16, mode='w+', shape=(2500000, 512))
numpyMemmap = np.memmap('processed.memmap', dtype=np.uint16, mode='w+', shape=(2500000, 512))

for i in range(2500000):
    numpyMemmap[i] = unProcessedData[i] + 1

I've been trying to find the best way to parallelize this. I've heard you have to be careful, since spawning an additional process carries more overhead than starting a new thread, or even than staying single-threaded and using an event loop to trigger actions.

Since I'm writing to a memmap on disk, I've heard that multithreading is better than multiprocessing for this, but I can't find a definitive answer.
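For what it's worth, here is a sketch of the multiprocessing variant I have in mind: each worker re-opens the memmaps itself and fills a disjoint block of rows, so no locking should be needed. The filenames, worker count, and chunking here are my own assumptions, not taken from any reference.

import numpy as np
from multiprocessing import Pool

ROWS, COLS = 2500000, 512

def process_block(bounds):
    start, stop = bounds
    # Each worker maps the same files; writes go to non-overlapping rows.
    src = np.memmap('unprocessed.memmap', dtype=np.uint16, mode='r', shape=(ROWS, COLS))
    dst = np.memmap('processed.memmap', dtype=np.uint16, mode='r+', shape=(ROWS, COLS))
    dst[start:stop] = src[start:stop] + 1
    dst.flush()

if __name__ == '__main__':
    # Create the output file once so the workers can open it with 'r+'.
    np.memmap('processed.memmap', dtype=np.uint16, mode='w+', shape=(ROWS, COLS)).flush()
    n_workers = 8
    chunk = ROWS // n_workers
    blocks = [(i * chunk, ROWS if i == n_workers - 1 else (i + 1) * chunk)
              for i in range(n_workers)]
    with Pool(n_workers) as pool:
        pool.map(process_block, blocks)

But I don't know whether the process-spawn overhead makes this worthwhile here, which is really my question.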

It seems that joblib may be particularly well suited for this:

https://joblib.readthedocs.io/en/latest/auto_examples/parallel_memmap.html

But I can't find an exact example for what I'm trying to do.
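The closest I've come is adapting the pattern from that page into something like the sketch below, where joblib hands the memmap objects to the workers and each worker writes its own row range. Again, the chunking and the n_jobs value are just my guesses.

import numpy as np
from joblib import Parallel, delayed

ROWS, COLS = 2500000, 512

def fill_rows(src, dst, start, stop):
    # joblib passes memmaps to workers by filename, so these writes land on disk.
    dst[start:stop] = src[start:stop] + 1

src = np.memmap('unprocessed.memmap', dtype=np.uint16, mode='r', shape=(ROWS, COLS))
dst = np.memmap('processed.memmap', dtype=np.uint16, mode='w+', shape=(ROWS, COLS))

n_jobs = 8
chunk = ROWS // n_jobs
Parallel(n_jobs=n_jobs)(
    delayed(fill_rows)(src, dst, i * chunk, ROWS if i == n_jobs - 1 else (i + 1) * chunk)
    for i in range(n_jobs)
)
dst.flush()

Is this the right way to use it here, and would threads (e.g. prefer='threads') be a better fit than processes for a write-heavy memmap workload?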

