I have a set of large TIFF images (3200 x 3200). I want to load them in a Jupyter notebook, compute averages, and create a NumPy array. To make it concrete: each movie clip is composed of 10 TIFF images, and there are several movie clips. I want to compute the average of the 10 frames for each movie and return an array of shape (# movie clips x 3200 x 3200).

import numpy as np
import tifffile

def uselibtiff(f):
    tif = tifffile.imread(f) # read the whole TIFF file into a NumPy array
    return tif

def avgFrames(imgfile_Movie, im_shape):
    # Sum all frames of one movie; the caller divides by num_frames to get the average.
    # imgfile_Movie is a list of TIFF image files for a particular movie.
    im = np.zeros(im_shape)
    for img_frame in imgfile_Movie:
        frame_im = uselibtiff(img_frame)
        im += frame_im

    return im

# imgfiles_byMovie is a 2D list

im_avg_all = np.zeros((num_movies,im_shape[0],im_shape[1]))
for n,imgfile_Movie in enumerate(imgfiles_byMovie):
    print(n)
    #start = timeit.default_timer()
    results = avgFrames(imgfile_Movie,im_shape)/num_frames
    #end = timeit.default_timer()
    
    #print(end-start)
    im_avg_all[n] = results

So far I've noticed that avgFrames takes about 3 seconds per movie (10 frames each). I wanted to speed things up and tried using multiprocessing in Python.

import multiprocessing as mp
print("Number of processors: ", mp.cpu_count())

pool = mp.Pool(mp.cpu_count())

results = [pool.apply(avgFrames, args=(imgfile_Movie, im_shape)) for imgfile_Movie in imgfiles_byMovie]

pool.close()

Unfortunately, the above code runs very slowly, and I have to terminate it without seeing the results. What am I doing wrong here?

  • 2
    `multiprocessing` adds a bunch of IPC overhead to serialize/deserialize data to move it between processes. It's 100% normal for it to make things slower if you don't put a lot of time/effort/thought into using it only in cases where that overhead is a nonissue. – Charles Duffy Dec 16 '21 at 21:33
  • 2
    Anyhow -- if you're calling into a C library where the bindings release the GIL while they work, you don't _need_ multiprocessing -- regular threading will work just fine, and with no reason for any serialization overhead since all your threads are already in the same memory space as the main program. (Whether the specific libtiff bindings you're using do that is a point on which you'll want to investigate their implementation). – Charles Duffy Dec 16 '21 at 21:35
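
The threading suggestion in the comments could look roughly like the sketch below. It is not a tested fix for the question's data. It assumes that most of the per-movie time is spent reading files (where Python generally releases the GIL), and whether `tifffile.imread` itself releases the GIL while decoding is something to verify, as the comment notes. The names `read_tiff`, `mean_of_frames`, `mean_of_movies`, and the `max_workers` default are hypothetical, chosen for illustration. Threads share memory with the notebook, so the roughly 82 MB float64 result per movie (3200 x 3200 x 8 bytes) is never pickled between processes.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def read_tiff(path):
    # Imported lazily so the helpers below can also be used with another reader.
    import tifffile
    return tifffile.imread(path)


def mean_of_frames(paths, read_frame=read_tiff):
    """Average the frames of one movie; `paths` lists that movie's frame files."""
    total = None
    for path in paths:
        frame = np.asarray(read_frame(path), dtype=np.float64)
        if total is None:
            total = frame.copy()  # copy so the reader's array is never modified
        else:
            total += frame
    return total / len(paths)


def mean_of_movies(paths_by_movie, read_frame=read_tiff, max_workers=4):
    """Average each movie in a worker thread and stack the per-movie results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        means = list(pool.map(lambda p: mean_of_frames(p, read_frame), paths_by_movie))
    # Resulting shape: (number of movies, height, width)
    return np.stack(means)
```

With the question's variables this would be called as `im_avg_all = mean_of_movies(imgfiles_byMovie)`. `pool.map` returns results in input order, so row `n` of the output corresponds to movie `n`.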

0 Answers