
I'm a bit lost between joblib, multiprocessing, etc.

What's the most effective way to parallelize a for loop, based on your experience?

For example:

import mahotas as mt
import numpy as np

def calc_haralick(roi):
    # Haralick texture features: mahotas returns one row per direction;
    # average over the directions and keep the first 9 features.
    texture_features = mt.features.haralick(roi)
    mean_ht = texture_features.mean(axis=0)
    return np.array(mean_ht[0:9])

h_features = []
for i, p in enumerate(patches[ss_idx]):
    bar.update(i + 1)  # bar is a progress bar defined elsewhere
    h_features.append(calc_haralick(p))

It iterates over the image patches and extracts Haralick texture features from each one.

And this is how I get the patches:

from numpy.lib import stride_tricks

h_neigh = 11  # haralick neighbourhood
size = h_neigh
shape = (img.shape[0] - size + 1, img.shape[1] - size + 1, size, size)
strides = 2 * img.strides  # repeat the image strides for the window axes
patches = stride_tricks.as_strided(img, shape=shape, strides=strides)
patches = patches.reshape(-1, size, size)

Sorry if any information is superfluous

  • ***"... the most effective way ...?"*** Well, that heck depends on what & how the loop-body actually does. – user3666197 May 27 '20 at 12:58
  • Hm, yes, I edited the post then – Angelo Santarossa May 27 '20 at 13:03
  • Is this all on one multi-core machine? That is, do you want to parallelize your process according to the number of cores on that machine? Or do you want to allow for distributed processing? – 9769953 May 27 '20 at 13:08
  • What is `patches`? Is it an array or list of simple elements, such as floats or integers? Or are its contents more complex/memory-heavy? – 9769953 May 27 '20 at 13:10
  • I have 16 cores available according to `multiprocessing.cpu_count()` – Angelo Santarossa May 27 '20 at 13:10
  • I wasn't asking about the number of cores. I asked how you wanted to do the parallelization. If you have 16 machines with 1 core each, then that's another possibility to parallelize. Do you want to allow for e.g. distributed processing? – 9769953 May 27 '20 at 13:12
  • This is one multi-core machine with 16 cores. So I suppose working on all cores at the same time isn't a great idea – Angelo Santarossa May 27 '20 at 13:14
  • You can limit the number of cores, so that's not a problem. – 9769953 May 27 '20 at 13:28
  • What is `patches` *exactly*? "patches of images" is rather unspecific. Since I assume you send one image at a time over to `calc_haralick`, what is an image? What Python type? And if something like a (multi-)dimensional array, how large? The reason for this question is that you may want to do things differently depending on the size of the data. – 9769953 May 27 '20 at 13:29
  • I edited the post again; I didn't know it was size-dependent – Angelo Santarossa May 27 '20 at 13:33
  • It is not necessarily size dependent (or not principally); my suggestions, however, are. Note that your edits still don't mention the actual (average, median, range) size of the images. They could be 20 by 20, or 20K by 20K. – 9769953 May 27 '20 at 13:44

1 Answer


Your images appear to be simple two-dimensional NumPy arrays, and `patches` a list or array of those. I assume `ss_idx` is an index array (i.e., not an integer), so that `patches[ss_idx]` remains something that can be iterated over (as in your example).

In that case, simply use `multiprocessing.Pool.map`:

import multiprocessing as mp

nproc = 10  # number of worker processes
with mp.Pool(nproc) as pool:
    h_features = pool.map(calc_haralick, patches[ss_idx])

See the first basic example in the multiprocessing documentation.

If you leave out `nproc` or set it to `None`, all available cores will be used.
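
For comparison, since the question also mentions joblib: the same map can be written with `joblib.Parallel` (a minimal sketch on top of the code above; `n_jobs` plays the role of `nproc`):

from joblib import Parallel, delayed

# n_jobs=-1 would use all available cores instead
h_features = Parallel(n_jobs=nproc)(
    delayed(calc_haralick)(p) for p in patches[ss_idx]
)

Both approaches run worker processes under the hood; joblib mainly adds convenience (and, for large NumPy arrays, automatic memory-mapping of the input data).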


The potential problem with multiprocessing is that it will create `nproc` identical Python processes and copy all the relevant data to those processes. If your images are large, this will cause considerable overhead.

In such a case, it may be worth splitting your Python program into separate programs, where calculating the features of a single image is one independent program, as sketched below. That program would need to handle reading a single image and writing the features. You'd then wrap everything in e.g. a bash script that loops over all images, taking care to use only a certain number of cores at the same time (e.g., start background processes, but wait after every 10 images). A next step/program then reads the independent feature files into a multi-dimensional array, and from there, you can continue your old program.
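
A minimal sketch of such a standalone per-image program, reusing the patch and feature code from the question (the script name, the `.npy` input/output format, and the command-line interface are assumptions, not prescribed here):

# extract_features.py (hypothetical name): features for one image per run
import sys
import numpy as np
from numpy.lib import stride_tricks
import mahotas as mt

def calc_haralick(roi):
    # mean of the Haralick features over the four directions; keep the first 9
    return mt.features.haralick(roi).mean(axis=0)[0:9]

def patchify(img, size=11):
    # same sliding-window trick as in the question
    shape = (img.shape[0] - size + 1, img.shape[1] - size + 1, size, size)
    patches = stride_tricks.as_strided(img, shape=shape, strides=2 * img.strides)
    return patches.reshape(-1, size, size)

if __name__ == '__main__':
    in_path, out_path = sys.argv[1], sys.argv[2]
    img = np.load(in_path)  # assumes the image was saved as a .npy array
    features = np.array([calc_haralick(p) for p in patchify(img)])
    np.save(out_path, features)  # one feature file per image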

While this is more work, it may save some copying overhead (though it introduces extra I/O overhead, in particular writing the separate feature files).
It also has the advantage that it is fairly easy to run distributed, should the need ever arise.


Start with multiprocessing, keeping an eye on memory and CPU usage (if nothing happens for a long time, it may be stuck in copying overhead). Then, if needed, try the other method.
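
One last note, since the original loop updates a progress bar: `Pool.imap` yields results lazily and in order, so the bar can be kept (a sketch, reusing `mp`, `nproc` and the `bar` object from the question):

h_features = []
with mp.Pool(nproc) as pool:
    # results arrive one by one, so the progress bar can be updated per patch
    for i, fv in enumerate(pool.imap(calc_haralick, patches[ss_idx])):
        bar.update(i + 1)
        h_features.append(fv)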

9769953