
I wrote a function to process an image: it extracts many patches and applies the same function (`func`) to each patch to build a new image. However, this is very slow because of the two nested loops, the cost of `func`, the number of patches, and the patch size. I don't know how to accelerate this code.

The function is shown below.

# code1
import numpy as np

def filter(img, func, ksize, strides=1):
    height, width = img.shape
    f_height, f_width = ksize
    # output shrinks by the window size (no padding); strides is not used yet
    new_height = height - f_height + 1
    new_width = width - f_width + 1

    new_img = np.zeros((new_height, new_width))

    for i in range(new_height):
        for j in range(new_width):
            patch = img[i:i + f_height, j:j + f_width]
            new_img[i, j] = func(patch)

    return new_img
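For reference, here is a minimal, self-contained usage sketch of this loop-based filter. Passing `np.mean` as `func` turns it into a plain mean filter; the small test image is only for illustration.

```python
import numpy as np

def filter(img, func, ksize, strides=1):
    # same sliding-window loop as code1 above
    f_height, f_width = ksize
    new_height = img.shape[0] - f_height + 1
    new_width = img.shape[1] - f_width + 1
    new_img = np.zeros((new_height, new_width))
    for i in range(new_height):
        for j in range(new_width):
            new_img[i, j] = func(img[i:i + f_height, j:j + f_width])
    return new_img

# a 4x4 test image and a 3x3 mean filter
img = np.arange(16, dtype=float).reshape(4, 4)
out = filter(img, np.mean, (3, 3))
print(out)  # [[ 5.  6.]
            #  [ 9. 10.]]
```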

`func` can be very flexible and time-consuming; here is one example. The `func` below divides the center pixel of the patch by the median of the patch. However, I don't want pixels whose value is 255 to contribute to the median (255 is the default value for invalid pixels), so I use a masked array from NumPy. The masked array slows the code down several times over, and I have no idea how to optimize this.

# code2
def relative_median_and_center_diff(patch, in_the_boundary, rectangle, center_point):
    # mask out invalid pixels (255) and the center pixel itself
    mask = patch == 255
    mask[center_point] = True
    masked_patch = np.ma.array(patch, mask=mask)
    count = masked_patch.count()  # number of valid neighbours
    if count <= 1:
        return 0
    else:
        return patch[center_point] / (np.ma.median(masked_patch) + 1)
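As a point of comparison, the same per-patch computation can be written without `np.ma` at all, using plain boolean indexing and `np.median`, since masked arrays carry noticeable per-call overhead. This is only a sketch under my own assumptions: the function name is mine, `center_point` is a `(row, col)` tuple, and I dropped the two unused parameters from the original signature.

```python
import numpy as np

def relative_median_and_center_diff_noma(patch, center_point):
    # boolean-indexing variant of code2: drop 255s and the center pixel,
    # then take a plain median instead of a masked one
    mask = patch == 255
    mask[center_point] = True
    valid = patch[~mask]          # 1-D array of usable pixels
    if valid.size <= 1:
        return 0
    return patch[center_point] / (np.median(valid) + 1)
```

It returns the same values as the masked-array version because `np.ma.median` over the unmasked entries equals `np.median` over the boolean-selected entries.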

Ideas I have tried or been given:

  1. I used a NumPy-based function to extract the patches before the loop, expecting this to be faster than `patch = img[i:i+f_height, j:j+f_width]`. I found functions to extract patches in "Extracting patches of a certain size from the image in python efficiently". First I tried `view_as_windows` from `skimage.util.shape`; the changed code is shown below (code3), and it takes more time than code1. I also tried `sklearn.feature_extraction.image.extract_patches_2d` and found it faster than code3 but still slower than code1. (Can anyone tell me why this is the case?)
# code3
def filter(img, func, ksize, strides=1):
    height, width = img.shape
    f_height, f_width = ksize
    new_height = height - f_height + 1
    new_width = width - f_width + 1

    new_img = np.zeros((new_height, new_width))

    from skimage.util.shape import view_as_windows
    # a (new_height, new_width, f_height, f_width) view over img, no copy
    patches = view_as_windows(img, (f_height, f_width))

    for i in range(new_height):
        for j in range(new_width):
            patch = patches[i, j]
            new_img[i, j] = func(patch)

    return new_img
  2. This operation is a bit like convolution or filtering, except for `func`. I wonder how those libraries handle this; can you give me some clues?

  3. Can we avoid the two loops in this situation? Maybe that would accelerate the code.

  4. I have GPUs. Can I change the code to run on GPUs and process the patches in parallel to make it faster?

  5. Rewrite the code in C. This is the last thing I want to do because it would probably get messy.
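On the idea of avoiding the two loops: for this particular `func`, the whole computation can be vectorized with `np.lib.stride_tricks.sliding_window_view` (NumPy >= 1.20) plus NaN-aware reductions, so no Python-level loop over patches remains. The sketch below is mine, not the author's code, and assumes `center_point` is the middle of the window and the same `count <= 1` rule as code2.

```python
import warnings
import numpy as np

def filter_vectorized(img, ksize):
    # vectorized version of code1 + code2: center / (median of valid
    # neighbours + 1), where 255 marks invalid pixels
    fh, fw = ksize
    cy, cx = fh // 2, fw // 2

    work = img.astype(float)
    work[work == 255] = np.nan                      # invalid -> NaN
    wins = np.lib.stride_tricks.sliding_window_view(work, (fh, fw)).copy()
    wins[:, :, cy, cx] = np.nan                     # exclude the center

    counts = np.count_nonzero(~np.isnan(wins), axis=(2, 3))
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")             # all-NaN windows warn
        med = np.nanmedian(wins, axis=(2, 3))

    centers = img[cy:cy + wins.shape[0], cx:cx + wins.shape[1]].astype(float)
    return np.where(counts > 1, centers / (med + 1), 0.0)
```

Note the trade-off: the `.copy()` materializes every patch, so memory grows by a factor of `fh * fw`; for very large images you may want to process the image in horizontal strips.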

Can you give me some ideas or suggestions?

李悦城
  • Do you need all the information from all patches? Since I don't know the scenario, and this almost looks like convolution, couldn't we use the concept of strides, i.e. skip a few patches or assign the mean value of the neighboring patches, so that you reduce the number of patches being processed? – venkata krishnan Jul 19 '19 at 03:58
  • @venkatkrishnan Thank you for your reply! I do this for something like semantic segmentation, so I want it to be as precise as possible. In most cases quality is more important. I will investigate to what extent this method affects the quality. Thank you again! – 李悦城 Jul 19 '19 at 04:20
  • Also, the reason each one is slow is that they do a lot of validation checks etc. to extract the patch. You can look at the source code of skimage's view_as_windows here: https://github.com/scikit-image/scikit-image/blob/master/skimage/util/shape.py#L218. And a NumPy mask, again, is a sequential iterative process (moving-window type), which obviously makes it slow. – venkata krishnan Jul 19 '19 at 05:03

1 Answer


If your computer has more than one CPU core, you could multi-thread this process by submitting the work to a `ThreadPoolExecutor`.

Your code should look something like this:

import concurrent.futures
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import cpu_count

executor = ThreadPoolExecutor(max_workers=cpu_count())

future_to_item = {}
for data in items:                      # items: whatever you iterate over
    future = executor.submit(func, data, *args)
    future_to_item[future] = data

for future in concurrent.futures.as_completed(future_to_item):
    result = future.result()
    # do something with the result

I use a ThreadPoolExecutor for image processing all the time.

Since we only have the functions and don't know how your program fully works, check out concurrency in Python so you can get a better idea of how to integrate this into your code: https://docs.python.org/3/library/concurrent.futures.html
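To flesh out that suggestion: a common pitfall is submitting one tiny patch per task, where scheduling overhead swamps the work. A sketch of my own (not the author's code) that parallelises over output rows instead, so each task is large enough to amortize the submission cost. Caveat: with a pure-Python `func` the GIL limits thread speedups; this helps most when `func` spends its time in NumPy calls that release the GIL.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import cpu_count

def filter_threaded(img, func, ksize):
    # one task per output ROW, not per patch, to amortise task overhead
    fh, fw = ksize
    new_h = img.shape[0] - fh + 1
    new_w = img.shape[1] - fw + 1

    def one_row(i):
        return [func(img[i:i + fh, j:j + fw]) for j in range(new_w)]

    with ThreadPoolExecutor(max_workers=cpu_count()) as ex:
        rows = list(ex.map(one_row, range(new_h)))  # map preserves row order
    return np.array(rows)
```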

Code Daddy
  • I tried to extract the patches first and then use `ProcessPoolExecutor` to use multiple processors. However, it took much more time (6s vs 0.25s). The line I use is `future_to_index = {executor.submit(func, patch): i for i,patch in enumerate(patches)}`. Maybe this is because each patch is too small and can't fully utilize multiple processors? Or did I do it the wrong way? – 李悦城 Jul 20 '19 at 10:18
  • More code:`with concurrent.futures.ProcessPoolExecutor(max_workers=cpu_count()) as executor: future_to_index = {executor.submit(func, patch): i for i,patch in enumerate(patches)} for future in concurrent.futures.as_completed(future_to_index): i = future_to_index[future] new_img[i//new_width][i%new_width] = future.result()` – 李悦城 Jul 20 '19 at 10:22
  • A `ProcessPoolExecutor` is very different from a `ThreadPoolExecutor`. Check out the link I attached to my answer. – Code Daddy Jul 22 '19 at 21:40
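On the per-patch slowdown reported in the comments above: `ProcessPoolExecutor.map` accepts a `chunksize` argument that batches many work items into one inter-process message, which is usually the fix when each item is tiny. A sketch under my own naming (`apply_in_chunks` is not from the question); note the mapped function must be picklable, i.e. defined at module top level.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def apply_in_chunks(func, patches, workers=4, chunksize=500):
    # map() with a large chunksize ships patches to workers in batches,
    # so the per-task IPC overhead is paid once per chunk, not per patch
    with ProcessPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(func, patches, chunksize=chunksize))
```

For example, `apply_in_chunks(np.mean, patches)` returns the per-patch means in order; tune `chunksize` so each worker gets a few large batches rather than thousands of single patches.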