
I wrote a function to process an image: it extracts many patches and applies the same function (`func`) to each patch to build a new image. However, this is very slow because of the two nested loops, the cost of `func`, the number of patches, and the patch size. I don't know how to accelerate this code.

The function is shown below.

# code1
import numpy as np

def filter(img, func, ksize, strides=1):
    height, width = img.shape
    f_height, f_width = ksize
    # output shrinks by the window size (no padding); strides is not used yet
    new_height = height - f_height + 1
    new_width = width - f_width + 1

    new_img = np.zeros((new_height, new_width))

    for i in range(new_height):
        for j in range(new_width):
            patch = img[i:i + f_height, j:j + f_width]
            new_img[i, j] = func(patch)

    return new_img
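For reference, here is a minimal, self-contained usage sketch of this loop-based filter. Passing `np.mean` as `func` turns it into a plain mean filter; the small test image is only for illustration.

```python
import numpy as np

def filter(img, func, ksize, strides=1):
    # same sliding-window loop as code1 above
    f_height, f_width = ksize
    new_height = img.shape[0] - f_height + 1
    new_width = img.shape[1] - f_width + 1
    new_img = np.zeros((new_height, new_width))
    for i in range(new_height):
        for j in range(new_width):
            new_img[i, j] = func(img[i:i + f_height, j:j + f_width])
    return new_img

# a 4x4 test image and a 3x3 mean filter
img = np.arange(16, dtype=float).reshape(4, 4)
out = filter(img, np.mean, (3, 3))
print(out)  # [[ 5.  6.]
            #  [ 9. 10.]]
```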

`func` can be very flexible and time-consuming; here is one example. The `func` below divides the center pixel of the patch by the median of the patch. However, I don't want pixels whose value is 255 to contribute to the median (255 is the default value for invalid pixels), so I use a masked array from NumPy. The masked array slows the code down several times over, and I have no idea how to optimize this.

# code2
def relative_median_and_center_diff(patch, in_the_boundary, rectangle, center_point):
    # mask out invalid pixels (255) and the center pixel itself
    mask = patch == 255
    mask[center_point] = True
    masked_patch = np.ma.array(patch, mask=mask)
    count = masked_patch.count()  # number of valid neighbours
    if count <= 1:
        return 0
    else:
        return patch[center_point] / (np.ma.median(masked_patch) + 1)
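As a point of comparison, the same per-patch computation can be written without `np.ma` at all, using plain boolean indexing and `np.median`, since masked arrays carry noticeable per-call overhead. This is only a sketch under my own assumptions: the function name is mine, `center_point` is a `(row, col)` tuple, and I dropped the two unused parameters from the original signature.

```python
import numpy as np

def relative_median_and_center_diff_noma(patch, center_point):
    # boolean-indexing variant of code2: drop 255s and the center pixel,
    # then take a plain median instead of a masked one
    mask = patch == 255
    mask[center_point] = True
    valid = patch[~mask]          # 1-D array of usable pixels
    if valid.size <= 1:
        return 0
    return patch[center_point] / (np.median(valid) + 1)
```

It returns the same values as the masked-array version because `np.ma.median` over the unmasked entries equals `np.median` over the boolean-selected entries.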

Ideas I have tried or been given:

  1. I used a NumPy-based function to extract the patches before the loop, expecting this to be faster than `patch = img[i:i+f_height, j:j+f_width]`. I found functions to extract patches in "Extracting patches of a certain size from the image in python efficiently". First I tried `view_as_windows` from `skimage.util.shape`; the changed code is shown below (code3), and it takes more time than code1. I also tried `sklearn.feature_extraction.image.extract_patches_2d` and found it faster than code3 but still slower than code1. (Can anyone tell me why this is the case?)
# code3
def filter(img, func, ksize, strides=1):
    height, width = img.shape
    f_height, f_width = ksize
    new_height = height - f_height + 1
    new_width = width - f_width + 1

    new_img = np.zeros((new_height, new_width))

    from skimage.util.shape import view_as_windows
    # a (new_height, new_width, f_height, f_width) view over img, no copy
    patches = view_as_windows(img, (f_height, f_width))

    for i in range(new_height):
        for j in range(new_width):
            patch = patches[i, j]
            new_img[i, j] = func(patch)

    return new_img
  2. This operation is a bit like convolution or filtering, except for `func`. I wonder how those libraries handle this; can you give me some clues?

  3. Can we avoid the two loops in this situation? Maybe that would accelerate the code.

  4. I have GPUs. Can I change the code to run on GPUs and process the patches in parallel to make it faster?

  5. Rewrite the code in C. This is the last thing I want to do because it would probably get messy.
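On the idea of avoiding the two loops: for this particular `func`, the whole computation can be vectorized with `np.lib.stride_tricks.sliding_window_view` (NumPy >= 1.20) plus NaN-aware reductions, so no Python-level loop over patches remains. The sketch below is mine, not the author's code, and assumes `center_point` is the middle of the window and the same `count <= 1` rule as code2.

```python
import warnings
import numpy as np

def filter_vectorized(img, ksize):
    # vectorized version of code1 + code2: center / (median of valid
    # neighbours + 1), where 255 marks invalid pixels
    fh, fw = ksize
    cy, cx = fh // 2, fw // 2

    work = img.astype(float)
    work[work == 255] = np.nan                      # invalid -> NaN
    wins = np.lib.stride_tricks.sliding_window_view(work, (fh, fw)).copy()
    wins[:, :, cy, cx] = np.nan                     # exclude the center

    counts = np.count_nonzero(~np.isnan(wins), axis=(2, 3))
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")             # all-NaN windows warn
        med = np.nanmedian(wins, axis=(2, 3))

    centers = img[cy:cy + wins.shape[0], cx:cx + wins.shape[1]].astype(float)
    return np.where(counts > 1, centers / (med + 1), 0.0)
```

Note the trade-off: the `.copy()` materializes every patch, so memory grows by a factor of `fh * fw`; for very large images you may want to process the image in horizontal strips.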

Can you give me some ideas or suggestions?

李悦城
  • Do you need all the information from all patches? Since I don't know the scenario, and this almost looks like convolution, couldn't we use the concept of strides, i.e. skip a few patches or assign the mean value of the neighboring patches, so that you reduce the number of patches being processed? – venkata krishnan Jul 19 '19 at 03:58
  • @venkatkrishnan Thank you for your reply! I do this for something like semantic segmentation, so I want it to be as precise as possible. In most cases quality is more important. I will investigate to what extent this method affects the quality. Thank you again! – 李悦城 Jul 19 '19 at 04:20
  • Also, the reason each one is slow is that they do a lot of validation checks etc. to extract the patch. You can look at the source code of skimage's view_as_windows here: https://github.com/scikit-image/scikit-image/blob/master/skimage/util/shape.py#L218. And a NumPy mask, again, is a sequential iterative process (moving-window type), which obviously makes it slow. – venkata krishnan Jul 19 '19 at 05:03

1 Answer


If your computer has more than one CPU core, you could multi-thread this process by submitting the work to a `ThreadPoolExecutor`.

Your code should look something like this:

import concurrent.futures
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import cpu_count

executor = ThreadPoolExecutor(max_workers=cpu_count())

future_to_item = {}
for data in items:                      # items: whatever you iterate over
    future = executor.submit(func, data, *args)
    future_to_item[future] = data

for future in concurrent.futures.as_completed(future_to_item):
    result = future.result()
    # do something with the result

I use a ThreadPoolExecutor for image processing all the time.

Since we only have the functions and don't know how your program fully works, check out concurrency in Python so you can get a better idea of how to integrate this into your code: https://docs.python.org/3/library/concurrent.futures.html
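To flesh out that suggestion: a common pitfall is submitting one tiny patch per task, where scheduling overhead swamps the work. A sketch of my own (not the author's code) that parallelises over output rows instead, so each task is large enough to amortize the submission cost. Caveat: with a pure-Python `func` the GIL limits thread speedups; this helps most when `func` spends its time in NumPy calls that release the GIL.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import cpu_count

def filter_threaded(img, func, ksize):
    # one task per output ROW, not per patch, to amortise task overhead
    fh, fw = ksize
    new_h = img.shape[0] - fh + 1
    new_w = img.shape[1] - fw + 1

    def one_row(i):
        return [func(img[i:i + fh, j:j + fw]) for j in range(new_w)]

    with ThreadPoolExecutor(max_workers=cpu_count()) as ex:
        rows = list(ex.map(one_row, range(new_h)))  # map preserves row order
    return np.array(rows)
```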

Code Daddy
  • I tried to extract the patches first and then use `ProcessPoolExecutor` to use multiple processors. However, it took much more time (6s vs 0.25s). The line I use is `future_to_index = {executor.submit(func, patch): i for i,patch in enumerate(patches)}`. Maybe this is because each patch is too small and can't fully utilize multiple processors? Or did I do it the wrong way? – 李悦城 Jul 20 '19 at 10:18
  • More code:`with concurrent.futures.ProcessPoolExecutor(max_workers=cpu_count()) as executor: future_to_index = {executor.submit(func, patch): i for i,patch in enumerate(patches)} for future in concurrent.futures.as_completed(future_to_index): i = future_to_index[future] new_img[i//new_width][i%new_width] = future.result()` – 李悦城 Jul 20 '19 at 10:22
  • A `ProcessPoolExecutor` is very different from a `ThreadPoolExecutor`. Check out the link I attached to my answer. – Code Daddy Jul 22 '19 at 21:40
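On the per-patch slowdown reported in the comments above: `ProcessPoolExecutor.map` accepts a `chunksize` argument that batches many work items into one inter-process message, which is usually the fix when each item is tiny. A sketch under my own naming (`apply_in_chunks` is not from the question); note the mapped function must be picklable, i.e. defined at module top level.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def apply_in_chunks(func, patches, workers=4, chunksize=500):
    # map() with a large chunksize ships patches to workers in batches,
    # so the per-task IPC overhead is paid once per chunk, not per patch
    with ProcessPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(func, patches, chunksize=chunksize))
```

For example, `apply_in_chunks(np.mean, patches)` returns the per-patch means in order; tune `chunksize` so each worker gets a few large batches rather than thousands of single patches.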