While implementing an edge-preserving filter similar to ImageJ's Kuwahara filter, which assigns each pixel the mean of the surrounding region with the smallest deviation, I am struggling with performance issues.
Counterintuitively, computing the means and deviations into separate matrices is fast compared to the final selection step that compiles the output array. The ImageJ implementation of the filter seems to spend about 70% of its total processing time on this step as well, though.
Given two arrays, means and stds, whose size in each axis is two kernel sizes p larger than the output array res, I want to assign each pixel the mean of the quadrant with the smallest deviation:
# offset (approx.) from a pixel to the middle of each surrounding quadrant
p2 = p // 2
# x and y components of the offset to each of the four quadrants
index2quadrant = np.array([[1, 1, -1, -1], [1, -1, 1, -1]]) * p2
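For the 8x8 kernel used below this gives p2 = 4, so index2quadrant evaluates to [[4, 4, -4, -4], [4, -4, 4, -4]], i.e. the x and y offsets to the (approximate) centres of the four quadrants.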
I then iterate over all pixels of the output array of shape (asize, asize):
for col in np.arange(asize) + p:
    for row in np.arange(asize) + p:
Inside the loop I search for the minimum standard deviation among the four quadrants around the current coordinate and use the corresponding index to assign the previously computed mean:
        minidx = np.argmin(stds[index2quadrant[0] + col, index2quadrant[1] + row])
        # assign the mean of the quadrant with the smallest deviation
        res[col - p, row - p] = means[index2quadrant[0, minidx] + col, index2quadrant[1, minidx] + row]
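Putting the pieces together, the whole selection step reads roughly as follows (the function wrapper, the allocation of res and the random test data are only there to make the snippet self-contained; means and stds would of course come from the precomputation described above):

import numpy as np

def select_means(means, stds, p, asize):
    # offset (approx.) from a pixel to the middle of each quadrant
    p2 = p // 2
    index2quadrant = np.array([[1, 1, -1, -1], [1, -1, 1, -1]]) * p2
    res = np.zeros((asize, asize), dtype=means.dtype)
    for col in np.arange(asize) + p:
        for row in np.arange(asize) + p:
            # quadrant with the smallest deviation around (col, row)
            minidx = np.argmin(stds[index2quadrant[0] + col, index2quadrant[1] + row])
            # its precomputed mean becomes the output pixel
            res[col - p, row - p] = means[index2quadrant[0, minidx] + col, index2quadrant[1, minidx] + row]
    return res

# example call with arrays of the sizes mentioned below (random placeholder data)
p, asize = 8, 1024
means = np.random.rand(asize + 2 * p, asize + 2 * p)
stds = np.random.rand(asize + 2 * p, asize + 2 * p)
res = select_means(means, stds, p, asize)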
The Python profiler gives the following results for filtering a 1024x1024 array with an 8x8 pixel kernel:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 30.024 30.024 <string>:1(<module>)
1048576 2.459 0.000 4.832 0.000 fromnumeric.py:740(argmin)
1 23.720 23.720 30.024 30.024 kuwahara.py:4(kuwahara)
2 0.000 0.000 0.012 0.006 numeric.py:65(zeros_like)
2 0.000 0.000 0.000 0.000 {math.log}
1048576 2.373 0.000 2.373 0.000 {method 'argmin' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
2 0.012 0.006 0.012 0.006 {method 'fill' of 'numpy.ndarray' objects}
8256 0.189 0.000 0.189 0.000 {method 'mean' of 'numpy.ndarray' objects}
16512 0.524 0.000 0.524 0.000 {method 'reshape' of 'numpy.ndarray' objects}
8256 0.730 0.000 0.730 0.000 {method 'std' of 'numpy.ndarray' objects}
1042 0.012 0.000 0.012 0.000 {numpy.core.multiarray.arange}
1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.array}
2 0.000 0.000 0.000 0.000 {numpy.core.multiarray.empty_like}
2 0.003 0.002 0.003 0.002 {numpy.core.multiarray.zeros}
8 0.002 0.000 0.002 0.000 {zip}
To me this gives little indication of where the time is actually lost (inside NumPy?), since apart from argmin the total time of each individual function seems negligible.
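To narrow this down, one could time the two operations from the inner loop in isolation, e.g. along these lines (array size and offsets as above, random placeholder data):

import timeit

setup = """
import numpy as np
stds = np.random.rand(1040, 1040)
index2quadrant = np.array([[1, 1, -1, -1], [1, -1, 1, -1]]) * 4
col = row = 520
"""
# cost of the fancy indexing alone vs. fancy indexing plus argmin
print(timeit.timeit("stds[index2quadrant[0] + col, index2quadrant[1] + row]", setup=setup, number=100000))
print(timeit.timeit("np.argmin(stds[index2quadrant[0] + col, index2quadrant[1] + row])", setup=setup, number=100000))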
Do you have any suggestions on how to improve performance?