While implementing an edge-preserving filter similar to ImageJ's Kuwahara filter, which assigns each pixel the mean of the surrounding region with the smallest deviation, I am struggling with performance issues.
Counterintuitively, computing the means and deviations into separate matrices is fast compared to the final selection step that compiles the output array. The ImageJ implementation of the filter seems to spend about 70% of its total processing time on this step as well, though.
Given two arrays, means and stds, whose size in each axis is two kernel sizes p larger than the output array res, I want to assign each pixel the mean of the quadrant with the smallest deviation:
# offset (approx.) from a pixel to the middle of each surrounding quadrant
p2 = p // 2
# x and y components of the offset to each of the four quadrants
index2quadrant = np.array([[1, 1, -1, -1], [1, -1, 1, -1]]) * p2
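For the 8x8 kernel used below this gives p2 = 4, so index2quadrant evaluates to [[4, 4, -4, -4], [4, -4, 4, -4]], i.e. the x and y offsets to the (approximate) centres of the four quadrants.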
I then iterate over all pixels of the output array of shape (asize, asize):
for col in np.arange(asize) + p:
    for row in np.arange(asize) + p:
Inside the loop I search for the minimum standard deviation among the four quadrants around the current coordinate and use the corresponding index to assign the previously computed mean:
        minidx = np.argmin(stds[index2quadrant[0] + col, index2quadrant[1] + row])
        # assign the mean of the quadrant with the smallest deviation
        res[col - p, row - p] = means[index2quadrant[0, minidx] + col, index2quadrant[1, minidx] + row]
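Putting the pieces together, the whole selection step reads roughly as follows (the function wrapper, the allocation of res and the random test data are only there to make the snippet self-contained; means and stds would of course come from the precomputation described above):

import numpy as np

def select_means(means, stds, p, asize):
    # offset (approx.) from a pixel to the middle of each quadrant
    p2 = p // 2
    index2quadrant = np.array([[1, 1, -1, -1], [1, -1, 1, -1]]) * p2
    res = np.zeros((asize, asize), dtype=means.dtype)
    for col in np.arange(asize) + p:
        for row in np.arange(asize) + p:
            # quadrant with the smallest deviation around (col, row)
            minidx = np.argmin(stds[index2quadrant[0] + col, index2quadrant[1] + row])
            # its precomputed mean becomes the output pixel
            res[col - p, row - p] = means[index2quadrant[0, minidx] + col, index2quadrant[1, minidx] + row]
    return res

# example call with arrays of the sizes mentioned below (random placeholder data)
p, asize = 8, 1024
means = np.random.rand(asize + 2 * p, asize + 2 * p)
stds = np.random.rand(asize + 2 * p, asize + 2 * p)
res = select_means(means, stds, p, asize)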
The Python profiler gives the following results for filtering a 1024x1024 array with an 8x8 pixel kernel:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 30.024 30.024 <string>:1(<module>)
1048576 2.459 0.000 4.832 0.000 fromnumeric.py:740(argmin)
1 23.720 23.720 30.024 30.024 kuwahara.py:4(kuwahara)
2 0.000 0.000 0.012 0.006 numeric.py:65(zeros_like)
2 0.000 0.000 0.000 0.000 {math.log}
1048576 2.373 0.000 2.373 0.000 {method 'argmin' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
2 0.012 0.006 0.012 0.006 {method 'fill' of 'numpy.ndarray' objects}
8256 0.189 0.000 0.189 0.000 {method 'mean' of 'numpy.ndarray' objects}
16512 0.524 0.000 0.524 0.000 {method 'reshape' of 'numpy.ndarray' objects}
8256 0.730 0.000 0.730 0.000 {method 'std' of 'numpy.ndarray' objects}
1042 0.012 0.000 0.012 0.000 {numpy.core.multiarray.arange}
1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.array}
2 0.000 0.000 0.000 0.000 {numpy.core.multiarray.empty_like}
2 0.003 0.002 0.003 0.002 {numpy.core.multiarray.zeros}
8 0.002 0.000 0.002 0.000 {zip}
To me this gives little indication of where the time is actually lost (inside NumPy?), since apart from argmin the total time of each individual function seems negligible.
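To narrow this down, one could time the two operations from the inner loop in isolation, e.g. along these lines (array size and offsets as above, random placeholder data):

import timeit

setup = """
import numpy as np
stds = np.random.rand(1040, 1040)
index2quadrant = np.array([[1, 1, -1, -1], [1, -1, 1, -1]]) * 4
col = row = 520
"""
# cost of the fancy indexing alone vs. fancy indexing plus argmin
print(timeit.timeit("stds[index2quadrant[0] + col, index2quadrant[1] + row]", setup=setup, number=100000))
print(timeit.timeit("np.argmin(stds[index2quadrant[0] + col, index2quadrant[1] + row])", setup=setup, number=100000))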
Do you have any suggestions on how to improve performance?