How to improve the efficiency of a sobel edge detector

Question

I am writing a computer vision library from scratch in Python to work with a rpi camera. At the moment, I have implemented conversion to greyscale and some other basic img operations which both run relatively fast on my model B rpi3.

However, my edge detection function using the sobel operator (wikipedia description) is much slower than the other functions although it does work. Here it is:

def sobel(img):
    xKernel = np.array([[-1,0,1],[-2,0,2],[-1,0,1]])
    yKernel = np.array([[-1,-2,-1],[0,0,0],[1,2,1]])
    sobelled = np.zeros((img.shape[0]-2, img.shape[1]-2, 3), dtype="uint8")
    for y in range(1, img.shape[0]-1):
        for x in range(1, img.shape[1]-1):
            gx = np.sum(np.multiply(img[y-1:y+2, x-1:x+2], xKernel))
            gy = np.sum(np.multiply(img[y-1:y+2, x-1:x+2], yKernel))
            g = abs(gx) + abs(gy) #math.sqrt(gx ** 2 + gy ** 2) (Slower)
            g = g if g > 0 and g < 255 else (0 if g < 0 else 255)
            sobelled[y-1][x-2] = g
    return sobelled

and running it with this greyscale image of a cat:

I get this response, which seems correct:

The application of the library, and this function in particular, is on a chess playing robot in which the edge detection will help to recognise the location of the pieces. The problem is that it takes >15 seconds to run which is a significant problem as it will add to the time the robot takes to make its move by a lot.

My question is: how can I speed it up?

So far, I have tried a couple of things:

Instead of squaring then adding, then square rooting the gx and gy values to get the total gradient, I just sum the absolute values. This improved the speed a decent amount.
Using a lower resolution image from the rpi camera. This obviously is a simple way to make these operations run faster, however its not really that viable as its still pretty slow at the minimum usable resolution of 480x360 which is massively reduced from the camera's max of 3280x2464.
Writing nested for-loops to do the matrix convolutions in place of the np.sum(np.multiply(...)). This ended up being slightly slower which I was surprised by as since np.multiply returns a new array, I thought that it should have been faster to do it with loops. I think though that this may be due to the fact that numpy is mostly written in C or that the new array isn't actually stored so doesn't take a long time but I'm not too sure.

Any help would be much appreciated - I think the main thing for improvement is point 3, i.e the matrix multiplication and summing.

Have you tried OpenCV's sobel? Also, have you tried 2D convolution? — Divakar, Oct 01 '17 at 14:27
@Divakar yes, I got the whole detection of chess pieces working with `OpenCV`, but I am trying to write it from scratch in Python. 2D convolution is quite broad, I thought I had already implemented it... — Joe Iddon, Oct 01 '17 at 14:53
I am not clear - Are you saying you can't use [2D convolution from Scipy](https://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.signal.convolve2d.html)? Or that you have tried it and turned out to be slower? — Divakar, Oct 01 '17 at 15:25

alkasm · Accepted Answer · 2017-10-01T15:59:27.507

Even though you're building out your own library, you really should absolutely use libraries for convolution, they will do the resulting operations in C or Fortran in the backend which will be much, much faster.

But to do it yourself if you'd like, use linear separable filters. Here's the idea:

Image:

1 2 3 4 5
2 3 4 5 1
3 4 5 1 2

Sobel x kernel:

-1 0 1
-2 0 2
-1 0 1

Result:

8, 3, -7

At the first position of the convolution, you'll be computing 9 values. First off, why? You're never going to add the middle column, don't bother multiplying it. But that's not the point of linear separable filters. The idea is simple. When you place the kernel at the first position, you'll multiply the third column by [1, 2, 1]. But then two steps later, you'll multiply the third column by [-1, -2, -1]. What a waste! You've already calculated that, you just need to negate it now. And that's the idea with a linear separable filter. Notice that you can break up the filter into a matrix outer product of two vectors:

[1]
[2]  *  [-1, 0, 1]
[1]

Taking the outer product here yields the same matrix. So the idea here is to split the operation into two pieces. First multiply the whole image with the row vector, then the column vector. Taking the row vector

-1 0 1

across the image, we end up with

2  2  2
2  2 -3
2 -3 -3

And then passing the column vector through to be multiplied and summed, we again get

8, 3, -7

One other nifty trick that may or may not be helpful (depends on your tradeoffs between memory and efficiency):

Note that in the single row multiplication, you ignore the middle value, and just subtract the right from the left values. This means that effectively what you are doing is subtracting these two images:

3 4 5     1 2 3
4 5 1  -  2 3 4
5 1 2     3 4 5

If you cut the first two columns off of your image, you get the left matrix, and if you cut the last two columns off, you get the right matrix. So you can just compute this first part of the convolution simply as

result_h = img[:,2:] - img[:,:-2]

And then you can loop through for the remaining column of the sobel operator. Or, you can even proceed further and do the same thing we just did. This time for the vertical case, you simply need to add the first and third row and twice the second row; or, using numpy addition:

result_v = result_h[:-2] + result_h[2:] + 2*result_h[1:-1]

And you're done! I may add some timings here in the near-future. For some back of the envelope calculations (i.e. hasty Jupyter notebook timings on a 1000x1000 image):

new method (sums of the images): 8.18 ms ± 399 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

old method (double for-loop):7.32 s ± 207 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Yes, you read that right: 1000x speedup.

Here's some code comparing the two:

import numpy as np

def sobel_x_orig(img):
    xKernel = np.array([[-1,0,1],[-2,0,2],[-1,0,1]])
    sobelled = np.zeros((img.shape[0]-2, img.shape[1]-2))
    for y in range(1, img.shape[0]-1):
        for x in range(1, img.shape[1]-1):
            sobelled[y-1, x-1] = np.sum(np.multiply(img[y-1:y+2, x-1:x+2], xKernel))
    return sobelled

def sobel_x_new(img):
    result_h = img[:,2:] - img[:,:-2]
    result_v = result_h[:-2] + result_h[2:] + 2*result_h[1:-1]
    return result_v

img = np.random.rand(1000, 1000)
sobel_new = sobel_x_new(img)
sobel_orig = sobel_x_orig(img)

assert (np.abs(sobel_new-sobel_orig) < 1e-12).all()

Of course, 1e-12 is some serious tolerance, but this is per element so it should be OK. But also I have a float image, you'll of course have larger differences for uint8 images.

Note that you can do this for any linear separable filter! That includes Gaussian filters. Note also that in general, this requires a lot of operations. In C or Fortran or whatever, it's usually just implemented as two convolutions of the single row/column vectors because in the end, it needs to actually loop through every element of each matrix anyways; whether you're just adding them or multiplying them, so it's no faster in C to do it this way where you add the image values than if you just do the convolutions. But looping through numpy arrays is super slow, so this method is way faster in Python.

Thanks a lot for your help! I had read about this on that wikipedia page but didn't really understand it. Your explanation and the example was such a help - I will implement it now :) — Joe Iddon, Oct 01 '17 at 15:48
One more questions, if I understand correctly, `result_v` is the resulting `img` after the sobel `x` kernel had been passed over, so I need to do the same process of operations again for the `y` kernel and then `sum` the `absolute` values to get my final edge detected image? — Joe Iddon, Oct 01 '17 at 15:58
@JoeIddon yes. Well, maybe? You should have your `sobel` operator simply perform the Sobel operation. I would leave scaling/adding the derivatives up to the user (or giving an optional boolean parameter to sum them, or even create a wrapper function). The Sobel operator should just return the Sobel in the `x` and `y` direction IMO with no scaling or adding. They are useful to have around. For e.g. a user might want the gradient direction (in which case they'd need both). And I don't like absolute summing, you lose whether the edge goes from white to black or black to white! — alkasm, Oct 01 '17 at 16:01
In my original function I actually had a parameter to select between returning the combined `x` and `y` sobels or them separately for use in a canny edge-detector, but I agree with you that having two separate functions for `x` and `y` sobels makes much more sense! Thanks for your efforts. — Joe Iddon, Oct 01 '17 at 16:04

score 1 · Answer 2 · answered Jun 12 '20 at 00:24

For what its worth, an addition:

Sobel x kernel:
-1 0 1
-2 0 2
-1 0 1

You don't need a separated kernel. 1/3 of the operations always result in zero. Just don't compute them. The rest can be simplified:

sum = -inputvalue[y-1][x-1] - 2 * inputvalue[y][x-1] - inputvalue[y+1][x-1]
+ inputvalue[y-1][x+1] + 2 * inputvalue[y][x+1] + inputvalue[y+1][x+1]

That 2 multiplications 3 substractions and 3 additions and NO looping for that spot, compared to 9 multiplications and 9 additions, where the naive approach would be to loop through the kernel. That should reduce the compute time noticeably.

Im surprised by the mentioned 1000x speedup that was achieved in the numpy example above. But this approach here helped me to improve the speed significantly. :)

score 1 · Answer 3 · answered Mar 22 '21 at 14:15

I came across the same problem and was able to speed up my code about 600 times using @jit from the Numba library (See link: https://numba.pydata.org/numba-doc/latest/user/5minguide.html). Adding @jit(nopython=True) above my function was enough to do the job.

How to improve the efficiency of a sobel edge detector

3 Answers3

Linked