I am writing a computer vision library from scratch in Python to work with a rpi camera. At the moment, I have implemented conversion to greyscale and some other basic image operations, all of which run relatively fast on my rpi 3 Model B.
However, my edge detection function using the Sobel operator (Wikipedia description) is much slower than the other functions, although it does work. Here it is:
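import numpy as np

def sobel(img):
    # 3x3 Sobel kernels for the horizontal (x) and vertical (y) gradients
    xKernel = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
    yKernel = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])
    # The output is 2 pixels smaller in each dimension (borders are skipped)
    sobelled = np.zeros((img.shape[0]-2, img.shape[1]-2, 3), dtype="uint8")
    for y in range(1, img.shape[0]-1):
        for x in range(1, img.shape[1]-1):
            window = img[y-1:y+2, x-1:x+2]  # 3x3 neighbourhood around (y, x)
            gx = np.sum(np.multiply(window, xKernel))
            gy = np.sum(np.multiply(window, yKernel))
            g = abs(gx) + abs(gy)  # math.sqrt(gx ** 2 + gy ** 2) (Slower)
            g = min(g, 255)  # clamp to the uint8 range; g is never negative
            sobelled[y-1][x-1] = g  # offset by 1 to index the smaller output array
    return sobelled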
Running it on a greyscale image of a cat, I get a result which seems correct.
The library, and this function in particular, will be used on a chess-playing robot, where the edge detection will help to recognise the location of the pieces. The problem is that the function takes more than 15 seconds to run, which adds a lot to the time the robot takes to make its move.
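For reference, a figure like the one above can be measured with a small helper along these lines. This is only a sketch: `time_call` is a hypothetical helper name, the random 480x360 frame stands in for a real camera image, and `np.sum` is a placeholder for the function being measured.

```python
import time
import numpy as np

def time_call(func, *args, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best

# Hypothetical stand-in for a greyscale camera frame (rows x columns)
frame = np.random.randint(0, 256, size=(360, 480), dtype=np.uint8)

# np.sum is only a placeholder; pass the function under test instead
elapsed = time_call(np.sum, frame)
print(f"best of 3 runs: {elapsed:.6f} s")
```

Passing `sobel` in place of `np.sum` would time the edge detection itself; taking the best of a few runs reduces noise from other processes on the rpi.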
My question is: how can I speed it up?
So far, I have tried a couple of things:
1. Instead of squaring, then adding, then square rooting the gx and gy values to get the total gradient, I just sum the absolute values. This improved the speed a decent amount.
2. Using a lower resolution image from the rpi camera. This is obviously a simple way to make these operations run faster, but it's not really viable as it's still pretty slow at the minimum usable resolution of 480x360, which is massively reduced from the camera's max of 3280x2464.
3. Writing nested for-loops to do the matrix convolutions in place of the np.sum(np.multiply(...)). This ended up being slightly slower, which surprised me: since np.multiply returns a new array, I thought it would have been faster to do it with loops. I think this may be because numpy is mostly written in C, or because the new array isn't actually stored and so doesn't take long to create, but I'm not too sure.
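To make the trade-off in point 1 concrete, here is a small comparison of the two gradient magnitudes. The function names `magnitude_l2` and `magnitude_l1` and the sample values are made up for illustration.

```python
import math

def magnitude_l2(gx, gy):
    """Euclidean gradient magnitude: square, add, then square root."""
    return math.sqrt(gx * gx + gy * gy)

def magnitude_l1(gx, gy):
    """Cheaper approximation: sum of the absolute values."""
    return abs(gx) + abs(gy)

# Hypothetical (gx, gy) pairs, not taken from a real image
samples = [(3, 4), (-40, 0), (10, -10)]
for gx, gy in samples:
    print(gx, gy, magnitude_l2(gx, gy), magnitude_l1(gx, gy))
```

The sum of absolute values is never smaller than the Euclidean magnitude and is at most sqrt(2) times larger, so the edges come out somewhat brighter and any threshold tuned for one version would need adjusting for the other.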
Any help would be much appreciated - I think the main thing for improvement is point 3, i.e. the matrix multiplication and summing.
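One possible direction for point 3, sketched under the assumption that img is a 2-D greyscale uint8 array: the same Sobel sums can be written as arithmetic on shifted slices of the whole image, so that numpy does the per-pixel work instead of a Python loop. The name `sobel_vectorised` is hypothetical, and unlike the function above it returns a single-channel result.

```python
import numpy as np

def sobel_vectorised(img):
    """Sobel edge magnitude (|gx| + |gy|) for a 2-D greyscale array, no Python loops."""
    a = img.astype(np.int32)  # widen first so uint8 arithmetic cannot overflow

    # Eight shifted views of the image, one per non-zero kernel position.
    # Each view has shape (H-2, W-2): t/m/b = top/middle/bottom row,
    # l/m/r = left/middle/right column of the 3x3 neighbourhood.
    tl, tm, tr = a[:-2, :-2], a[:-2, 1:-1], a[:-2, 2:]
    ml, mr = a[1:-1, :-2], a[1:-1, 2:]
    bl, bm, br = a[2:, :-2], a[2:, 1:-1], a[2:, 2:]

    # Same weights as xKernel and yKernel, applied to every pixel at once
    gx = (tr + 2 * mr + br) - (tl + 2 * ml + bl)
    gy = (bl + 2 * bm + br) - (tl + 2 * tm + tr)

    g = np.abs(gx) + np.abs(gy)
    return np.clip(g, 0, 255).astype(np.uint8)

# Small synthetic image with a vertical step edge (left half 0, right half 10)
demo = np.zeros((5, 6), dtype=np.uint8)
demo[:, 3:] = 10
print(sobel_vectorised(demo))
```

Slicing creates views rather than copies, so the main cost is a handful of whole-array additions. This avoids calling np.sum and np.multiply once per pixel, which is where the loop version spends most of its time in interpreter overhead.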