4

I implemented the k-nearest-neighbours algorithm in Python to classify some randomly picked images from the MNIST database. However, I found my distance function to be quite slow: an analysis of 10 test images against the training set of 10k images takes about 2 minutes. The images have a resolution of 28x28 pixels. Since I'm new to Python, I have the feeling this could be faster. The function is supposed to calculate the Euclidean distance between two same-sized grayscale images.

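import math
import numpy

def calculateDistance(image1, image2):
    # Euclidean distance between two same-sized 2-D grayscale images
    distance = 0
    for i in range(len(image1)):            # rows
        for j in range(len(image1[i])):     # columns of row i
            distance += math.pow(image1[i][j] - image2[i][j], 2)
    distance = numpy.sqrt(distance)
    return distance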
wodzu

2 Answers

9

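If you're using NumPy arrays to represent the images, you could use the following instead:

import numpy

def calculateDistance(i1, i2):
    # Sum of squared differences, i.e. the squared Euclidean distance.
    # Apply numpy.sqrt to the result if the true distance is needed.
    return numpy.sum((i1-i2)**2)

This should be much faster because NumPy performs the element-wise subtraction, squaring, and summation in compiled C code instead of a Python loop. Also consider caching results so that the distance between the same pair of images is not computed twice.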
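
The same idea can be pushed one step further by comparing a test image against the whole training set in one call. The following is only a sketch under assumptions not stated in the thread: the training images are stacked into a single array of shape (N, 28, 28), and the names `squared_distances` and `k_nearest` are hypothetical. The cast to float is a precaution in case the images are stored as unsigned 8-bit integers.

```python
import numpy as np

def squared_distances(test_image, train_images):
    # Cast to float so the subtraction cannot wrap around for uint8 input.
    test = np.asarray(test_image, dtype=np.float64)
    train = np.asarray(train_images, dtype=np.float64)
    # Broadcasting: (N, H, W) - (H, W) subtracts the test image from every training image.
    diff = train - test
    # Sum over the two pixel axes, leaving one squared distance per training image.
    return np.sum(diff ** 2, axis=(1, 2))

def k_nearest(test_image, train_images, k):
    # Indices of the k training images with the smallest squared distance.
    # The square root is skipped because it does not change the ordering.
    return np.argsort(squared_distances(test_image, train_images))[:k]
```

The returned indices can then be used to look up the labels of the nearest training images and take a majority vote.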

Niklas B.
  • Yes! This gives me a huge speed-up: it now takes ~8 s to classify 10 test images. Thanks! – wodzu Oct 17 '15 at 10:51
0

1) Compute the difference between the two pixels into a temporary variable, then multiply that variable by itself (an integer operation) instead of calling math.pow, which is a floating-point operation.

2) If you're only comparing distances, e.g. to find the pair with the smallest distance, don't bother taking the square root at the end. This won't speed things up much because it's outside the loop, but it's not needed if you're only using the result for relative comparisons.
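
A minimal sketch of both suggestions, assuming the images are nested sequences (or 2-D arrays) of integer pixel values; the name `squared_distance` is hypothetical. The explicit `int()` conversion is an added precaution so that the arithmetic uses Python integers regardless of how the pixels are stored.

```python
def squared_distance(image1, image2):
    total = 0
    for row1, row2 in zip(image1, image2):
        for p1, p2 in zip(row1, row2):
            # Temporary variable holding the pixel difference as a Python int.
            diff = int(p1) - int(p2)
            # Integer multiplication instead of math.pow.
            total += diff * diff
    # No square root: the ordering of distances is the same without it.
    return total
```

Because the square root is monotonic, ranking neighbours by this value gives the same order as ranking by the true Euclidean distance.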

qoba
  • I tried this, but for some reason unknown to me it caused the classifications to be less successful. It also did not speed up the calculation; it took even longer (about 10 min). – wodzu Oct 17 '15 at 10:49
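
One possible explanation for the worse classifications, offered only as an assumption since the thread does not say how the images are stored: if the pixels are unsigned 8-bit integers (`uint8`), integer arithmetic on them wraps around modulo 256, so both the difference and its square can come out wrong. The pixel values below are made up for illustration.

```python
import numpy as np

# Hypothetical pixel values, assumed to be stored as uint8.
a = np.array([[10, 200]], dtype=np.uint8)
b = np.array([[30, 100]], dtype=np.uint8)

# uint8 arithmetic wraps modulo 256: the subtraction and the squaring can both wrap.
wrapped = int(np.sum((a - b) ** 2))

# Converting to a wider signed type first gives the intended sum of squares.
diff = a.astype(np.int64) - b.astype(np.int64)
correct = int(np.sum(diff * diff))

print(wrapped, correct)
```

If the two printed values differ on real data, converting the images to a signed or floating-point type before computing distances would be worth trying.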