4

I need to do a few hundred million euclidean distance calculations every day in a Python project.

Here is what I started out with:

def euclidean_dist_square(x, y):
    diff = np.array(x) - np.array(y)
    return np.dot(diff, diff)

This is quite fast and I already dropped the sqrt calculation since I need to rank items only (nearest-neighbor search). It is still the bottleneck of the script though. Therefore I have written a C extension, which calculates the distance. The calculation is always done with 128-dimensional vectors.

#include "euclidean.h"
#include <math.h>

double euclidean(double x[128], double y[128])
{
    double Sum;
    for(int i=0;i<128;i++)
    {
        Sum = Sum + pow((x[i]-y[i]),2.0);
    }
    return Sum;
}

Complete code for the extension is here: https://gist.github.com/herrbuerger/bd63b73f3c5cf1cd51de

Now this gives a nice speedup in comparison to the numpy version.

But is there any way to speed this up further (this is my first C extension ever so I assume there is)? With the number of times this function is used every day, every microsecond would actually provide a benefit.

Some of you might suggest porting this completely from Python to another language, unfortunately this is a larger project and not an option :(

Thanks.

Edit

I have posted this question on CodeReview: https://codereview.stackexchange.com/questions/52218/possible-optimizations-for-calculating-squared-euclidean-distance

I will delete this question in an hour in case someone has started to write an answer.

Community
  • 1
  • 1
herrherr
  • 708
  • 1
  • 9
  • 26
  • If you include the code in the question, this would be considered on-topic for [Code Review](http://codereview.stackexchange.com/) (as long as the code works that is). – syb0rg Jun 01 '14 at 19:47
  • The gist actually contains everything that is needed. What would be the best approach now? Closing this and posting a new entry on Code Review? – herrherr Jun 01 '14 at 19:51
  • 4
    I would delete this question, and re-ask it over on Code Review ***with all of the relevant code included in the question***, not a link to it. If you fail to do that, I guarantee the question will be closed over there. – syb0rg Jun 01 '14 at 19:52
  • Possible duplicate of [more efficient way to calculate distance in numpy?](http://stackoverflow.com/q/17527340/166749) (Sorry for close and reopen, with a gold badge I can't even vote anymore.) – Fred Foo Jun 01 '14 at 20:28
  • Also very similar to [this question](http://stackoverflow.com/q/19653951/166749) (the answer is certainly the same). – Fred Foo Jun 01 '14 at 20:30
  • 1
    @MikeDunlavey GCC gets rid of that `pow` call even at `-O1`. – Fred Foo Jun 01 '14 at 20:31

1 Answers1

12

The fastest way to compute Euclidean distances in NumPy that I know is the one in scikit-learn, which can be summed up as

def squared_distances(X, Y):
    """Return a distance matrix for each pair of rows i, j in X, Y."""
    # http://stackoverflow.com/a/19094808/166749
    X_row_norms = np.einsum('ij,ij->i', X, X)
    Y_row_norms = np.einsum('ij,ij->i', Y, Y)
    distances = np.dot(X, Y)
    distances *= -2
    distances += X_row_norms
    distances += Y_row_norms

    np.maximum(distances, 0, distances)  # get rid of negatives; optional
    return distances

The bottleneck in this piece of code is matrix multiplication (np.dot), so make sure your NumPy is linked against a good BLAS implementation; with a multithreaded BLAS on a multicore machine and large enough input matrices, it should be faster than anything you can whip up in C. Note that it relies on the binomial formula

||x - y||² = ||x||² + ||y||² - 2 x⋅y    

and that either X_row_norms or Y_row_norms can be cached across invocations for the k-NN use case.

(I'm a coauthor of this code, and I spent quite some time optimizing both it and the SciPy implementation; scikit-learn is faster at the expense of some accuracy, but for k-NN that shouldn't matter too much. The SciPy implementation, available in scipy.spatial.distance, is actually an optimized version of the code you just wrote and is more accurate.)

Fred Foo
  • 355,277
  • 75
  • 744
  • 836
  • +1 for the "horse's mouth". You won't get that on code review. – david.pfx Jun 02 '14 at 03:24
  • Could you provide an example for this, especially what X and Y are supposed to look like? – herrherr Jun 02 '14 at 11:58
  • @herrherr: `X` would be an n×128 matrix, `Y` an m×128 one. The result is then an n×m matrix of distances. – Fred Foo Jun 02 '14 at 12:32
  • I'm using your suggestion from [here](http://stackoverflow.com/questions/19653951/euclidean-norm-using-numexpr) now and I have to say it is much faster than the initial numpy version I used and it's also much faster than the C version. Thanks! – herrherr Jun 02 '14 at 13:06
  • 1
    This is potentially unstable: https://github.com/scikit-learn/scikit-learn/issues/9354 – Andreas Mueller Nov 15 '18 at 16:25