Possible optimizations for calculating squared euclidean distance

Question

I need to do a few hundred million euclidean distance calculations every day in a Python project.

Here is what I started out with:

def euclidean_dist_square(x, y):
    diff = np.array(x) - np.array(y)
    return np.dot(diff, diff)

This is quite fast and I already dropped the sqrt calculation since I need to rank items only (nearest-neighbor search). It is still the bottleneck of the script though. Therefore I have written a C extension, which calculates the distance. The calculation is always done with 128-dimensional vectors.

#include "euclidean.h"
#include <math.h>

double euclidean(double x[128], double y[128])
{
    double Sum;
    for(int i=0;i<128;i++)
    {
        Sum = Sum + pow((x[i]-y[i]),2.0);
    }
    return Sum;
}

Complete code for the extension is here: https://gist.github.com/herrbuerger/bd63b73f3c5cf1cd51de

Now this gives a nice speedup in comparison to the numpy version.

But is there any way to speed this up further (this is my first C extension ever so I assume there is)? With the number of times this function is used every day, every microsecond would actually provide a benefit.

Some of you might suggest porting this completely from Python to another language, unfortunately this is a larger project and not an option :(

Thanks.

Edit

I have posted this question on CodeReview: https://codereview.stackexchange.com/questions/52218/possible-optimizations-for-calculating-squared-euclidean-distance

I will delete this question in an hour in case someone has started to write an answer.

If you include the code in the question, this would be considered on-topic for [Code Review](http://codereview.stackexchange.com/) (as long as the code works that is). — syb0rg, Jun 01 '14 at 19:47
The gist actually contains everything that is needed. What would be the best approach now? Closing this and posting a new entry on Code Review? — herrherr, Jun 01 '14 at 19:51
I would delete this question, and re-ask it over on Code Review ***with all of the relevant code included in the question***, not a link to it. If you fail to do that, I guarantee the question will be closed over there. — syb0rg, Jun 01 '14 at 19:52
Possible duplicate of [more efficient way to calculate distance in numpy?](http://stackoverflow.com/q/17527340/166749) (Sorry for close and reopen, with a gold badge I can't even vote anymore.) — Fred Foo, Jun 01 '14 at 20:28
Also very similar to [this question](http://stackoverflow.com/q/19653951/166749) (the answer is certainly the same). — Fred Foo, Jun 01 '14 at 20:30
@MikeDunlavey GCC gets rid of that `pow` call even at `-O1`. — Fred Foo, Jun 01 '14 at 20:31

Fred Foo · Accepted Answer · 2014-08-07T11:14:22.237

The fastest way to compute Euclidean distances in NumPy that I know is the one in scikit-learn, which can be summed up as

def squared_distances(X, Y):
    """Return a distance matrix for each pair of rows i, j in X, Y."""
    # http://stackoverflow.com/a/19094808/166749
    X_row_norms = np.einsum('ij,ij->i', X, X)
    Y_row_norms = np.einsum('ij,ij->i', Y, Y)
    distances = np.dot(X, Y)
    distances *= -2
    distances += X_row_norms
    distances += Y_row_norms

    np.maximum(distances, 0, distances)  # get rid of negatives; optional
    return distances

The bottleneck in this piece of code is matrix multiplication (np.dot), so make sure your NumPy is linked against a good BLAS implementation; with a multithreaded BLAS on a multicore machine and large enough input matrices, it should be faster than anything you can whip up in C. Note that it relies on the binomial formula

||x - y||² = ||x||² + ||y||² - 2 x⋅y

and that either X_row_norms or Y_row_norms can be cached across invocations for the k-NN use case.

(I'm a coauthor of this code, and I spent quite some time optimizing both it and the SciPy implementation; scikit-learn is faster at the expense of some accuracy, but for k-NN that shouldn't matter too much. The SciPy implementation, available in scipy.spatial.distance, is actually an optimized version of the code you just wrote and is more accurate.)

+1 for the "horse's mouth". You won't get that on code review. — david.pfx, Jun 02 '14 at 03:24
Could you provide an example for this, especially what X and Y are supposed to look like? — herrherr, Jun 02 '14 at 11:58
@herrherr: `X` would be an n×128 matrix, `Y` an m×128 one. The result is then an n×m matrix of distances. — Fred Foo, Jun 02 '14 at 12:32
I'm using your suggestion from [here](http://stackoverflow.com/questions/19653951/euclidean-norm-using-numexpr) now and I have to say it is much faster than the initial numpy version I used and it's also much faster than the C version. Thanks! — herrherr, Jun 02 '14 at 13:06
This is potentially unstable: https://github.com/scikit-learn/scikit-learn/issues/9354 — Andreas Mueller, Nov 15 '18 at 16:25

Possible optimizations for calculating squared euclidean distance

1 Answers1

Linked