I need to do a few hundred million Euclidean distance calculations every day in a Python project.
Here is what I started out with:
import numpy as np

def euclidean_dist_square(x, y):
    # Squared Euclidean distance; the sqrt is skipped on purpose.
    diff = np.array(x) - np.array(y)
    return np.dot(diff, diff)
This is quite fast, and I have already dropped the sqrt calculation since I only need to rank items (nearest-neighbor search). It is still the bottleneck of the script, though. I have therefore written a C extension that calculates the distance. The calculation is always done with 128-dimensional vectors.
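#include "euclidean.h"
#include <math.h>

/* Squared Euclidean distance between two 128-dimensional vectors. */
double euclidean(double x[128], double y[128])
{
    double Sum = 0.0;  /* must be initialized before accumulating */
    for (int i = 0; i < 128; i++)
    {
        Sum = Sum + pow((x[i] - y[i]), 2.0);
    }
    return Sum;
}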
Complete code for the extension is here: https://gist.github.com/herrbuerger/bd63b73f3c5cf1cd51de
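To make the "rank only" reasoning concrete, here is a minimal pure-Python sketch (not part of the extension) showing that ordering by squared distance gives the same nearest-neighbor ranking as ordering by the true distance, since sqrt is monotonic. The names squared_dist, true_dist and rank_by, and the random integer test vectors, are made up for illustration only.

```python
import math
import random

DIM = 128  # vector length used throughout the question


def squared_dist(x, y):
    # Same quantity as the C loop: sum of squared component differences, no sqrt.
    return sum((a - b) ** 2 for a, b in zip(x, y))


def true_dist(x, y):
    # Actual Euclidean distance, only needed here for the comparison.
    return math.sqrt(squared_dist(x, y))


def rank_by(metric, query, items):
    # Indices of `items`, ordered from nearest to farthest under `metric`.
    return sorted(range(len(items)), key=lambda i: metric(query, items[i]))


# Hypothetical test data: integer components keep the arithmetic exact.
random.seed(0)
query = [random.randint(0, 255) for _ in range(DIM)]
items = [[random.randint(0, 255) for _ in range(DIM)] for _ in range(50)]

# True when both metrics produce the same neighbor ordering.
same_order = rank_by(squared_dist, query, items) == rank_by(true_dist, query, items)
```

A pure-Python function like squared_dist is far too slow for production use here, but it can serve as a reference when checking that the C extension returns the expected values.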
Now this gives a nice speedup in comparison to the numpy version.
But is there any way to speed this up further (this is my first C extension ever so I assume there is)? With the number of times this function is used every day, every microsecond would actually provide a benefit.
Some of you might suggest porting this completely from Python to another language. Unfortunately, this is a larger project and that is not an option :(
Thanks.
Edit
I have posted this question on CodeReview: https://codereview.stackexchange.com/questions/52218/possible-optimizations-for-calculating-squared-euclidean-distance
I will wait an hour before deleting this question, in case someone has already started to write an answer.