I am currently working through the CS231n assignments, and I wanted to calculate the Euclidean distance between two points:

dists[i, j] = 0
for k in range(3072):
    dists[i, j] += math.pow(X[i, k] - self.X_train[j, k], 2)
dists[i, j] = math.sqrt(dists[i, j])

However, this code is very slow. Then I tried:

dists[i, j] = np.linalg.norm(X[i, :] - self.X_train[j, :])

which is way faster. The question is: why? Doesn't np.linalg.norm also loop through all coordinates of both points, subtract them, square the differences, sum them, and take the square root? Could someone give a detailed answer: is it because of how np.linalg.norm accesses the elements, or is there some other reason?

Hype Totec
  • 1
    Please can you: fix your code formatting, provide a [mcve], and potentially add some timings to demonstrate the behaviour you are seeing. – jpp Aug 15 '18 at 09:42
  • 1
    Yes, `np.linalg.norm` does also loop through the numbers. But the loops used there are very fast as they are written in C. Pure Python loops introduce some overhead because they are so simple to write and to read :) – Joe Aug 15 '18 at 12:44

1 Answer

NumPy can do the entire calculation in one fell swoop in optimized, accelerated (e.g. SSE, AVX, what-have-you) C code.

The original code does all of its work in Python (aside from the math functions, which are implemented in C, but still pay for converting Python objects to C values and back on every call), so each of the 3072 loop iterations carries interpreter overhead, which just, well, is slower.
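
To put rough numbers on this, here is a minimal timing sketch. It is not taken from the assignment: the random vectors `a` and `b` and the helper names `dist_loop` and `dist_numpy` are made up for illustration, and the only thing assumed from the question is the vector length of 3072. Both helpers compute the same distance; only the place where the loop runs differs.

```python
import math
import timeit

import numpy as np

# Two random vectors standing in for one test row and one training row
# (hypothetical data; 3072 matches the loop bound in the question).
rng = np.random.default_rng(0)
a = rng.random(3072)
b = rng.random(3072)


def dist_loop(u, v):
    """Euclidean distance with an explicit Python loop, as in the question."""
    total = 0.0
    for k in range(u.shape[0]):
        # Each iteration indexes two arrays, builds Python-level objects,
        # and calls into math.pow, all driven by the interpreter.
        total += math.pow(u[k] - v[k], 2)
    return math.sqrt(total)


def dist_numpy(u, v):
    """Euclidean distance where the subtraction and the norm run in compiled code."""
    return np.linalg.norm(u - v)


# Both versions should agree up to floating-point rounding.
assert math.isclose(dist_loop(a, b), dist_numpy(a, b), rel_tol=1e-9)

# Time 20 calls of each; absolute numbers depend on the machine.
t_loop = timeit.timeit(lambda: dist_loop(a, b), number=20)
t_numpy = timeit.timeit(lambda: dist_numpy(a, b), number=20)
print(f"python loop:    {t_loop:.5f} s for 20 calls")
print(f"np.linalg.norm: {t_numpy:.5f} s for 20 calls")
```

The exact ratio will vary with hardware and NumPy build, so treat the printed timings as an indication rather than a benchmark.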

AKX