
Let me explain what I have to develop.

Say I have to write a function that receives two matrices with the same number of columns but possibly a different number of rows.

In other words, both matrices hold vectors of the same dimension d, but the number of vectors can differ (N rows in one matrix and M in the other).

I have to compute the Euclidean distance between every vector of the first matrix and every vector of the second, and store the results in another matrix D, where D[i, j] is the distance between x[i] and y[j].
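In symbols, with x of shape N × d and y of shape M × d, the entries of the distance matrix are as follows (this is also what `cdist` computes with its default Euclidean metric):

```latex
D_{ij} = \sqrt{\sum_{k=1}^{d} \left( x_{ik} - y_{jk} \right)^{2}},
\qquad i = 1, \dots, N, \quad j = 1, \dots, M
```

So D has N rows and M columns: one row per vector of x and one column per vector of y.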

This is the code I have developed:

```python
import math

import numpy as np


def compute_distances(x, y):
    # Dimension:
    N, d = x.shape
    M, d_ = y.shape

    # The dimension should be the same
    if d != d_:
        print "Dimensions of x and y do not match, cannot compute the distances..."
        return None

    # Calculate distance with loops:
    D = np.zeros((N, M))
    i = 0
    j = 0
    for v1 in x:
        for v2 in y:
            if j != M:
                D[i, j] = math.sqrt(sum([(xi - yi)**2 for xi, yi in zip(v1, v2)]))
                #print "[",i,",",j,"]"
                j = j + 1
            else:
                j = 0
        i = i + 1

    print D
```

This function receives the two matrices and builds a matrix D holding the Euclidean distances between the vectors of x and the vectors of y.

The problem is that I do not know how to store each computed Euclidean distance in the correct position of the new matrix D.

My main function has the following structure:

```python
import numpy as np
from scipy.spatial.distance import cdist

n = 1000
m = 700
d = 10

x = np.random.randn(n, d)
y = np.random.randn(m, d)

print "x shape =", x.shape
print "y shape =", y.shape

D_bucle = da.compute_distances(x, y)
D_cdist = cdist(x, y)

print np.max(np.abs(D_cdist - D_bucle))
```

D_cdist is computed with `cdist`, which uses an efficient implementation. D_bucle computes the same thing with inefficient loop-based code, so it should give the same result, but I am not getting the expected values.

I think the problem is in the way I fill the distance matrix D, which makes the results come out wrong.

Update: I have updated my solution. My first problem was that I did not know how to assign the Euclidean distance of each pair of vectors to the right position in D. Now I know how to assign it, but only the first row of D matches the result of the `cdist` function.
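Why only the first row comes out right can be seen by replaying the loop's `i`/`j` bookkeeping with the distance computation removed. The sketch below (Python 3; the function name `trace_original_indexing` is hypothetical) copies the branch logic of the question's loop and records, for each cell of D that gets written, which y vector was actually used.

```python
def trace_original_indexing(N, M):
    """Replay the question's i/j counter logic without computing distances.

    Returns a dict mapping each written cell (i, j) of D to the index of the
    y vector whose distance would be stored there.
    """
    written = {}
    i = 0
    j = 0
    for _ in range(N):            # stands in for "for v1 in x"
        for col in range(M):      # stands in for "for v2 in y"; col is the real y index
            if j != M:
                written[(i, j)] = col
                j = j + 1
            else:
                j = 0             # this pass writes nothing and skips y[col]
        i = i + 1
    return written


written = trace_original_indexing(N=3, M=4)
for i in range(3):
    print(i, [written.get((i, j)) for j in range(4)])
# 0 [0, 1, 2, 3]
# 1 [1, 2, 3, None]
# 2 [2, 3, None, 0]
```

After the first row, `j` is left at M, so the first y vector of the next row only resets `j` and is skipped. From then on each cell holds the distance to the wrong y vector, and some cells (`None` above) are never written and stay 0 in D.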

fiticida
  • Slight off-topic, but I suggest you use [`math.hypot()`](https://docs.python.org/3/library/math.html#math.hypot) to compute the distance values. – martineau Nov 28 '17 at 00:16
  • You are calculating `n` x `m` distances? – wwii Nov 28 '17 at 00:19
  • [Numpy Broadcast to perform euclidean distance vectorized](https://stackoverflow.com/q/27948363/2823755) – wwii Nov 28 '17 at 01:01
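The two suggestions in these comments can be sketched as follows, assuming Python 3.8+ (`math.hypot` only accepts more than two coordinates from 3.8 on) and NumPy. Both function names are hypothetical. The broadcasting version follows the idea of the linked question: `x[:, None, :] - y[None, :, :]` has shape (N, M, d), and summing the squares over the last axis gives the N × M matrix.

```python
import math

import numpy as np


def distances_hypot(x, y):
    """Loop version using math.hypot (needs Python 3.8+ for d > 2 coordinates)."""
    N, M = x.shape[0], y.shape[0]
    D = np.zeros((N, M))
    for i, v1 in enumerate(x):
        for j, v2 in enumerate(y):
            # hypot(*diffs) equals sqrt of the sum of squared differences
            D[i, j] = math.hypot(*(v1 - v2))
    return D


def distances_broadcast(x, y):
    """Vectorized version: broadcast (N, 1, d) against (1, M, d)."""
    diff = x[:, np.newaxis, :] - y[np.newaxis, :, :]   # shape (N, M, d)
    return np.sqrt(np.sum(diff ** 2, axis=-1))         # shape (N, M)


x = np.random.randn(5, 10)
y = np.random.randn(3, 10)
print(np.max(np.abs(distances_hypot(x, y) - distances_broadcast(x, y))))
```

The broadcasting version trades memory for speed: the intermediate `diff` array has N·M·d elements, which for the sizes in the question (1000 × 700 × 10) is 7 million float64 values, about 56 MB.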

1 Answer


I don't fully understand what you're asking, but I do see one problem that may explain your results:

```python
for v1 in x:
    for v2 in y:
        D = math.sqrt(sum([(xi-yi)**2 for xi,yi in zip(v1,v2)]))
```

You are overwriting the value of D on each of the N×M passes through this loop. When you're done, D only contains the distance from the last comparison. You might need something like `D[i,j] = math.sqrt(...`
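One way to spell out the `D[i,j]` fix is to let `enumerate` supply the row and column indices, so `i` and `j` can never drift away from `v1` and `v2` the way a hand-maintained counter can. This is a sketch in Python 3 (the question's code is Python 2); it keeps the question's name `compute_distances`, raises an error on mismatched columns instead of printing and returning `None`, and returns D instead of printing it, since the main script subtracts the returned value from `D_cdist`.

```python
import math

import numpy as np


def compute_distances(x, y):
    """Loop-based N x M Euclidean distance matrix between the rows of x and y."""
    N, d = x.shape
    M, d_ = y.shape
    if d != d_:
        raise ValueError("x and y must have the same number of columns")

    D = np.zeros((N, M))
    # enumerate yields each index together with its vector, so D[i, j]
    # always holds the distance between x[i] and y[j]; no manual counters.
    for i, v1 in enumerate(x):
        for j, v2 in enumerate(y):
            D[i, j] = math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(v1, v2)))
    return D


x = np.random.randn(6, 4)
y = np.random.randn(3, 4)
D = compute_distances(x, y)
print(D.shape)  # (6, 3)
```

With a function like this in place of the loop version, `np.max(np.abs(D_cdist - D_bucle))` in the main script should come out at the level of floating-point rounding error rather than showing large differences.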

Brad Dre
  • `might need something like D[i,j] ` instead of **might** and **something like**, can you supply a solution/fix for that deficiency? – wwii Nov 28 '17 at 00:57
  • Yes, guilty as charged. I was vague because I'm not familiar with NumPy, which provides the matrix in this case – Brad Dre Nov 28 '17 at 01:02
  • This is my solution, but I think it's not working properly: when I compare the `cdist` function with my Euclidean function, only the first row of the two results matches. `i = 0 j = 0 for v1 in x: for v2 in y: if(j != M): D[i,j] = math.sqrt(sum([(xi-yi)**2 for xi,yi in zip(v1,v2)])) #print "[",i,",",j,"]" j = j + 1 else: j = 0 i = i + 1;` – fiticida Nov 28 '17 at 01:08
  • I tried this myself with NumPy and your updated solution seems to produce a valid matrix. Can you confirm? – Brad Dre Jan 12 '18 at 21:50