I want to measure the Euclidean distance between pairs of 5-dimensional points. The data looks like this:

                  center                                        x
0    [0.09771348879, 1.856078237, 2.100760575, 9.25...  [-1.35602640228e-12, -2.94706481441e-11, -6.51...
1    [8.006780488, 1.097849488, 0.6275244427, 0.572...  [4.99212418613, 5.01853294023, -0.014304672946...
2    [-1.40785823, -1.714959744, -0.5524032233, -0....  [-1.61000102139e-11, -4.680034138e-12, 1.96087...

Each row holds an index, then one point (center), then the other point (x); all points are 5-D. I want to use pdist since it works for n-dimensional data. The problem is that pdist expects the points arranged as m n-dimensional row vectors in a matrix X, whereas what I have above is a DataFrame rather than such a matrix, and it also includes the index, which it should not.

My code is (S is the DataFrame shown above):

S = pd.DataFrame(paired_data, columns=['x','center'])

print (S.to_string())

Y = pdist(S[1:], 'euclidean')
print(Y)
Micheal

1 Answer

This seems to work:

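for i in range(S.shape[0]):
    # Stack the two 5-D points of row i as the rows of a 2 x 5 matrix
    M = np.matrix( [S['x'][i], S['center'][i]] )
    # pdist returns a one-element array: the distance between the two rows
    print(pdist(M, 'euclidean'))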

or with iterrows():

# iterrows() yields (index, row) pairs; the index is not needed here
for _, row in S.iterrows():
    M = np.matrix( [row['x'], row['center']] )
    print(pdist(M, 'euclidean'))

Note that creating a matrix isn't necessary; pdist handles a Python list of lists just fine:

for _, row in S.iterrows():
    print(pdist([row['x'], row['center']], 'euclidean'))
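
Since each row only needs the distance between its own two points, the loop could also be replaced by a single vectorized NumPy call. This is a minimal sketch, assuming every point has the same length. The sample values are made up for illustration, and `x` and `center` stand in for `np.array(S['x'].tolist())` and `np.array(S['center'].tolist())`:

```python
import numpy as np

# Made-up 5-D sample points (assumption, not the asker's data).
# With the DataFrame from the question these would come from
# np.array(S['x'].tolist()) and np.array(S['center'].tolist()).
x = np.array([[0.0, 0.0, 0.0, 0.0, 0.0],
              [1.0, 2.0, 3.0, 4.0, 5.0]])
center = np.array([[3.0, 4.0, 0.0, 0.0, 0.0],
                   [1.0, 2.0, 3.0, 4.0, 5.0]])

# Row-wise Euclidean distance: one value per (x, center) pair.
distances = np.linalg.norm(x - center, axis=1)
print(distances)
```

This gives one distance per row without calling pdist at all, which may be simpler when only the paired distances are wanted rather than all pairwise distances.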
jedwards
  • The first one seems to work. Thanks. If I want to sum all the distances at the end, how can I do that? – Micheal Mar 06 '15 at 23:53
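
One possible way to get the total is to collect the per-row distances in a list and sum them afterwards. This is a rough sketch with hypothetical points and a hypothetical helper `euclidean`; with SciPy, `pdist([p, q])[0]` would give the same per-pair value:

```python
import math

# Hypothetical (x, center) pairs standing in for the rows of S.
pairs = [
    ([0.0, 0.0, 0.0, 0.0, 0.0], [3.0, 4.0, 0.0, 0.0, 0.0]),
    ([1.0, 1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0, 2.0]),
]

def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Collect one distance per pair instead of printing it, then sum.
distances = [euclidean(p, q) for p, q in pairs]
total = sum(distances)
print(total)
```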