
I have a pandas dataframe after reading in a .csv file that resembles:

import itertools as it
import pandas as pd
import numpy as np
import scipy as sp

x = np.random.randn(5)
y = np.sin(x)
z = np.sin(x)+1
df = pd.DataFrame({'x':x, 'y':y, 'z':z})

df = 
          x         y         z
0  0.233070  0.230965  1.230965
1 -1.956269 -0.926621  0.073379
2 -0.015575 -0.015575  0.984425
3 -0.106887 -0.106684  0.893316
4 -0.510168 -0.488324  0.511676

I would like to compute pairwise Euclidean distances between the columns using itertools.combinations and scipy.spatial.distance.euclidean, and store these values either by extending the df or as a new dataframe. For example, extending the df would resemble this (the x.xxxxxx entries are of course the values that need to be calculated):

df = 
          x         y         z        x-y        x-z        y-z
0  0.233070  0.230965  1.230965   x.xxxxxx   x.xxxxxx   x.xxxxxx
1 -1.956269 -0.926621  0.073379   x.xxxxxx   x.xxxxxx   x.xxxxxx
2 -0.015575 -0.015575  0.984425   x.xxxxxx   x.xxxxxx   x.xxxxxx
3 -0.106887 -0.106684  0.893316   x.xxxxxx   x.xxxxxx   x.xxxxxx
4 -0.510168 -0.488324  0.511676   x.xxxxxx   x.xxxxxx   x.xxxxxx

The actual dataset I'm working with is large, so I'd like to find an efficient, Pythonic way of dealing with this. I only need unique pairwise comparisons, so I'd like to avoid n-way comparisons (here, that would be x-y-z) as well as repetitions (e.g., y-x, z-x, z-y). I hope this is clear; thanks for any assistance.

lf208
  • So, that's just elementwise subtraction and not exactly distance finding, or is it? – Divakar Apr 12 '17 at 17:29
  • Ah, yes, the column headers are not clear, apologies. They should state something like the EuclidDist_xy, EuclidDist_xz, etc. But I've also realized that given this is a 1D space then the distance is simply the absolute difference (abs). – lf208 Apr 12 '17 at 18:09
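
Following the clarification in the comments (in a 1D space the distance is just the absolute difference), here is a minimal sketch of one possible approach, not a definitive answer. It assumes every column is numeric and that column names joined with "-" are acceptable headers; `pairs`, `dist` and `result` are names made up for the illustration. Note that itertools.combinations(df.columns, 2) yields each unordered pair exactly once, so neither the x-y-z triple nor reversed pairs such as y-x appear.

```python
import itertools as it

import numpy as np
import pandas as pd

# Rebuild a small frame like the one in the question (seeded so it is reproducible)
rng = np.random.default_rng(0)
x = rng.standard_normal(5)
df = pd.DataFrame({'x': x, 'y': np.sin(x), 'z': np.sin(x) + 1})

# r=2 restricts the result to unordered pairs: (x, y), (x, z), (y, z)
pairs = list(it.combinations(df.columns, 2))

# For scalars, the Euclidean distance reduces to the absolute difference,
# so a vectorized column subtraction replaces a row-by-row euclidean() call
dist = pd.DataFrame(
    {f'{a}-{b}': (df[a] - df[b]).abs() for a, b in pairs},
    index=df.index,
)

# Either keep `dist` as a separate dataframe or extend the original one
result = pd.concat([df, dist], axis=1)
print(result)
```

Building all the new columns in one dictionary and concatenating once should scale better on a large dataset than adding columns one at a time, though that is worth checking against the real data.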
