I have a DataFrame of (x, y) coordinates that I would like to transform into arrays so that I can perform pairwise distance calculations on them.

import pandas as pd

df = pd.DataFrame({'type': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c'],
                   'x': [1, 3, 5, 1, 3, 1, 3, 5],
                   'y': [2, 4, 6, 2, 4, 2, 4, 6]})

Desired output: a new DataFrame in which the coordinates are grouped by type and each group is aggregated into an array, so that I can apply a function to each array:

grp =       coordinates
    a    array([[1, 2],
               [3, 4],
               [5, 6]])

    b    array([[1, 2],
               [3, 4]])

    c    array([[1, 2],
               [3, 4],
               [5, 6]])

The distance calculation I wish to apply:

import scipy.spatial.distance
grp['distances'] = grp.apply(lambda x: scipy.spatial.distance.pdist(x['coordinates'], 'euclidean'), axis=1)

I can't seem to get the groupby function to do this. Any ideas?

chris_l

1 Answer

Create a new column that holds each row's [x, y] pair:

df['xy'] = df.apply(lambda x: [x['x'], x['y']], axis=1)

Group by type and aggregate the pairs into a list of lists:

gb = df.groupby('type')
df2 = gb.aggregate({'xy': lambda x: list(x)})

This produces:

df2  
    xy
type    
a   [[1, 2], [3, 4], [5, 6]]
b   [[1, 2], [3, 4]]
c   [[1, 2], [3, 4], [5, 6]]
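
Since the question asks for arrays rather than nested lists, here is a minimal sketch of one alternative that builds a NumPy array per group directly. It is not part of the approach above: the names `coords` and `distances` are illustrative, and it assumes a plain dict keyed by group label is an acceptable container instead of a DataFrame column.

```python
import numpy as np
import pandas as pd
from scipy.spatial import distance

df = pd.DataFrame({
    'type': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c'],
    'x': [1, 3, 5, 1, 3, 1, 3, 5],
    'y': [2, 4, 6, 2, 4, 2, 4, 6],
})

# One (n_points, 2) NumPy array per group, keyed by the group label.
# `coords` is a hypothetical name used only for this sketch.
coords = {name: group[['x', 'y']].to_numpy()
          for name, group in df.groupby('type')}

# Condensed pairwise Euclidean distances for each group.
distances = {name: distance.pdist(arr, 'euclidean')
             for name, arr in coords.items()}
```

Iterating over the groupby object avoids the intermediate `xy` column, at the cost of leaving the pandas index behind; whether that trade-off suits you depends on what you do with the distances afterwards.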

To apply your distance function to each group, use:

from scipy.spatial import distance
df2['distances'] = df2['xy'].apply(lambda x: distance.pdist(x, 'euclidean'))

df2

    xy                          distances
type        
a   [[1, 2], [3, 4], [5, 6]]    [2.82842712475, 5.65685424949, 2.82842712475]
b   [[1, 2], [3, 4]]            [2.82842712475]
c   [[1, 2], [3, 4], [5, 6]]    [2.82842712475, 5.65685424949, 2.82842712475]
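
The distances come back in `pdist`'s condensed form: for three points the order is (0, 1), (0, 2), (1, 2). If a full symmetric matrix is easier to work with, `scipy.spatial.distance.squareform` converts between the two. A small sketch, using group a's points as sample input:

```python
import numpy as np
from scipy.spatial import distance

# Points of group 'a' from the example data.
points = np.array([[1, 2], [3, 4], [5, 6]])

# Condensed form: one entry per unordered pair of points.
condensed = distance.pdist(points, 'euclidean')

# Full symmetric matrix with zeros on the diagonal.
square = distance.squareform(condensed)
```

Here `square[i, j]` holds the distance between point `i` and point `j`.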
joaquin