How to incorporate elevation into euclidean distance matrix in pandas?

Question

I have the following dataframe in pandas:

import pandas as pd

df = pd.DataFrame({
    "CityId": {
        "0": 0, 
        "1": 1, 
        "2": 2, 
        "3": 3, 
        "4": 4
    }, 
    "X": {
        "0": 316.83673906150904, 
        "1": 4377.40597216624, 
        "2": 3454.15819771172, 
        "3": 4688.099297634771, 
        "4": 1010.6969517482901
    }, 
    "elevation_meters": {
        "0": 1, 
        "1": 2, 
        "2": 3, 
        "3": 4, 
        "4": 5
    }, 
    "Y": {
        "0": 2202.34070733524, 
        "1": 336.602082171235, 
        "2": 2820.0530112481106, 
        "3": 2935.89805580997, 
        "4": 3236.75098902635
    }
})

I am trying to create a distance matrix that represents the cost of moving between each of these CityIds. Using pdist and squareform from scipy.spatial.distance I can do the following:

from scipy.spatial.distance import pdist, squareform

df_m = pd.DataFrame(
    squareform(
        pdist(
            df[['X', 'Y']],
            metric='euclidean')
    ),
    index=df.CityId.unique(),
    columns=df.CityId.unique()
)

This gives me a distance matrix between all the CityIds using pairwise distances calculated from pdist.

I would like to incorporate elevation_meters into this distance matrix. What is an efficient way to do so?

What is the formula for the distance that involves `elevation_meters`? Is it just a `z` coordinate? – Quang Hoang, May 14 '19 at 12:18
It's just a value in meters, which can be added to the pairwise distance. If it helps, it can be made into a `z` coordinate. – ZeroStack, May 14 '19 at 12:34
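
The comment above allows two readings of "incorporate elevation". As a minimal sketch of the literal one (elevation difference added on top of the planar distance), assuming the cost is the 2D Euclidean distance plus the absolute elevation difference: the names `planar`, `climb`, and `cost` are illustrative, and the frame is rebuilt with a default integer index for brevity. This is not the approach the accepted answer takes, which treats elevation as a third coordinate instead.

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Same values as the question's frame, rebuilt with a default index.
df = pd.DataFrame({
    "CityId": [0, 1, 2, 3, 4],
    "X": [316.83673906150904, 4377.40597216624, 3454.15819771172,
          4688.099297634771, 1010.6969517482901],
    "Y": [2202.34070733524, 336.602082171235, 2820.0530112481106,
          2935.89805580997, 3236.75098902635],
    "elevation_meters": [1, 2, 3, 4, 5],
})

# Condensed pairwise distances in the X/Y plane.
planar = pdist(df[["X", "Y"]].to_numpy(dtype=float), metric="euclidean")

# Condensed pairwise |z_i - z_j|; cityblock on one column is the absolute difference.
climb = pdist(df[["elevation_meters"]].to_numpy(dtype=float), metric="cityblock")

# Assumed cost model: planar distance plus elevation difference.
cost = pd.DataFrame(
    squareform(planar + climb),
    index=df["CityId"],
    columns=df["CityId"],
)
print(cost.round(3))
```

Both condensed vectors come from `pdist` on the same rows, so they share the same pair ordering and can be added element-wise before `squareform` expands them.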

score 2 · Accepted Answer · answered May 14 '19 at 12:45

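You can try scipy.spatial.distance_matrix:

from scipy.spatial import distance_matrix

# Treat elevation as a third coordinate alongside X and Y
xx = df[['X', 'elevation_meters', 'Y']]
pd.DataFrame(distance_matrix(xx, xx), columns=df['CityId'],
             index=df['CityId'])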

Output:

CityId            0            1            2            3            4
CityId
0          0.000000  4468.691544  3197.555070  4432.386687  1245.577226
1       4468.691544     0.000000  2649.512402  2617.799439  4443.602402
2       3197.555070  2649.512402     0.000000  1239.367465  2478.738402
3       4432.386687  2617.799439  1239.367465     0.000000  3689.688537
4       1245.577226  4443.602402  2478.738402  3689.688537     0.000000

Quang Hoang

Thanks, this seems to work. I'm still trying to understand `scipy.spatial.distance_matrix`, how does it differentiate between latitude/longitude and elevation? Generally, aren't `z` coordinates just a height in meters/kilometers? Why is `elevation_meters` positioned in the middle? – ZeroStack May 15 '19 at 00:59
In a nutshell, it just looks at every pair of rows and computes the distance `sqrt((x1-x2)**2 + (z1-z2)**2 + (y1-y2)**2)`. As for why `elevation_meters` comes in the middle, I have no idea. Maybe you should ask the creator of your data. – Quang Hoang May 15 '19 at 01:04
In terms of the `elevation_meters`, I was referring to the positional placement in `scipy.spatial.distance_matrix` function, and whether it matters, especially after considering that latitude and longitude are represented as geographic coordinates and elevation_meters is in meters. – ZeroStack May 15 '19 at 01:53
The order doesn't matter as you can see in the formula. You can pass either `[X, elev, Y]` or `[X,Y,elev]` and still get the same answer. – Quang Hoang May 15 '19 at 01:56
In that case it seems that `squareform(pdist(df[['X','elevation_meters', 'Y']])) == distance_matrix(xx,xx)` – ZeroStack May 15 '19 at 02:01
That's what I was trying to say in my comment below your question. – Quang Hoang May 15 '19 at 02:02
Thanks Quang, my ignorance, I did not realise it accepts an n-dimensional space. – ZeroStack May 15 '19 at 02:04
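
The points made in this comment thread can be checked directly. The sketch below, using the question's values with a default integer index and illustrative variable names, confirms that column order does not change the result, that `squareform(pdist(...))` agrees with `distance_matrix`, and that one entry matches the formula computed by hand.

```python
import numpy as np
import pandas as pd
from scipy.spatial import distance_matrix
from scipy.spatial.distance import pdist, squareform

# Same values as the question's frame, rebuilt with a default index.
df = pd.DataFrame({
    "CityId": [0, 1, 2, 3, 4],
    "X": [316.83673906150904, 4377.40597216624, 3454.15819771172,
          4688.099297634771, 1010.6969517482901],
    "Y": [2202.34070733524, 336.602082171235, 2820.0530112481106,
          2935.89805580997, 3236.75098902635],
    "elevation_meters": [1, 2, 3, 4, 5],
})

# The same three coordinates in two different column orders.
xyz = df[["X", "Y", "elevation_meters"]].to_numpy(dtype=float)
xzy = df[["X", "elevation_meters", "Y"]].to_numpy(dtype=float)

d_xyz = distance_matrix(xyz, xyz)
d_xzy = distance_matrix(xzy, xzy)
d_pdist = squareform(pdist(xyz, metric="euclidean"))

# Hand-computed distance between city 0 and city 1.
dx, dy, dz = xyz[0] - xyz[1]
d01 = np.sqrt(dx**2 + dy**2 + dz**2)

print(np.allclose(d_xyz, d_xzy))    # column order is irrelevant
print(np.allclose(d_xyz, d_pdist))  # pdist + squareform gives the same matrix
print(np.isclose(d01, d_xyz[0, 1])) # matches the formula for one pair
```

Because each squared difference is summed with equal weight, the result only makes sense as a physical distance if X, Y, and elevation are expressed in the same units.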