Calculating Geographic distances between each row of X,Y points in a python dataframe

Question

Python Beginner here. I have a python dataframe consisting of X,Y points that looks similar to this:

I want to take row 1, find the distance between row 1 and row 2, and write the distance between those two X,Y locations to a new column called "dist". Then do the same for rows 2 and 3, and so on. My X,Y data is much larger than this, but this is the basis of my problem. The points together make up a larger polyline, so where the data ends, the end point will have a distance of zero.

I'm aware I can use geopy, numpy, and pyproj, to name a few. I initially tried the haversine distance but had trouble importing the Python module. I'm not sure how to approach this problem with those modules: do I need a search cursor applied to each row? So, if I have a polyline with nodes, I want to calculate the distances between each of those nodes. These are coordinates of real locations on Earth, so not a Cartesian coordinate system, if you will.

I misunderstood your question, that's why I deleted my answer. I will come up with a proper solution. — Dimitar, Mar 23 '23 at 15:18
Could you specify whether you want Dx1 = ABS(X2-X1) and Dy1 = ABS(Y2-Y1), or the distance between the points P1(X1,Y1) and P2(X2,Y2)? — Roberto, Mar 23 '23 at 15:19
@Roberto the geographic distance between the points P1(X1,Y1) and P2(X2,Y2). So, if I have a polyline with nodes, I want to calculate the distances between each of those nodes. These are coordinates of real locations on Earth, so not a Cartesian coordinate system, if you will. — Connor Garrett, Mar 23 '23 at 15:33

Roberto · Accepted Answer · 2023-03-23T16:48:11.437

To calculate the distance between consecutive points, you can use the approach below. For testing purposes I defined the corners of a rectangle.

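import numpy as np
import pandas as pd

# Corners of the test rectangle, closed back to the starting point
X = [0, 1, 1, 0, 0]
Y = [0, 0, 1, 1, 0]

df = pd.DataFrame({"X": X, "Y": Y})

# Previous row's coordinates; the first row has no predecessor, so it gets NaN
df["X_lag"] = df["X"].shift(1)
df["Y_lag"] = df["Y"].shift(1)

# Euclidean (planar) distance from the previous row to the current row
df["distance"] = np.sqrt((df["X"] - df["X_lag"])**2 + (df["Y"] - df["Y_lag"])**2)
print(df["distance"])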

This gives a pandas Series with the following values: [nan, 1.0, 1.0, 1.0, 1.0]

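With the result stored in a distance column, you can drop the lag columns with df.drop(["X_lag", "Y_lag"], axis=1, inplace=True) (keep them if you also want the geographic distance further down, which reuses them) and you get: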

X  Y    distance
0  0       NaN
1  0       1.0
1  1       1.0
0  1       1.0
0  0       1.0

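For a geographic distance, import geopy.distance and apply the following code. It interprets the numbers above as degrees. Note that geopy reads each pair as (latitude, longitude), so X is treated as latitude here; swap the order in zip if your X column holds longitude.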

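import geopy.distance

def calc_orthodromic(row):
    # geodesic() expects (latitude, longitude) pairs in degrees
    try:
        return geopy.distance.geodesic(row["XY"], row["XY_lag"]).m
    except ValueError:
        # geopy 2.x rejects non-finite coordinates (the first row's lag is NaN)
        return np.nan

# Pair up the coordinates of the current row and of the previous row
df['XY'] = list(zip(df["X"], df["Y"]))
df['XY_lag'] = list(zip(df["X_lag"], df["Y_lag"]))

df['distance'] = df.apply(calc_orthodromic, axis=1)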

This gives the distances in meters: [nan, 110574.3885578, 111302.64933943, 110574.3885578, 111319.49079327, 156899.56829134]
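If importing a haversine module is the sticking point, as the question mentions, the great-circle formula can also be written directly in NumPy. The following is only a sketch under stated assumptions: X holds longitude and Y holds latitude in decimal degrees, the Earth is treated as a sphere with a mean radius of 6,371,008.8 m, and the names haversine_m and dist_haversine are made up for this illustration. Because it assumes a sphere, its results will differ slightly from geopy's ellipsoidal geodesic values.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_008.8  # assumed mean Earth radius (spherical model)

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters; inputs are decimal degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

# Assumption: X is longitude, Y is latitude
df = pd.DataFrame({"X": [0, 1, 1, 0, 0], "Y": [0, 0, 1, 1, 0]})

# Distance from the previous row to the current row; the first row stays NaN
df["dist_haversine"] = haversine_m(
    df["X"].shift(1), df["Y"].shift(1), df["X"], df["Y"]
)
print(df)
```

This works on whole columns at once, so no row-by-row cursor or apply call is needed.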

edited Mar 23 '23 at 16:48

answered Mar 23 '23 at 15:29


Is this following a Cartesian coordinate system or geographic distances? I'm looking for distances in meters. – Connor Garrett Mar 23 '23 at 16:07
@ConnorGarrett this one uses a Cartesian CS. So, do you want to calculate the orthodromic distance (on a sphere)? – Roberto Mar 23 '23 at 16:13
Yup, I'm looking for the orthodromic distance. – Connor Garrett Mar 23 '23 at 16:15
Thank you, where are you defining "XY" and "XY_lag" as your rows? – Connor Garrett Mar 23 '23 at 16:36
@ConnorGarrett I forgot to paste these lines. See now. – Roberto Mar 23 '23 at 16:48
This is actually a perfect solution; however, the row distances are in the reverse order from what I was hoping for. The first row should have a distance value (from the 1st XY to the XY in the 2nd row). The last row should be the one without a distance value. – Connor Garrett Mar 23 '23 at 17:49
Never mind! I changed the shift to -1 and this fixed my problem! – Connor Garrett Mar 23 '23 at 18:52
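
For later readers, the shift(-1) change described in the last comment can be sketched roughly as follows. This is an illustration only: it reuses the rectangle test data from the answer, uses the planar distance to stay dependency-free, and fills the final row with 0 because the question says the end point should have a zero distance. The same shift(-1) would apply to the lag columns in the geopy version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"X": [0, 1, 1, 0, 0], "Y": [0, 0, 1, 1, 0]})

# shift(-1) moves the NEXT row's coordinates up onto the current row
dx = df["X"].shift(-1) - df["X"]
dy = df["Y"].shift(-1) - df["Y"]

# The last row has no successor, so its distance is NaN; fill it with 0
df["dist"] = np.hypot(dx, dy).fillna(0.0)
print(df)
```

Each row now holds the distance to the row below it, and the last row of the polyline ends with 0.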
