Python beginner here. I have a pandas DataFrame of X,Y points that looks similar to this:

XY TABLE

What I want to do is take row 1, find the distance between row 1 and row 2, and write the distance between those two X,Y locations to a new column called "dist". Then do the same for rows 2 and 3, and so on. My X,Y data is much larger than this, but this is the basis of my problem. The points make up a larger polyline, so the data eventually stops and the end point will have a distance of zero.

I'm aware I can use geopy, numpy, and pyproj, to name a few. I initially tried the haversine distance but had issues importing the Python module. I'm not sure how to approach this problem with those modules: do I need a search cursor that I apply to each row? So, if I have a polyline with nodes, I want to calculate the distances between each of those nodes. These are coordinates of real locations on Earth, so not a Cartesian coordinate system, if you will.

  • I misunderstood your question, that's why I deleted my answer. I will come up with a proper solution. – Dimitar Mar 23 '23 at 15:18
  • Could you specify whether you want Dx1 = ABS(X2-X1) and Dy1 = ABS(Y2-Y1), or the distance between the points P1(X1,Y1) and P2(X2,Y2)? – Roberto Mar 23 '23 at 15:19
  • @Roberto the geographic distance between the points P1(X1,Y1) and P2(X2,Y2). So, if I have a polyline with nodes, calculating the distances between each of those nodes. These are coordinates of real locations on Earth, so not a Cartesian coordinate system, if you will. – Connor Garrett Mar 23 '23 at 15:33

1 Answer

To calculate the distances between consecutive points you can use the approach below. For testing purposes I defined the corners of a rectangle.

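import numpy as np
import pandas as pd

# Corners of a unit square, returning to the start point
X = [0, 1, 1, 0, 0]
Y = [0, 0, 1, 1, 0]

df = pd.DataFrame({"X": X, "Y": Y})

# Coordinates of the previous row (NaN for the first row)
df["X_lag"] = df["X"].shift(1)
df["Y_lag"] = df["Y"].shift(1)

# Planar (Euclidean) distance from the previous point to the current one
distances = np.sqrt((df["X"] - df["X_lag"])**2 + (df["Y"] - df["Y_lag"])**2)
print(distances)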

This gives a pandas Series with the following values: [nan, 1.0, 1.0, 1.0, 1.0]

After storing the result with df["distance"] = distances, you can drop the lag columns with df.drop(["X_lag", "Y_lag"], axis=1, inplace=True) and you get:

X  Y    distance
0  0       NaN
1  0       1.0
1  1       1.0
0  1       1.0
0  0       1.0
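
The question asks for the distance from each row to the next one, in a column called "dist", with zero on the last point. A minimal sketch of that layout, assuming the same planar test rectangle and using shift(-1) instead of shift(1); np.hypot is swapped in here for the explicit square root:

```python
import numpy as np
import pandas as pd

# Same test rectangle as above.
df = pd.DataFrame({"X": [0, 1, 1, 0, 0], "Y": [0, 0, 1, 1, 0]})

# Difference to the NEXT row: shift(-1) pulls row i+1 up to row i.
dx = df["X"].shift(-1) - df["X"]
dy = df["Y"].shift(-1) - df["Y"]

# Planar (Euclidean) length of each segment. The last row has no
# successor, so its NaN is replaced by 0.
df["dist"] = np.hypot(dx, dy).fillna(0)
print(df)
```

No helper columns are created this way, so there is nothing to drop afterwards.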

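For a geographic distance you can import geopy.distance and apply the following code. It interprets the numbers above as degrees. Note that geopy reads each tuple as (latitude, longitude), so X is treated as latitude here; if your X is longitude, build the tuples as (Y, X) instead.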

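import geopy.distance

def calc_orthodromic(row):
    # The first row has no previous point, so its lag values are NaN
    if pd.isna(row["X_lag"]) or pd.isna(row["Y_lag"]):
        return np.nan
    # Geodesic distance in meters; tuples are read as (latitude, longitude)
    return geopy.distance.geodesic(row["XY"], row["XY_lag"]).m

# Recreate the lag columns in case they were dropped above
df["X_lag"] = df["X"].shift(1)
df["Y_lag"] = df["Y"].shift(1)

df["XY"] = list(zip(df["X"], df["Y"]))
df["XY_lag"] = list(zip(df["X_lag"], df["Y_lag"]))

df["distance"] = df.apply(calc_orthodromic, axis=1)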

This gives the distance in meters: [nan, 110574.3885578, 111302.64933943, 110574.3885578, 111319.49079327, 156899.56829134]
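
Since the question mentions trouble importing a haversine module, here is a rough sketch of the haversine formula written directly with numpy, so no extra package is needed. It is not the geopy method above: it assumes a spherical Earth with a mean radius of 6,371,000 m, and it assumes X holds longitude and Y holds latitude, both in degrees. The name haversine_m is made up for this example.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius (spherical model)

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between points given in degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

# Assumption: X holds longitude and Y holds latitude, both in degrees.
df = pd.DataFrame({"X": [0, 1, 1, 0, 0], "Y": [0, 0, 1, 1, 0]})

# Distance from each point to the next one; 0 for the final point.
df["dist"] = haversine_m(
    df["X"], df["Y"], df["X"].shift(-1), df["Y"].shift(-1)
).fillna(0)
print(df)
```

Because the whole column is computed at once, this should scale better to a large polyline than a row-wise apply. The spherical result can differ from geopy's ellipsoidal geodesic distance by a fraction of a percent.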

Roberto