I am trying to apply the following function to a Pandas dataframe:
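import numpy as np
from geopy import distance

def eukarney(lat1, lon1, alt1, lat2, lon2, alt2):
    p1 = (lat1, lon1)
    p2 = (lat2, lon2)
    # geodesic (Karney) surface distance in metres
    karney = distance.distance(p1, p2).m
    # combine the surface distance with the altitude difference (Euclidean)
    return np.sqrt(karney**2 + (alt2 - alt1)**2)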
This works when I pass scalar values, for instance:
dist = eukarney(49.907611, 5.890404, 339.15734, 49.907683, 5.890373, 339.18224)
However, if I try to apply the function to a Pandas dataframe:
df['distances'] = eukarney(df['latitude'], df['longitude'], df['altitude'], df['latitude'].shift(), df['longitude'].shift(), df['altitude'].shift())
That is, each row's values are paired with those of the previous row.
I receive the following error message:
Traceback (most recent call last):
  File "/home/mirix/Desktop/plage/GPX_invert_sense_change_starting_point_va.py", line 78, in <module>
    df['distances'] = eukarney(df.loc[:,'latitude':], df.loc[:,'longitude':], df.loc[:,'altitude':], df.loc[:,'latitude':].shift(), df.loc[:,'longitude':].shift(), df.loc[:,'altitude':].shift())
  File "/home/mirix/Desktop/plage/GPX_invert_sense_change_starting_point_va.py", line 75, in eukarney
    karney = distance.distance(p1, p2).m
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/distance.py", line 522, in __init__
    super().__init__(*args, **kwargs)
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/distance.py", line 276, in __init__
    kilometers += self.measure(a, b)
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/distance.py", line 538, in measure
    a, b = Point(a), Point(b)
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/point.py", line 175, in __new__
    return cls.from_sequence(seq)
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/point.py", line 472, in from_sequence
    return cls(*args)
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/point.py", line 188, in __new__
    _normalize_coordinates(latitude, longitude, altitude)
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/point.py", line 57, in _normalize_coordinates
    latitude = float(latitude or 0.0)
  File "/home/mirix/.local/lib/python3.9/site-packages/pandas/core/generic.py", line 1534, in __nonzero__
    raise ValueError(
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Intriguingly, the same syntax works for other functions that do not use the geopy library.
Any ideas?
SOLUTION
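GeoPy's distance function is not vectorized: it only accepts scalar coordinates. As the traceback shows, geopy builds a Point from each coordinate pair and normalizes it with float(latitude or 0.0). When latitude is a Series or DataFrame, the "or" asks pandas to reduce it to a single boolean, which raises the "truth value ... is ambiguous" error. Functions built only from arithmetic and NumPy operations work because those are applied element-wise.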
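A minimal sketch of that difference, using only pandas and NumPy (no geopy): the altitude part of eukarney works on whole columns, while the float(x or 0.0) pattern taken from the traceback fails as soon as x is a Series. Here the message mentions a Series; the traceback mentions a DataFrame because df.loc[:, 'latitude':] returns one. The sample values are taken from the scalar example above.

```python
import numpy as np
import pandas as pd

alt = pd.Series([339.15734, 339.18224])

# Arithmetic and NumPy ufuncs work element-wise: Series in, Series out.
# The first value is NaN because shift() has no previous row to supply.
climb = np.sqrt((alt - alt.shift()) ** 2)
print(climb)

# geopy normalizes coordinates with `float(latitude or 0.0)`. The `or` calls
# bool() on its left operand, and pandas refuses to turn a multi-element
# Series into a single True/False.
lat = pd.Series([49.907611, 49.907683])
try:
    float(lat or 0.0)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```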
The following workaround is based on @SeaBen's answer below:
# previous row's values; the first row has no predecessor, so it is paired with itself (distance 0)
df['lat_shift'] = df['latitude'].shift().fillna(df['latitude'])
df['lon_shift'] = df['longitude'].shift().fillna(df['longitude'])
df['alt_shift'] = df['altitude'].shift().fillna(df['altitude'])
# call eukarney once per row, so geopy only ever receives scalars
df['distances'] = df.apply(lambda x: eukarney(x['latitude'], x['longitude'], x['altitude'], x['lat_shift'], x['lon_shift'], x['alt_shift']), axis=1).fillna(0)
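If apply(axis=1) turns out to be slow on long GPX tracks, a plain loop over NumPy arrays is a common alternative that still feeds geopy one pair of scalar points at a time. This is a sketch under assumptions, not part of @SeaBen's answer: eukarney_consecutive and its geodesic_m parameter are hypothetical names. By default it calls geopy's distance.distance just as eukarney does; geodesic_m only exists so another point-to-point distance function (or a test stub) can be passed in. It should give the same values as the apply version, up to floating-point rounding.

```python
import numpy as np
import pandas as pd


def eukarney_consecutive(df, geodesic_m=None):
    """3-D distance in metres between each row of df and the previous row.

    Hypothetical helper. Expects 'latitude', 'longitude' and 'altitude' columns.
    geodesic_m(p1, p2) must return the surface distance in metres between two
    (lat, lon) tuples; by default geopy's distance.distance is used.
    """
    if geodesic_m is None:
        from geopy import distance  # imported lazily so a stub can be passed instead

        def geodesic_m(p1, p2):
            return distance.distance(p1, p2).m

    if len(df) == 0:
        return pd.Series(dtype=float, index=df.index)

    lat = df['latitude'].to_numpy()
    lon = df['longitude'].to_numpy()
    alt = df['altitude'].to_numpy()

    # Pair row i-1 with row i; every call to geodesic_m receives plain scalars.
    steps = [
        np.hypot(geodesic_m((a1, o1), (a2, o2)), h2 - h1)
        for a1, o1, h1, a2, o2, h2 in zip(lat[:-1], lon[:-1], alt[:-1],
                                          lat[1:], lon[1:], alt[1:])
    ]
    # The first row has no predecessor, so its distance is 0.
    return pd.Series([0.0] + steps, index=df.index)
```

Usage: df['distances'] = eukarney_consecutive(df). np.hypot(a, b) computes sqrt(a**2 + b**2), the same combination eukarney uses.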