I have already read answers and blog posts about how to iterate over a pandas.DataFrame efficiently (https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6), but I still have one question left.
My DataFrame currently represents a GPS trajectory with the columns time, longitude and latitude. I now want to calculate a feature called distance-to-next-point. For that, I not only have to iterate over the rows and operate on each single row, but also have to access the subsequent row within the same iteration.
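    # Assumes df has the default integer index (0 .. len(df)-1), so df.loc[i + 1] is the next row.
    i = 0
    for index, row in df.iterrows():
        if i < len(df) - 1:
            distance = calculate_distance([row['latitude'], row['longitude']],
                                          [df.loc[i + 1, 'latitude'], df.loc[i + 1, 'longitude']])
            # Write into df itself; assigning to row only changes a copy.
            df.loc[index, 'distance'] = distance
        i += 1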
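One loop-free direction for the distance-to-next-point feature, as a minimal sketch: shift the coordinate columns up by one row with `shift(-1)` and compute all distances in a single array operation. The `haversine_m` function, the Earth radius constant and the sample data are assumptions made for illustration; `haversine_m` only stands in for `calculate_distance`, whose actual formula is not shown here.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumption of this sketch)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres; inputs are in degrees and may be arrays or Series."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

# Hypothetical sample trajectory with the same columns as in the question.
df = pd.DataFrame({
    "time": pd.to_datetime(["2020-01-01 00:00:00",
                            "2020-01-01 00:00:10",
                            "2020-01-01 00:00:20"]),
    "latitude": [0.0, 0.0, 0.0],
    "longitude": [0.0, 0.001, 0.003],
})

# shift(-1) aligns every row with the values of the row that follows it.
df["distance"] = haversine_m(
    df["latitude"], df["longitude"],
    df["latitude"].shift(-1), df["longitude"].shift(-1),
)
# The last row has no successor, so its distance is NaN.
```

The same pattern should work for any function that accepts whole columns instead of single values.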
Besides this problem, I run into the same issue when calculating speed, applying smoothing, or using other similar methods.
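Speed and smoothing seem to fit the same shifted-column idea. The following is only a sketch under assumptions: the `distance` column is taken to hold metres to the next point, the sample values are made up, and a centered rolling mean with a window of 3 is just one possible smoothing choice.

```python
import pandas as pd

# Hypothetical data: timestamps 10 s apart and precomputed distances in metres.
df = pd.DataFrame({
    "time": pd.to_datetime(["2020-01-01 00:00:00",
                            "2020-01-01 00:00:10",
                            "2020-01-01 00:00:20",
                            "2020-01-01 00:00:30"]),
    "distance": [100.0, 200.0, 50.0, None],
})

# Seconds elapsed until the next point (NaN for the last row).
dt_s = (df["time"].shift(-1) - df["time"]).dt.total_seconds()

# Speed in m/s over the segment that starts at each row.
df["speed"] = df["distance"] / dt_s

# Centered moving average; min_periods=1 keeps values at the edges.
df["speed_smooth"] = df["speed"].rolling(window=3, center=True, min_periods=1).mean()
```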
Another example: I want to search for data points with speed == 0 m/s and, starting from each of these points, collect all subsequent data points into an array until the speed reaches 10 m/s (to find segments of acceleration from 0 m/s to 10 m/s).
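For this kind of stateful search, one option is a single pass over the plain NumPy values rather than over `iterrows()`, which avoids building a Series object for every row. The sketch below rests on assumptions: the helper name `acceleration_segments` is hypothetical, a standstill is treated as speed <= 0 instead of an exact float comparison, and a segment is taken to start at the most recent standstill before 10 m/s is reached.

```python
import numpy as np
import pandas as pd

def acceleration_segments(speed, low=0.0, high=10.0):
    """Return (start, end) positional index pairs, both inclusive."""
    values = np.asarray(speed, dtype=float)
    segments = []
    start = None
    for pos, v in enumerate(values):
        if v <= low:
            # Standstill: (re)start a candidate segment here.
            start = pos
        elif v >= high and start is not None:
            # Threshold reached: close the segment and wait for the next standstill.
            segments.append((start, pos))
            start = None
    return segments

# Hypothetical speed column in m/s.
speed = pd.Series([3.0, 0.0, 2.0, 0.0, 4.0, 8.0, 10.0, 12.0, 0.0, 5.0, 11.0])
segments = acceleration_segments(speed)

# Slice the data points of each segment by position.
pieces = [speed.iloc[s:e + 1] for s, e in segments]
```

With a full DataFrame, `df.iloc[s:e + 1]` would give the rows of each segment in the same way.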
Do you have any suggestions on how to code things like this as efficiently as possible?