Distance between coordinates in dataframe sequentially?

Question

I have a dataframe of lat/lon coordinates, which are essentially GPS signals. I need to calculate the distance between sequential rows so that I can check it doesn't exceed a specific threshold.

Here is an example dataset:

library(geosphere)
library(tidyverse)

Seqlat <- seq(from = -90, to = 90, by = .01)
Seqlong <- seq(from = -180, to = 180, by = .01)
Latitude <- sample(Seqlat, size = 100, replace = TRUE)
Longitude <- sample(Seqlong, size = 100, replace = TRUE)

df <- data.frame(Latitude, Longitude)

I know I can use the geosphere::distm() function to calculate the distance between the set of coordinates. This works if I extract them individually from the dataframe:


distm(c(df$Longitude[1], df$Latitude[1]),
  c(df$Longitude[2], df$Latitude[2]),
  fun = distHaversine)

However, when I try to do this in the dataframe it doesn't work. I tried to exclude the last row from the calculation hoping that I would get a difference for all the other rows but this didn't work...

df %>% mutate(distance = ifelse(row_number() == n(), distm(
  c(Longitude, Latitude),
  c(lead(Longitude), lead(Latitude)),fun = distHaversine
), NA))

Ideally, what I would like is a distance between each consecutive pair of coordinates in a new column. The last row would not have a distance as there isn't a subsequent row from which to calculate it.

score 2 · Accepted Answer · answered Jun 16 '21 at 10:50

df["distance"] <- c(NA,
                    sapply(seq.int(2,nrow(df)), function(i){
                      distm(c(df$Longitude[i-1],df$Latitude[i-1]),
                            c(df$Longitude[i], df$Latitude[i]),
                            fun = distHaversine)
                    })
)

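This builds a vector that starts with NA for the first row, then iterates from the second row to the last, computing the distance from the previous row and appending it. Note that each distance is stored on the second row of its pair, so the NA lands on the first row rather than the last.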
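As a possible alternative to the sapply loop, distHaversine() accepts two-column matrices (longitude first) and computes the distance row by row, so the consecutive pairs can be handled in one call. The sketch below uses a small made-up dataframe rather than the random data from the question, and it pads with NA at the end so the last row has no distance, as the question asked.

```r
library(geosphere)

# Made-up track for illustration: three points on the equator, then one to the north
df <- data.frame(
  Latitude  = c(0, 0, 0, 1),
  Longitude = c(0, 1, 2, 2)
)

# distHaversine() expects longitude first, then latitude
pts <- cbind(df$Longitude, df$Latitude)
n <- nrow(pts)

# Pair rows 1..(n-1) with rows 2..n in a single vectorised call,
# then append NA because the last row has no following point
df$distance <- c(
  distHaversine(pts[-n, , drop = FALSE], pts[-1, , drop = FALSE]),
  NA
)

df
```

Each one-degree step here comes out at roughly 111 km, in metres. The drop = FALSE keeps the slices as matrices even when the dataframe has only two rows.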

Sandwichnick

Cool, thanks! I eventually worked out that it works in dplyr if I cbind the coordinates beforehand. I think it's related to how distm is coded. – Dasr Jun 16 '21 at 11:04

score 0 · Answer 2 · answered Jun 16 '21 at 12:57

If you restructure your dataframe a bit, this is easy to do in a dplyr pipeline.

library(dplyr)
library(geosphere)

df %>%
  mutate(across(everything(), .fns = lead, .names = '{col}_next')) %>%
  rowwise() %>%
  mutate(dist = distm(c(Longitude, Latitude),c(Longitude_next, Latitude_next),
                 fun = distHaversine)[1]) %>%
  ungroup()  %>%
  select(-ends_with('next'))

#   Latitude Longitude      dist
#      <dbl>     <dbl>     <dbl>
# 1    87.2      -24.6 11575192.
# 2   -14.7     -100.  15515546.
# 3    -9.31     113.  17566695.
# 4     3.44     -88.7  8298367.
# 5    77.4     -106.  12966075.
# 6   -32.2     -172.  10435334.
# 7   -29.4      -55.7  8368057.
# 8    36.4      -94.6 15108192.
# 9    -3.76     118.  11331809.
#10   -27.6     -137.  14668975.
# … with 90 more rows

We create two additional columns, Longitude_next and Latitude_next, which hold the next row's values, and then apply distm to each row.
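For the threshold check mentioned in the question, a rough sketch without rowwise() might look like the following. The helper dist_to_next(), the threshold max_step_m, and the sample track are all made up for illustration; only distHaversine() and mutate() come from the packages used above.

```r
library(dplyr)
library(geosphere)

# Hypothetical helper: distance in metres from each point to the next one,
# with NA for the last point because nothing follows it
dist_to_next <- function(lon, lat) {
  pts <- cbind(lon, lat)
  n <- nrow(pts)
  if (n < 2) return(rep(NA_real_, n))
  c(distHaversine(pts[-n, , drop = FALSE], pts[-1, , drop = FALSE]), NA_real_)
}

# Assumed threshold in metres; replace with whatever limit applies to your data
max_step_m <- 150000

# Made-up track: two one-degree steps, then a five-degree jump
track <- data.frame(
  Latitude  = c(0, 0, 0, 5),
  Longitude = c(0, 1, 2, 2)
)

result <- track %>%
  mutate(
    dist    = dist_to_next(Longitude, Latitude),
    too_far = dist > max_step_m   # stays NA on the last row
  )

result
```

With this sample data only the third row is flagged, since its step to the fourth point spans five degrees of latitude.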

Cool! Thanks Ronak for the alternative. – Dasr Jun 23 '21 at 10:49
