
I have a dataframe with lat/lon coordinates, which are basically GPS signals. I need to calculate the distance between sequential rows so that I can check that it doesn't exceed a specific threshold I'm interested in.

Here is an example dataset:

library(geosphere)
library(tidyverse)

Seqlat <- seq(from = -90, to = 90, by = .01)
Seqlong <- seq(from = -180, to = 180, by = .01)
Latitude <- sample(Seqlat, size = 100, replace = TRUE)
Longitude <- sample(Seqlong, size = 100, replace = TRUE)

df <- data.frame(Latitude, Longitude)

I know I can use the geosphere::distm() function to calculate the distance between the set of coordinates. This works if I extract them individually from the dataframe:


distm(c(df$Longitude[1], df$Latitude[1]),
  c(df$Longitude[2], df$Latitude[2]),
  fun = distHaversine)

However, when I try to do this inside the dataframe it doesn't work. I tried to exclude the last row from the calculation, hoping that I would get a distance for all the other rows, but this didn't work:

df %>% mutate(distance = ifelse(row_number() == n(), distm(
  c(Longitude, Latitude),
  c(lead(Longitude), lead(Latitude)),fun = distHaversine
), NA))

Ideally, what I would like is a distance between each consecutive pair of coordinates in a new column. The last row would not have a distance as there isn't a subsequent row from which to calculate it.

Dasr

2 Answers

df["distance"] <- c(NA,
                    sapply(seq.int(2,nrow(df)), function(i){
                      distm(c(df$Longitude[i-1],df$Latitude[i-1]),
                            c(df$Longitude[i], df$Latitude[i]),
                            fun = distHaversine)
                    })
)

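This builds a vector that starts with NA for the first row, then iterates from the second row to the last, computing the distance from the previous row to the current one and adding each result to the vector. Note that this places the NA in the first row (no previous point) rather than in the last row.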

Sandwichnick
  • Cool, thanks! I eventually worked out that it works in dplyr if I cbind things beforehand. I think it's related to how distm is coded. – Dasr Jun 16 '21 at 11:04
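
The comment above most likely comes down to this: inside mutate(), c(Longitude, Latitude) glues the two columns into one long vector of length 2 * n(), whereas geosphere expects either a single length-2 vector or a two-column matrix of (lon, lat) points. Below is a minimal sketch of the cbind() approach, assuming the column names from the question. The three-row `pts` data frame is made up for illustration, and the sketch swaps distm() for distHaversine(), which works pairwise on the rows of two matrices.

```r
library(dplyr)
library(geosphere)

# Hypothetical three-point track, for illustration only
pts <- data.frame(Latitude  = c(0, 0, 10),
                  Longitude = c(0, 1, 1))

res <- pts %>%
  mutate(distance = distHaversine(
    cbind(Longitude, Latitude),              # current point as (lon, lat)
    cbind(lead(Longitude), lead(Latitude))   # next point; NA in the last row
  ))

res
```

The last row ends up with NA in `distance` because lead() has no following row to return, which matches what the question asked for.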

If you restructure your dataframe a bit, this is easy to do in a dplyr pipeline.

library(dplyr)
library(geosphere)

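df %>%
  # add Latitude_next / Longitude_next holding the following row's values
  mutate(across(everything(), lead, .names = '{col}_next')) %>%
  # distm() takes one (lon, lat) pair per call here, so work row by row
  rowwise() %>%
  mutate(dist = distm(c(Longitude, Latitude), c(Longitude_next, Latitude_next),
                 fun = distHaversine)[1]) %>%
  ungroup() %>%
  # drop the helper columns
  select(-ends_with('next'))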

#   Latitude Longitude      dist
#      <dbl>     <dbl>     <dbl>
# 1    87.2      -24.6 11575192.
# 2   -14.7     -100.  15515546.
# 3    -9.31     113.  17566695.
# 4     3.44     -88.7  8298367.
# 5    77.4     -106.  12966075.
# 6   -32.2     -172.  10435334.
# 7   -29.4      -55.7  8368057.
# 8    36.4      -94.6 15108192.
# 9    -3.76     118.  11331809.
#10   -27.6     -137.  14668975.
# … with 90 more rows

We create two additional columns, Latitude_next and Longitude_next, which hold the values from the following row, and then apply the distm function to each row.
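
Since the original goal was to check each step against a threshold, one possible follow-up is sketched here. The 500 km cutoff (`max_step_m`), the `exceeds` column name, and the small `track` data frame are assumptions for illustration, not values from the question. The distances are in metres, given the default Earth radius used by distHaversine().

```r
library(geosphere)

# Hypothetical track and cutoff, for illustration only
track <- data.frame(Latitude  = c(0, 0, 10, 10.001),
                    Longitude = c(0, 1, 1, 1))
max_step_m <- 500000  # assumed threshold: 500 km

n   <- nrow(track)
lon <- track$Longitude
lat <- track$Latitude

# Distance from each row to the next; the last row has no successor, so NA
track$dist <- c(
  distHaversine(cbind(lon[-n], lat[-n]), cbind(lon[-1], lat[-1])),
  NA
)

# TRUE where the step to the next point is longer than the cutoff
track$exceeds <- track$dist > max_step_m

track
```

Rows where `exceeds` is TRUE can then be filtered or inspected; the NA in the last row carries through, so it is neither flagged nor cleared.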

Ronak Shah