Calculate distance between consecutive rows, by group

Question

Morning, afternoon, evening

I have the following boat data:

set.seed(123)

df <- data.frame(
  fac = as.factor(c("A", "A", "A", "A",
                    "B", "B", "B",
                    "C", "C", "C", "C", "C")),
  lat = runif(12, min = 45, max = 47),
  lon = runif(12, min = -6, max = -5 ))

I group the data by the factor variable fac.

library(dplyr)

df_grouped <- df %>% 
  group_by(fac) %>% 
  summarise(first_lon = first(lon),
            last_lon  = last(lon),
            first_lat = first(lat),
            last_lat  = last(lat))

I use the first and last latitudes (lat) and longitudes (lon) to create polygons.

I also use the first and last latitudes (lat) and longitudes (lon) to estimate distance across the polygon.

library(geosphere)

df_grouped %>% 
  mutate(distance_m = distHaversine(matrix(c(first_lon, first_lat), ncol = 2),
                                    matrix(c(last_lon, last_lat),   ncol = 2)))

However, this assumes the boat travels in a straight line across the longest possible distance within the polygon.

This is not always true; sometimes it wiggles about a bit.

What I would like to do is calculate the actual distance the boat has traveled by working out the distance between consecutive rows within each group.

In other words, for fac == "C" the boat will have traveled x meters, where x is the sum of the distances between consecutive data points within that group.

Waldi · Accepted Answer · 2020-08-17T17:03:37.700


Try:

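df %>%
  group_by(fac) %>%
  # previous position within each group (NA for the first row of a group)
  mutate(lat_prev = lag(lat, 1), lon_prev = lag(lon, 1)) %>%
  # distance in meters from the previous position to the current one
  mutate(dist = distHaversine(matrix(c(lon_prev, lat_prev), ncol = 2),
                              matrix(c(lon, lat),           ncol = 2))) %>%
  # total distance per group; na.rm drops the NA of each group's first row
  summarize(dist = sum(dist, na.rm = TRUE))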

# A tibble: 3 x 2
  fac      dist
  <fct>   <dbl>
1 A      93708.
2 B     219742.
3 C     347578.

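Much better, as suggested by Henrik: leave out the second argument, so that distHaversine computes the distance between consecutive points itself: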

df %>%
  group_by(fac) %>%
  summarize(dist = distHaversine(cbind(lon, lat))) %>%
  summarize(dist = sum(dist, na.rm = TRUE))
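
If the two-step version fails on your setup, one likely reason is the dplyr version: a summarize call that returns several rows per group requires dplyr 1.0.0 or later, and newer releases deprecate that behaviour. A minimal sketch that avoids the issue, assuming the same df as in the question and rows already in travel order, sums the sequential distances inside a single summarize call (the name track_length is only illustrative):

```r
library(dplyr)
library(geosphere)

# Same example data as in the question
set.seed(123)
df <- data.frame(
  fac = as.factor(c("A", "A", "A", "A",
                    "B", "B", "B",
                    "C", "C", "C", "C", "C")),
  lat = runif(12, min = 45, max = 47),
  lon = runif(12, min = -6, max = -5))

# With p2 omitted, distHaversine returns the distances between
# consecutive rows of the lon/lat matrix; sum() collapses them
# to a single value, so summarize returns one row per group.
track_length <- df %>%
  group_by(fac) %>%
  summarize(dist = sum(distHaversine(cbind(lon, lat))))

track_length
```

This should give the same per-group totals as the lag-based version, without the intermediate lat_prev and lon_prev columns.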

edited Aug 17 '20 at 17:03

answered Aug 17 '20 at 16:13


Do you need to lag? On the `p2` argument: "[If] missing [...] sequential distance between the points in `p1` is computed". – Henrik Aug 17 '20 at 16:54
Thanks for the answer. It seems to work for me. Not sure if I understand @Henrik's comment – Jim Aug 17 '20 at 17:01
I was referring to the help text of `distHaversine`. The function takes arguments `p1` and `p2`. If you don't provide `p2`, it will calculate the distance between consecutive points in `p1`. – Henrik Aug 17 '20 at 17:04
@Henrik, thanks for the hint, see my edit – Waldi Aug 17 '20 at 17:04
Great, thank you. Would you mind explaining the lag function? – Jim Aug 17 '20 at 17:04
The lag function takes the values of the row before: try `df %>% mutate(lat_prev = lag(lat,1), lon_prev = lag(lon,1) )` to better understand what it does – Waldi Aug 17 '20 at 17:07
Ah great, I see! Thanks for the tips. – Jim Aug 17 '20 at 17:21
@Waldi In your second chunk of code, should it be `mutate(dist = distHaversine(cbind(lon, lat)))` (i.e. not `summarize`, similar to the first chunk)? – Henrik Aug 17 '20 at 17:36
Just out of interest: when I implement the code, it only works with @Waldi's original answer. Although in the real data I `collapse_by` an hour from the `tibbletime` package.... – Jim Aug 18 '20 at 07:02

score 0 · Answer 2 · answered Aug 17 '20 at 16:14

dplyr::lag pulls the value from the previous row. You can then pass those values to a second mutate step to perform the distance calculations (these probably aren't the specific calculations you want, but they illustrate the general technique):

library(dplyr)

df %>% 
  group_by(fac) %>% 
  mutate(lag_lat = lag(lat), lag_lon = lag(lon)) %>% 
  mutate(dist_lat = lat - lag_lat, dist_lon = lon - lag_lon)

Note that lag is sensitive to the order of the rows. Be sure that they are in temporal order.
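
To see why row order matters, here is a small sketch on a made-up vector (the values are purely illustrative) showing what lag returns:

```r
library(dplyr)

x <- c(10, 20, 30)

# lag() shifts the values down by one position; the first element
# has no predecessor, so it becomes NA.
lag(x)
# [1] NA 10 20

# The difference to the previous value is therefore NA for the first
# element, so any sum over these differences needs na.rm = TRUE.
x - lag(x)
# [1] NA 10 10

# Reordering the input changes which value counts as "previous".
lag(rev(x))
# [1] NA 30 20
```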

Hi, thanks for your answer. This chucks out some pretty funky results for me...? – Jim Aug 17 '20 at 16:59
