
Morning, afternoon, evening

I have the following boat data:

set.seed(123)

df <- data.frame(
  fac = as.factor(c("A", "A", "A", "A",
                    "B", "B", "B",
                    "C", "C", "C", "C", "C")),
  lat = runif(12, min = 45, max = 47),
  lon = runif(12, min = -6, max = -5 ))

I group the data by the factor variable fac.

library(dplyr)

df_grouped <- df %>% 
  group_by(fac) %>% 
  summarise(first_lon = first(lon),
            last_lon  = last(lon),
            first_lat = first(lat),
            last_lat  = last(lat))

I use the first and last latitudes (lat) and longitudes (lon) to create polygons.

I also use the first and last latitudes (lat) and longitudes (lon) to estimate distance across the polygon.

library(geosphere)

df_grouped %>% 
  mutate(distance_m = distHaversine(matrix(c(first_lon, first_lat), ncol = 2),
                                    matrix(c(last_lon, last_lat),   ncol = 2)))

However, this assumes the boat goes in a straight line across the longest possible distance within the polygon.

This is not always true; sometimes it wiggles about a bit.

What I would like to do is calculate the actual distance the boat has traveled by working out the distance between each row within a group.

In other words, for fac == "C", the boat will have traveled x meters, where x is the sum of the distances between consecutive data points within the group.

Henrik
Jim

2 Answers

2

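Try:

df %>%
  group_by(fac) %>%
  # Previous position within each group (NA for the first row of a group)
  mutate(lat_prev = lag(lat, 1), lon_prev = lag(lon, 1)) %>%
  # Distance in meters from the previous position to the current one
  mutate(dist = distHaversine(matrix(c(lon_prev, lat_prev), ncol = 2),
                              matrix(c(lon, lat), ncol = 2))) %>%
  summarize(dist = sum(dist, na.rm = TRUE))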

# A tibble: 3 x 2
  fac      dist
  <fct>   <dbl>
1 A      93708.
2 B     219742.
3 C     347578.

Much better, as suggested by Henrik:

df %>%
  group_by(fac) %>%
  # With p2 omitted, distHaversine() gives the distances between consecutive rows of p1
  summarize(dist = sum(distHaversine(cbind(lon, lat))))
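As a cross-check on the per-group totals, here is a minimal base R sketch that sums the haversine distance between consecutive rows by hand. The helper names `haversine_m` and `path_length_m` are made up for illustration, and the Earth radius of 6378137 m is chosen to match the default `r` of `distHaversine`, so the totals should agree closely with the geosphere results rather than being guaranteed identical.

```r
# Great-circle distance in meters between two sets of lon/lat points (degrees).
# Hypothetical helper, not part of geosphere.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Total path length: sum of distances between each point and the next one.
path_length_m <- function(lon, lat) {
  n <- length(lon)
  if (n < 2) return(0)
  sum(haversine_m(lon[-n], lat[-n], lon[-1], lat[-1]))
}

# Same example data as in the question
set.seed(123)
df <- data.frame(
  fac = as.factor(c("A", "A", "A", "A",
                    "B", "B", "B",
                    "C", "C", "C", "C", "C")),
  lat = runif(12, min = 45, max = 47),
  lon = runif(12, min = -6, max = -5))

# One total per level of fac, as a named numeric vector
path_by_group <- sapply(split(df, df$fac),
                        function(g) path_length_m(g$lon, g$lat))
path_by_group
```

Because the points are differenced in row order within each group, this relies on the rows already being in the order the boat visited them.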
Waldi
  • Do you need to lag? On the `p2` argument: "[If] missing [...] sequential distance between the points in `p1` is computed". – Henrik Aug 17 '20 at 16:54
  • Thanks for the answer. It seems to work for me. Not sure if I understand @Henrik's comment – Jim Aug 17 '20 at 17:01
  • I was referring to the help text of `distHaversine`. The function takes arguments `p1` and `p2`. If you don't provide `p2`, it will calculate distance between consecutive points in `p1`. – Henrik Aug 17 '20 at 17:04
  • @Henrik, thanks for the hint, see my edit – Waldi Aug 17 '20 at 17:04
  • Great, thank you. Would you mind explaining the lag function? – Jim Aug 17 '20 at 17:04
  • The lag function takes the values of the row before: try `df %>% mutate(lat_prev = lag(lat,1), lon_prev = lag(lon,1) )` to better understand what it does – Waldi Aug 17 '20 at 17:07
  • Ah great, I see! Thanks for the tips. – Jim Aug 17 '20 at 17:21
  • @Waldi In your second chunk of code, should it be `mutate(dist = distHaversine(cbind(lon, lat)))` (i.e. not `summarize`, similar to the first chunk)? – Henrik Aug 17 '20 at 17:36
  • Just out of interest: on implementing the code, it only works with @Waldi's original answer. Although in the real data I `collapse_by` an hour from the `tibbletime` package.... – Jim Aug 18 '20 at 07:02
0

dplyr::lag pulls the value from the previous row. You can then pass those values to a second mutate step to perform distance calculations (these probably aren't the specific calculations you want, but they illustrate the general technique):

library(dplyr)

df %>% 
  group_by(fac) %>% 
  mutate(lag_lat = lag(lat), lag_lon = lag(lon)) %>% 
  mutate(dist_lat = lat - lag_lat, dist_lon = lon - lag_lon)

Note that lag is sensitive to the order of the rows. Be sure that they are in temporal order.
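To illustrate the ordering point, here is a small base R sketch. The `track` data frame, its `time` column, and the `lag1` helper are all invented for this example (the data in the question has no timestamp), and `lag1` is only a stand-in for `dplyr::lag(x)` with its default offset of one row. With dplyr, the equivalent step would be an `arrange(fac, time)` before the `group_by()`.

```r
# Hypothetical track with a made-up `time` column, deliberately out of order
track <- data.frame(
  fac  = c("A", "A", "A"),
  time = as.POSIXct(c("2020-08-17 10:20:00",
                      "2020-08-17 10:00:00",
                      "2020-08-17 10:10:00"), tz = "UTC"),
  lat  = c(45.2, 45.0, 45.1),
  lon  = c(-5.5, -5.5, -5.5)
)

# Put the rows in temporal order within each group before lagging
track_sorted <- track[order(track$fac, track$time), ]

# Stand-in for dplyr::lag(x): shift values down by one row, pad with NA
lag1 <- function(x) c(NA, head(x, -1))

# ave() applies lag1 separately within each level of fac
track_sorted$lat_prev <- ave(track_sorted$lat, track_sorted$fac, FUN = lag1)
track_sorted$lon_prev <- ave(track_sorted$lon, track_sorted$fac, FUN = lag1)

track_sorted
```

Without the sort, the "previous" row for the 10:00 point would be the 10:20 point, so any distance built on the lagged columns would follow the row order rather than the boat's actual route.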

davy
  • Hi, thanks for your answer. This chucks out some pretty funky results for me...? – Jim Aug 17 '20 at 16:59