I have a large dataset (more than 9 million rows) recording the times and locations at which individual animals were detected at stations. For each animal, I would like to calculate the distance between consecutive stations along its path, as well as the time it took to travel between them. I would then like to summarize the total distance and travel time across all legs of the path.
For each individual in this dataset, each row is one detection at a stationary point (station). If the individual stayed at a station for a long, continuous period, there are multiple records (roughly 30 s apart) covering that period.
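I can summarize the data to get one row for each time an individual was at a station (see below). However, the output does not recognize when an individual visits the same station more than once: in the example, animal A's two separate visits to station a (times 1 to 2, and time 4) are collapsed into a single row spanning times 1 to 4.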
For example:
library(dplyr)

id   <- c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B")
site <- c("a", "a", "b", "a", "c", "c", "c", "d", "a", "b")
time <- 1:10
lat  <- c(1, 1, 2, 1, 3, 3, 3, 4, 1, 2)
lon  <- c(1, 1, 2, 1, 3, 3, 3, 4, 1, 2)
df   <- data.frame(id, site, time, lat, lon)

# One row per station per animal: repeat visits are merged
df %>%
  group_by(id, site, lat, lon) %>%
  summarize(timeStart = min(time),
            timeEnd   = max(time))
# A tibble: 6 x 6
# Groups:   id, site, lat [?]
  id    site    lat   lon timeStart timeEnd
  <fct> <fct> <dbl> <dbl>     <dbl>   <dbl>
1 A     a         1     1         1       4
2 A     b         2     2         3       3
3 A     c         3     3         5       7
4 A     d         4     4         8       8
5 B     a         1     1         9       9
6 B     b         2     2        10      10
I need an approach to group the data so that multiple visits to the same station (with trips to other stations in between) are recognized as separate legs of the trip.
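One possible way to do this grouping, sketched in base R and assuming detections can be ordered by `time` within each `id`: a new visit starts whenever the animal or the station changes from one row to the next, so a running count of those changes gives every visit its own index. The first and last row of each run are then looked up directly instead of calling a function per group, which may matter at 9 million rows. The names `new_visit`, `visit`, `first_row`, `last_row`, and `visits` are hypothetical, introduced only for this sketch.

```r
# Example data from the question
id   <- c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B")
site <- c("a", "a", "b", "a", "c", "c", "c", "d", "a", "b")
time <- 1:10
lat  <- c(1, 1, 2, 1, 3, 3, 3, 4, 1, 2)
lon  <- c(1, 1, 2, 1, 3, 3, 3, 4, 1, 2)
df   <- data.frame(id, site, time, lat, lon, stringsAsFactors = FALSE)

# Order detections by animal, then by time (assumed to define the path)
df <- df[order(df$id, df$time), ]
n  <- nrow(df)

# TRUE where a row starts a new visit: first row, new animal, or new station
new_visit <- c(TRUE, df$id[-1] != df$id[-n] | df$site[-1] != df$site[-n])
df$visit  <- cumsum(new_visit)  # run index, e.g. 1, 1, 2, 3, 4, 4, 4, ...

# First and last row of each visit (rows are already sorted by time)
first_row <- which(new_visit)
last_row  <- c(first_row[-1] - 1, n)

# One row per visit, so repeat visits to the same station stay separate
visits <- data.frame(
  id        = df$id[first_row],
  site      = df$site[first_row],
  lat       = df$lat[first_row],
  lon       = df$lon[first_row],
  timeStart = df$time[first_row],
  timeEnd   = df$time[last_row],
  stringsAsFactors = FALSE
)
print(visits)
```

With the example data this gives seven visits, and station a now appears twice for animal A. The `visit` column can also be used as an ordinary grouping column (for example with `group_by(id, visit)` in dplyr); the `data.table` package's `rleid()` builds the same kind of run index.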
Then, I need to calculate the great-circle distance between consecutive stations, as well as the travel time for each leg, i.e. the difference between timeEnd at the first station and timeStart at the second.
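A base R sketch of the leg calculation, under a few assumptions: one row per visit already exists (the seven visits implied by the example data are hard-coded below so the block runs on its own), coordinates are in decimal degrees, and the Earth is treated as a sphere of radius 6371 km for the haversine formula. Each visit is paired with the next visit of the same animal, and pairs that would span two animals are dropped. `haversine_km`, `legs`, `dist_km`, `travelTime`, and `totals` are hypothetical names.

```r
# One row per visit (hard-coded from the example data for this sketch)
visits <- data.frame(
  id        = c("A", "A", "A", "A", "A", "B", "B"),
  site      = c("a", "b", "a", "c", "d", "a", "b"),
  lat       = c(1, 2, 1, 3, 4, 1, 2),
  lon       = c(1, 2, 1, 3, 4, 1, 2),
  timeStart = c(1, 3, 4, 5, 8, 9, 10),
  timeEnd   = c(2, 3, 4, 7, 8, 9, 10),
  stringsAsFactors = FALSE
)

# Haversine great-circle distance in km; inputs in decimal degrees.
# Assumes a spherical Earth with mean radius r = 6371 km.
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

# Order visits so that row i + 1 is the next station for the same animal
visits <- visits[order(visits$id, visits$timeStart), ]
from <- seq_len(nrow(visits) - 1)
to   <- from + 1
keep <- visits$id[from] == visits$id[to]  # drop pairs spanning two animals
from <- from[keep]
to   <- to[keep]

legs <- data.frame(
  id         = visits$id[from],
  fromSite   = visits$site[from],
  toSite     = visits$site[to],
  dist_km    = haversine_km(visits$lat[from], visits$lon[from],
                            visits$lat[to], visits$lon[to]),
  # Travel time: leaving the first station (timeEnd) to arriving at the next (timeStart)
  travelTime = visits$timeStart[to] - visits$timeEnd[from],
  stringsAsFactors = FALSE
)

# Total distance and travel time per animal
totals <- aggregate(cbind(dist_km, travelTime) ~ id, data = legs, FUN = sum)
print(legs)
print(totals)
```

For real timestamps stored as POSIXct, `as.numeric(difftime(visits$timeStart[to], visits$timeEnd[from], units = "mins"))` would give the travel time in a fixed unit instead of a plain subtraction. The `geosphere` package's `distHaversine()` is an alternative to the hand-written distance function; note that it returns metres and uses a different default Earth radius.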