I'm working with an ecological dataset in which multiple individuals move across a landscape and can be detected at multiple sites. Each row records the beginning and ending timestamps of a period when an individual was detected at a given site; from here on I'll call this time window for an individual at a site an "event". After sorting the data by time, I noticed that an individual can have several consecutive events at the same site. This can happen when the individual moves away from the receiver and comes back to it without being detected at an adjacent receiver.
Here's example data for a single individual, x:
input <- data.frame(individual = c("x","x","x","x","x","x","x"),
site = c("a","a","a","b","b","a", "a"),
start_time = as.POSIXct(c("2020-01-14 11:11:11", "2020-01-14 11:13:10", "2020-01-14 11:16:20",
"2020-02-14 11:11:11", "2020-02-14 11:13:10",
"2020-03-14 11:12:11", "2020-03-15 11:12:11")),
end_time = as.POSIXct(c("2020-01-14 11:11:41", "2020-01-14 11:13:27", "2020-01-14 11:16:50",
"2020-02-14 11:13:11", "2020-02-14 11:15:10",
"2020-03-14 11:20:11", "2020-03-15 11:20:11")))
I want to aggregate each run of consecutive events at the same site (e.g. the first 3 events at site a) into one larger event that spans from the earliest start time to the latest end time in that run:
output <- data.frame(individual = c("x","x","x"), site = c("a", "b", "a"),
start_time = as.POSIXct(c("2020-01-14 11:11:11", "2020-02-14 11:11:11", "2020-03-14 11:12:11")),
end_time = as.POSIXct(c("2020-01-14 11:16:50", "2020-02-14 11:15:10", "2020-03-15 11:20:11")))
Note that both the durations of events and the gaps between consecutive events vary.
Using group_by(individual, site) would lose this temporal information, because an individual can return to the same site multiple times and all of those visits would be collapsed into a single group. I thought about building a helper data frame that summarizes events for individuals at sites, but I am not sure how to retain the temporal information. I suppose this could be done by indexing row numbers or looping in base R, but I am hoping there is a nifty dplyr trick for this problem.
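To make the idea concrete, here is a minimal sketch of the kind of thing I have in mind, not a vetted solution. It rebuilds the example data so it runs on its own, shows how the plain group_by(individual, site) collapses the two separate visits to site a, and then tries a run counter that increases each time the site changes within an individual. The column name visit_id is hypothetical, and tz = "UTC" is an assumption added only to make the timestamps reproducible.

```r
library(dplyr)

# Example data from above, rebuilt so this sketch is self-contained.
# tz = "UTC" is an assumption; the original data does not specify a time zone.
input <- data.frame(
  individual = rep("x", 7),
  site = c("a", "a", "a", "b", "b", "a", "a"),
  start_time = as.POSIXct(c("2020-01-14 11:11:11", "2020-01-14 11:13:10",
                            "2020-01-14 11:16:20", "2020-02-14 11:11:11",
                            "2020-02-14 11:13:10", "2020-03-14 11:12:11",
                            "2020-03-15 11:12:11"), tz = "UTC"),
  end_time = as.POSIXct(c("2020-01-14 11:11:41", "2020-01-14 11:13:27",
                          "2020-01-14 11:16:50", "2020-02-14 11:13:11",
                          "2020-02-14 11:15:10", "2020-03-14 11:20:11",
                          "2020-03-15 11:20:11"), tz = "UTC")
)

# Naive grouping: both visits to site "a" end up in one group.
naive <- input %>%
  group_by(individual, site) %>%
  summarise(start_time = min(start_time),
            end_time = max(end_time),
            .groups = "drop")

# Run-based grouping: visit_id (hypothetical helper column) goes up by one
# whenever the site differs from the previous row of the same individual.
merged <- input %>%
  arrange(individual, start_time) %>%
  group_by(individual) %>%
  mutate(visit_id = cumsum(site != lag(site, default = first(site)))) %>%
  group_by(individual, site, visit_id) %>%
  summarise(start_time = min(start_time),
            end_time = max(end_time),
            .groups = "drop") %>%
  arrange(individual, start_time) %>%
  select(-visit_id)
```

On the example data, naive has one row per site, while merged keeps the a, b, a sequence of visits. If I understand correctly, dplyr 1.1.0 and later also provide consecutive_id(), and data.table provides rleid(), either of which might replace the cumsum/lag step, though I have not compared them here.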