I've got a tsibble where timestamped observational data has been aggregated to 30-minute intervals. The data is in several groups, and I'd like to make sure that each 30-minute interval appears in the tsibble for every group, even when there were no observations in that time period.
Let's return to the birdwatching example from my previous question about tsibbles. Suppose I'm watching ducks and geese at a certain location from 8:00 to 18:00 each day and recording, for each observation, a) the time, b) the type of bird observed, and c) the number of birds in the flock observed.
library(tidyverse) # includes lubridate
library(tsibble)
N <- 10
set.seed(42)
# suppose we're observing ducks and geese between 8:00 and 18:00.
d <- as_datetime("2023-03-08 08:00:00")
times <- d + seconds(unique(round(sort(runif(N, min = 0, max = 36e3)))))
nObs <- 1 + rpois(length(times), lambda = 1)
birdIdx <- 1 + round(runif(length(times)))
birds <- c("Duck", "Goose")[birdIdx]
# Tibble of observations
waterfowl <- tibble(Timestamp = times, Count = nObs, Bird = as_factor(birds))
# Convert to tsibble (time series tibble) and aggregate on a 30-minute basis
waterfowl |>
  as_tsibble(index = Timestamp) |>
  group_by(Bird) |>
  index_by(Interval = floor_date(Timestamp, "30 minute")) |>
  summarize(`Total birds` = sum(Count)) -> waterfowlSumm
waterfowlSumm |> print(n = Inf)
This gives
# A tsibble: 10 x 3 [30m] <UTC>
# Key: Bird [2]
Bird Interval `Total birds`
<fct> <dttm> <dbl>
1 Goose 2023-03-08 09:00:00 2
2 Goose 2023-03-08 13:00:00 4
3 Goose 2023-03-08 14:00:00 1
4 Goose 2023-03-08 15:00:00 4
5 Goose 2023-03-08 16:00:00 1
6 Goose 2023-03-08 17:00:00 2
7 Duck 2023-03-08 10:30:00 2
8 Duck 2023-03-08 14:30:00 2
9 Duck 2023-03-08 15:00:00 4
10 Duck 2023-03-08 17:00:00 2
What I'd like to do is fill missing intervals. I can use fill_gaps for this:
> waterfowlSumm |> fill_gaps(`Total birds` = 0) |> print(n = Inf)
# A tsibble: 31 x 3 [30m] <UTC>
# Key: Bird [2]
Bird Interval `Total birds`
<fct> <dttm> <dbl>
1 Goose 2023-03-08 09:00:00 2
2 Goose 2023-03-08 09:30:00 0
3 Goose 2023-03-08 10:00:00 0
...
15 Goose 2023-03-08 16:00:00 1
16 Goose 2023-03-08 16:30:00 0
17 Goose 2023-03-08 17:00:00 2
18 Duck 2023-03-08 10:30:00 2
19 Duck 2023-03-08 11:00:00 0
20 Duck 2023-03-08 11:30:00 0
...
29 Duck 2023-03-08 16:00:00 0
30 Duck 2023-03-08 16:30:00 0
31 Duck 2023-03-08 17:00:00 2
However, since I start watching birds at 8:00 and stop at 18:00, I'd like to fill in missing intervals beyond the times where I actually observed birds. So I might do
> waterfowlSumm |> fill_gaps(`Total birds` = 0, .start = d, .end = d + hours(9) + minutes(30)) |> print(n = Inf)
# A tsibble: 40 x 3 [30m] <UTC>
# Key: Bird [2]
Bird Interval `Total birds`
<fct> <dttm> <dbl>
1 Goose 2023-03-08 08:00:00 0
2 Goose 2023-03-08 08:30:00 0
3 Goose 2023-03-08 09:00:00 2
...
18 Goose 2023-03-08 16:30:00 0
19 Goose 2023-03-08 17:00:00 2
20 Goose 2023-03-08 17:30:00 0
21 Duck 2023-03-08 08:00:00 0
22 Duck 2023-03-08 08:30:00 0
23 Duck 2023-03-08 09:00:00 0
...
38 Duck 2023-03-08 16:30:00 0
39 Duck 2023-03-08 17:00:00 2
40 Duck 2023-03-08 17:30:00 0
This works. However, now suppose that my data has additional grouping variables: say, I'm observing birds at several sites. Of course, since I can't be in two places at the same time, each site has a different observer. And different observers have different working hours, so .start and .end must be set on a per-group basis.
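To make the multi-site situation concrete, here is a minimal sketch of what such data might look like. The `Site` column, the `schedule` table, the site names, and all values are hypothetical and are not part of the example above; they only illustrate start/end times that differ per group.

```r
library(tidyverse) # includes lubridate
library(tsibble)

d <- as_datetime("2023-03-08 08:00:00")

# Hypothetical working hours, one row per site (names and times are made up).
# End is the start of the last 30-minute interval the observer covers.
schedule <- tibble(
  Site  = c("North", "South"),
  Start = d + hours(c(0, 2)),
  End   = d + hours(c(2, 3)) + minutes(30)
)

# Hypothetical 30-minute summary, keyed by site and bird
siteSumm <- tibble(
  Site          = c("North", "North", "South", "South"),
  Bird          = c("Duck", "Duck", "Goose", "Goose"),
  Interval      = d + minutes(c(30, 90, 150, 180)),
  `Total birds` = c(2, 1, 3, 4)
) |>
  as_tsibble(key = c(Site, Bird), index = Interval)

# Attach each site's working hours as ordinary columns
siteHours <- siteSumm |> left_join(schedule, by = "Site")
siteHours
```

Here North is covered from 08:00 through the interval starting at 10:30, and South from 10:00 through the interval starting at 11:30, so a single shared .start/.end pair cannot describe both sites.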
The start/end times are available in my data, but .start and .end apparently can't be pulled from the tsibble being operated on:
> waterfowlSumm |> mutate(Start = d, End = d + hours(9) + minutes(30)) |> fill_gaps(`Total birds` = 0, .start = Start, .end = End)
Error in scan_gaps.tbl_ts(.data, .full = !!enquo(.full), .start = .start, :
object 'Start' not found
So my question is: how do I do this? I'd really like to be able to use grouping (in this example I only have one grouping variable to begin with, but in reality there are many) so that I only have to invoke fill_gaps once, with the correct start/end being pulled from the tsibble.
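For comparison, one possible fallback is to call fill_gaps once over the widest window any site needs, and then trim each site back to its own hours with a join and a filter. This is only a sketch under assumptions, not a built-in per-group feature: the `Site` column, the `schedule` table, and all values below are hypothetical, and the approach temporarily creates rows that are later discarded.

```r
library(tidyverse) # includes lubridate
library(tsibble)

d <- as_datetime("2023-03-08 08:00:00")

# Hypothetical per-site working hours (End = start of the last interval covered)
schedule <- tibble(
  Site  = c("North", "South"),
  Start = d + hours(c(0, 2)),
  End   = d + hours(c(2, 3)) + minutes(30)
)

# Hypothetical 30-minute summary, keyed by site and bird
siteSumm <- tibble(
  Site          = c("North", "North", "South", "South"),
  Bird          = c("Duck", "Duck", "Goose", "Goose"),
  Interval      = d + minutes(c(30, 90, 150, 180)),
  `Total birds` = c(2, 1, 3, 4)
) |>
  as_tsibble(key = c(Site, Bird), index = Interval)

filled <- siteSumm |>
  # Step 1: fill every key over the widest window found in the schedule
  fill_gaps(`Total birds` = 0,
            .start = min(schedule$Start),
            .end   = max(schedule$End)) |>
  # Step 2: bring in each site's own hours and keep only intervals inside them
  left_join(schedule, by = "Site") |>
  filter(Interval >= Start, Interval <= End) |>
  select(-Start, -End)

filled |> print(n = Inf)
```

Note that the filter also drops any real observations that fall outside a site's scheduled hours, so this only makes sense if the schedule is trustworthy.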
Thanks!