I am trying to count the number of positive events over a 12 month rolling window.
I can create 365 rows of missing data per year and use zoo::rollapply
to sum the number of events per 365 rows of data, but my data frame is really big and I want to do this on a bunch of variables, so this takes forever to run.
I can get the correct output with this:
data <- data.frame(id = c("a","a","a","a","a","b","b","b","b","b"),
date = c("20-01-2011","20-04-2011","20-10-2011","20-02-2012",
"20-05-2012","20-01-2013","20-04-2013","20-10-2013",
"20-02-2014","20-05-2014"),
event = c(0,1,1,1,0,1,0,0,1,1))
library(lubridate)
library(dplyr)
library(tidyr)
library(zoo)
data %>%
group_by(id) %>%
mutate(date = dmy(date),
cumsum = cumsum(event)) %>%
complete(date = full_seq(date, period = 1), fill = list(event = 0)) %>%
mutate(event12 = rollapplyr(event, width = 365, FUN = sum, partial = TRUE)) %>%
drop_na(cumsum)
Which is this:
id date event cumsum event12
<fct> <date> <dbl> <dbl> <dbl>
a 2011-01-20 0 0 0
a 2011-04-20 1 1 1
a 2011-10-20 1 2 2
a 2012-02-20 1 3 3
a 2012-05-20 0 3 2
b 2013-01-20 1 1 1
b 2013-04-20 0 1 1
b 2013-10-20 0 1 1
b 2014-02-20 1 2 1
b 2014-05-20 1 3 2
But want to see if there's a more efficient way, as in how would I make the width in rollyapply
count up dates rather than count up rows.