I am trying to calculate a moving variance with a window of, let's say 4 years, for each one of the names
A
, B
and C
. The data is weekly:
> head(data1, 17)
date name value
1 1985-01-01 A -0.44008233
2 1985-01-01 B NA #Observe that there are some NA's
3 1985-01-01 C 0.38682496
4 1985-01-08 A 0.41806540
5 1985-01-08 B -0.05460831
6 1985-01-08 C -0.52051435
7 1985-01-15 A 1.25769395
8 1985-01-15 B 0.80272053
9 1985-01-15 C -0.34501742
10 1985-01-22 A -0.43401839
11 1985-01-22 B 0.91113966
12 1985-01-22 C 1.07131717
13 1985-01-29 A -1.55395857
14 1985-01-29 B -0.43281709
15 1985-01-29 C 0.98034779
16 1985-02-05 A 1.70557396
17 1985-02-05 B 0.44688788
My approach so far is to dcast
the data and then run rollapply()
(zoo
) column wise with a moving window of 192 = 4*12*4:
v <- dcast(data1, date ~ name, value.var = "value")
var <- rollapply(v[-1], width=4*12*4, var, fill=NA, by.column = T)
var <- cbind(v$date, var)
var[,1] <- as.Date(var[,1])
However, I realized that for some months I have four observations (eg. 7, 14, 21, 28 February) and for some I have five weekly observations (eg.1, 8, 15, 22 and 29 of January), so using a window of 4 years * 12 months * 4 weeks
observations is not correct. I was thinking to add these extra observations on the time window (width
), but I am not sure how (or if it is even possible), since these change depending on how many 5-weeks-per-month and how many 4-weeks-per-month observations there are inside the time window.
Additionally, I would like to have a NA
when there are NA
observations within the moving time window (this is handled automatically by var()
anyway I think) and also I would like to ignore zero observations. For that I was thinking I could remove the zeroes before running the running variance function and then somehow put them back in in the end. So you can ignore that part, unless, of course, you have some good idea to do it in one step.
Sample data:
set.seed(486)
date <- rep(seq(as.Date("1985-01-01"), as.Date("2010-01-1"), by="weeks"), each=3)
N <- length(date)
name <- c("A","B","C")
value <- rnorm(N)
i<-which(value %in% sample(value, 25)) ;i
j<-which(value %in% sample(value, 150)) ;j
value[i] <- NA
value[j] <- 0
data1 <- data.frame(date, name, value)