
I have a dataset with unequally spaced observations, and there is often more than one observation per day. I'd like to apply a function to windows of my data, but I want the windows to be defined by time rather than by row. For example, I'd like to compute the mean for days 1-5, days 2-6, etc. within my dataset, where days 1-5 may correspond to rows 1-13, days 2-6 to rows 3-18, and so on.

I saw that the rollapply function accepts zoo objects, and I assumed it would work as I describe above (i.e. applying the function over windows defined by time rather than windows defined by rows). However, this doesn't seem to be the case:

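library(zoo)

# one observation per day
my.ts = zoo( 1:100, as.Date("201401","%Y%j")+1:100 )
mean1 = rollapply( my.ts, 3, mean, align="right" )
# two observations per day
my.ts = zoo( 1:100, as.Date("201401","%Y%j")+1:100/2 )
mean2 = rollapply( my.ts, 3, mean, align="right" )
# compare the values only; the two series have different time indexes
all( coredata(mean1)==coredata(mean2) )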

I'd expect mean2 to be different from mean1 since mean2 has two observations per day instead of one. However, it appears that rollapply uses rows to define the windows rather than the times from the zoo object. Is there a work-around for this? Or, possibly some other function I should be using in place of rollapply?

random_forest_fanatic

1 Answer


rollapply is documented in ?rollapply, so there is no need to guess how it works: a numeric width counts observations, not time units.

To do what you want, fill in the missing days with NAs and then take the mean. For example, to compute a mean over every three days rather than every three observations:

library(zoo)

# test data
tt <- as.Date("2000-01-01") + c(1, 2, 5, 6, 7, 8, 10)
z <- zoo(seq_along(tt), tt)

# fill it out to a daily series, zm, using NAs
g <- zoo(, seq(start(z), end(z), "day")) # zero width zoo series on a grid
zm <- merge(z, g)

rollapply(zm, 3, mean, na.rm = TRUE, fill = NA)
G. Grothendieck
  • Thanks for the reply! I think you may be misunderstanding my problem though... I have multiple observations per day, and sometimes differing amounts. I want rollapply to compute means for every three days and use all the observations that occur within those three days. – random_forest_fanatic Jul 17 '14 at 16:27
  • If your data is hourly, then create a regular hourly series and take the same approach. If it's by the minute, make a regular minute-by-minute series, etc. Please provide representative data in the future to better communicate what you want. – G. Grothendieck Jul 17 '14 at 16:31
  • Ah, I see, ok. So, if I have observations irregularly spaced throughout a day, I need to break the day into chunks and add in NAs for chunks without observations. Thanks! – random_forest_fanatic Jul 17 '14 at 16:37
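
The goal raised in these comments, a mean over every observation that falls inside a time window, can also be sketched directly from the time index, without building a regular grid. This is only a minimal sketch and not the answerer's method: `time_window_mean` is a hypothetical helper (not part of zoo), the timestamps are made up for illustration, and the window is assumed to be the trailing 3 days ending at each observation (open on the left, closed on the right).

```r
library(zoo)

# Made-up irregular data: hours elapsed since a reference time,
# with several observations on some days and none on others.
hrs <- c(1, 5, 20, 30, 31, 80, 100, 101, 130)
tt  <- as.POSIXct("2014-01-01 00:00:00", tz = "UTC") + 3600 * hrs
z   <- zoo(seq_along(tt), tt)

# Hypothetical helper: for each observation time t, average every value
# whose timestamp lies in the interval (t - width_secs, t].
time_window_mean <- function(z, width_secs) {
  tm <- as.numeric(index(z))   # timestamps in seconds
  x  <- coredata(z)            # observed values
  out <- vapply(
    tm,
    function(t_end) mean(x[tm > t_end - width_secs & tm <= t_end]),
    numeric(1)
  )
  zoo(out, index(z))           # keep the original time index
}

# Trailing 3-day window, expressed in seconds
m3 <- time_window_mean(z, 3 * 24 * 3600)
m3
```

Each output value uses however many rows happen to fall in the preceding three days, so the number of rows per window varies with the density of the data. The loop compares every timestamp against every other one, which is fine for small series but would likely need a faster lookup (for example with `findInterval`) on large ones.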