I want to calculate a centered rolling mean that looks 15 days backward and 15 days forward (a 31-day window). Here is a test data frame:

library(lubridate)  # provides ymd()
date_list = seq(ymd('2000-01-15'),ymd('2010-09-18'),by='day')
testframe = data.frame(Date = date_list)
testframe$Day = substr(testframe$Date, start = 6, stop = 10)
testframe$V1 = runif(3900, 2.0, 35.0)
testframe$V2 = runif(3900, 5.0, 40.0)
testframe$V3 = runif(3900, -10.0, 10.0)
testframe$V4 = seq(from = 5, to = 45, length.out = 3900)

I know how to calculate it for each individual column:

library(zoo)
rollmean(testframe$V4, 31)
rollapply(testframe$V4, 31, mean)

But how can I do this for all columns at once? I think I have to exclude the Date and Day columns, but how can I do that within the command? And how can I write the results back into my original testframe, with NAs for the first and last 15 days?

I tried this:

testframe[paste0("new_col",1:4)] <- lapply(testframe[,3:6], rollapply, FUN = mean, width = 31)

But it doesn't work!

Mr.Spock
  • Please ensure posted code is reproducible which includes providing all library statements and also using `set.seed` when generating random numbers. Also please make code minimal. Here ymd seems unnecessary since as.Date would have worked as well and eliminates a package. – G. Grothendieck Jun 12 '19 at 14:50

2 Answers

The default operation of rollmean and rollapply is to act on every column. Please review ?rollapply.

library(zoo)
rollmeanr(BOD, 2, fill = NA)

giving the following, in which rollmeanr is applied to each column of the built-in BOD data frame:

     Time demand
[1,]   NA     NA
[2,]  1.5   9.30
[3,]  2.5  14.65
[4,]  3.5  17.50
[5,]  4.5  15.80
[6,]  6.0  17.70

If you only want to apply the mean to some columns then specify that:

if (exists("BOD", .GlobalEnv)) rm(BOD)
BOD[1:2] <- rollmeanr(BOD[1:2], 2, fill = NA)
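Applied to the data in the question, the same idea might look like the sketch below. It assumes the value columns are named V1 to V4 as in the question; the seed, the use of as.Date instead of ymd, and the names value_cols and rolled are choices made here only to keep the example self-contained.

```r
library(zoo)

# Rebuild the question's test frame (base as.Date instead of lubridate::ymd;
# the seed is an arbitrary choice for reproducibility).
set.seed(1)
date_list <- seq(as.Date("2000-01-15"), as.Date("2010-09-18"), by = "day")
n <- length(date_list)
testframe <- data.frame(Date = date_list)
testframe$Day <- format(testframe$Date, "%m-%d")
testframe$V1 <- runif(n, 2.0, 35.0)
testframe$V2 <- runif(n, 5.0, 40.0)
testframe$V3 <- runif(n, -10.0, 10.0)
testframe$V4 <- seq(from = 5, to = 45, length.out = n)

# Select only the value columns, so Date and Day are left out of the mean.
value_cols <- c("V1", "V2", "V3", "V4")

# Centered 31-day mean of every selected column in one call; fill = NA pads
# the first and last 15 rows so the result has as many rows as testframe.
rolled <- rollmean(testframe[value_cols], 31, fill = NA)

# Write the result back as new columns next to the originals.
testframe[paste0("new_col", 1:4)] <- as.data.frame(rolled)
```

Use rollmeanr instead of rollmean if a right-aligned (backward-only) window is wanted.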

Note that if you have all numeric columns except for an index column, it would be easier to use zoo objects rather than force-fitting everything into data.frames, which don't work that well with time series.

if (exists("BOD", .GlobalEnv)) rm(BOD)
z <- read.zoo(BOD)
rollmeanr(z, 2)
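For the daily data in the question, a zoo-based version could be sketched roughly as follows. The two columns, the seed, and the names dates, values, z_roll and result are illustrative assumptions, not part of the original post.

```r
library(zoo)

# Illustrative stand-in for the question's data: daily dates plus two
# numeric columns (seed chosen arbitrarily for reproducibility).
set.seed(1)
dates <- seq(as.Date("2000-01-15"), as.Date("2010-09-18"), by = "day")
values <- cbind(V1 = runif(length(dates), 2, 35),
                V4 = seq(5, 45, length.out = length(dates)))

# The dates become the index of the zoo object, so there is no Date or Day
# column that has to be excluded from the calculation.
z <- zoo(values, order.by = dates)

# Centered 31-day rolling mean of every column; fill = NA keeps all dates.
z_roll <- rollmean(z, 31, fill = NA)

# Convert back to a data.frame only if one is needed afterwards.
result <- data.frame(Date = index(z_roll), coredata(z_roll))
```

Keeping the dates as an index rather than a column means the rolling functions never see them as data.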
G. Grothendieck

While @G.Grothendieck's answer is better in many ways, here is some context for what might be going wrong in your case:

testframe[paste0("new_col",1:4)] <- lapply(testframe[,3:6], rollapply, FUN = mean, width = 31)
# Error in mean.default(X[[i]], ...) : 'trim' must be numeric of length one

This happens because you pass FUN=, which is also the name of lapply's own function argument. lapply therefore takes mean as the function to apply and hands rollapply on as an extra argument, effectively:

testframe[paste0("new_col",1:4)] <- lapply(testframe[,3:6], function(a) mean(a, trim=rollapply, width = 31))

The second argument to mean is trim=, which in this case is being passed the function rollapply, obviously not right.
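The argument matching can be seen in isolation with a toy example. The two functions below are hypothetical helpers that only report which one was called; they stand in for rollapply and mean.

```r
# Two hypothetical helper functions that only report which one was called.
wrapper_fun <- function(x, ...) "wrapper_fun was called"
inner_fun   <- function(x, ...) "inner_fun was called"

# Intention: call wrapper_fun(x, FUN = inner_fun) for each list element.
# What happens: FUN = is matched by name to lapply's own FUN argument, and
# wrapper_fun falls into '...', so lapply calls inner_fun(x, wrapper_fun).
res <- lapply(list(1), wrapper_fun, FUN = inner_fun)
print(res[[1]])
# [1] "inner_fun was called"
```

Named arguments are matched before positional ones, which is why the function written first in the call is not the one lapply applies.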

The next step would be

testframe[paste0("new_col",1:4)] <- lapply(testframe[,3:6], function(a) rollapply(a, FUN = mean, width = 31))
# Error in `[<-.data.frame`(`*tmp*`, paste0("new_col", 1:4), value = list( : 
#   replacement element 1 has 3870 rows, need 3900

which happens because rollapply, by default, drops the first and last 15 positions where the 31-value window does not fit (resulting in 30 fewer observations). You can fix this with fill=NA:

testframe[paste0("new_col",1:4)] <- lapply(testframe[,3:6], function(a) rollapply(a, FUN = mean, width = 31, fill = NA))
# (no warnings/errors)
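If only the mean is needed, one way to sidestep the FUN= clash altogether is rollmean, which takes its window width as k and has no FUN argument, so it can be passed to lapply directly. This is a sketch under that assumption; the small data frame df is a made-up stand-in for testframe[,3:6].

```r
library(zoo)

# Made-up stand-in for the numeric columns of testframe (40 rows each).
df <- data.frame(a = as.numeric(1:40), b = seq(100, 22, by = -2))

# rollmean() has no FUN argument, so nothing collides with lapply's FUN.
# fill = NA pads the first and last 15 positions of each column.
rolled <- lapply(df, rollmean, k = 31, fill = NA)

# The list has one full-length vector per column, so it can be assigned back.
df[paste0("new_", names(df))] <- rolled
```

For functions other than the mean, the anonymous-function form above remains the general solution.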
r2evans
  • The `FUN=` dual-argument issue was something I normally wouldn't think of, but the error message was the hint that started me down that road. (I suggest you include the error message in your questions instead of less-clear *"But it doesnt work"*.) – r2evans Jun 12 '19 at 14:27