I would like to compute a rolling average for each of my numeric variables. Using the data.table package, I know how to do this for a single variable. How should I revise the code so that it processes several variables at once, rather than changing the variable name and repeating the procedure for each one? Thanks.

Suppose I have other numeric variables named "V2", "V3", and "V4".

library(data.table)
library(zoo)  # provides rollmean()
setDT(data)
setkey(data,Receptor,date)
data[ , `:=` ('RollConc' = rollmean(AvgConc, 48, align="left", na.pad=TRUE)) , by=Receptor]

A copy of my sample data can be found at: https://drive.google.com/file/d/0B86_a8ltyoL3OE9KTUstYmRRbFk/view?usp=sharing

I would like to get 5-hour rolling means of "AvgConc", "TotDep", "DryDep", and "WetDep" for each receptor.

Vicki1227

2 Answers

From your description you want something like this, which is similar to one example that can be found in one of the data.table vignettes:

library(data.table)
set.seed(42)
DT <- data.table(x = rnorm(10), y = rlnorm(10), z = runif(10), g = c("a", "b"), key = "g")
library(zoo)
DT[, paste0("ravg_", c("x", "y")) := lapply(.SD, rollmean, k = 3, fill = NA),
   by = g, .SDcols = c("x", "y")]
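
Applied to the columns named in the question, the same pattern might look like the sketch below. The data are made up (two receptors, eight hourly rows each) because the linked file is not reproduced here; the "Roll" prefix, k = 5, and align = "left" are assumptions based on the question's wording and its original rollmean call.

```r
library(data.table)
library(zoo)

# Hypothetical stand-in for the asker's data: 2 receptors, 8 hourly rows each
set.seed(1)
hours <- seq(as.POSIXct("2015-01-01 00:00", tz = "UTC"), by = "hour", length.out = 8)
data <- data.table(
  Receptor = rep(1:2, each = 8),
  date     = rep(hours, times = 2),
  AvgConc  = runif(16),
  TotDep   = runif(16),
  DryDep   = runif(16),
  WetDep   = runif(16)
)
setkey(data, Receptor, date)

# Columns to average, and one new "Roll"-prefixed column per input column
cols <- c("AvgConc", "TotDep", "DryDep", "WetDep")
data[, paste0("Roll", cols) := lapply(.SD, rollmean, k = 5, fill = NA, align = "left"),
     by = Receptor, .SDcols = cols]
```

With align = "left", each window starts at the current row, so the last four rows of every receptor have no complete 5-row window and are filled with NA.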
Josh O'Brien
Roland
  • A further question... I see the code computed my data but results are not saved as variables... How can I save those rolling averages into a new dataframe as variables? – Vicki1227 Jul 17 '15 at 18:56
  • I don't understand. They are saved as columns of the data.table. Why do you want them in a different data.table? If you must, you can always subset the data.table. – Roland Jul 17 '15 at 18:58
  • This can be improved in the future once [data.table#626](https://github.com/Rdatatable/data.table/issues/626) is implemented. – jangorecki Jul 17 '15 at 19:23
  • Is there any way to add a numeric variable "Event" that marks each rolling mean calculation for each receptor? For example, for Receptor 1, the first rolling mean would be marked as Event[1] and the last one calculated as Event[n]. For Receptor 2, similarly, the rolling means would be marked as Event j, for j in 1 to length[rollingmean]. – Vicki1227 Jul 17 '15 at 20:52
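
The two follow-up requests in the comments above (rolling means in a separate table, and a per-group "Event" counter) could be handled along these lines. This is only a sketch on the answer's toy data: the name res is hypothetical, and it assumes an "event" means a row where a complete window was available, i.e. where the rolling mean is not NA.

```r
library(data.table)
library(zoo)

# Same toy data as in the answer (g written out with rep() instead of recycling)
set.seed(42)
DT <- data.table(x = rnorm(10), y = rlnorm(10), z = runif(10),
                 g = rep(c("a", "b"), times = 5), key = "g")
DT[, paste0("ravg_", c("x", "y")) := lapply(.SD, rollmean, k = 3, fill = NA),
   by = g, .SDcols = c("x", "y")]

# Subset into a new data.table: only rows with a computed rolling mean,
# and only the grouping column plus the rolling-average columns
res <- DT[!is.na(ravg_x), .(g, ravg_x, ravg_y)]

# Number the rolling means 1..n within each group
res[, Event := seq_len(.N), by = g]
```

Subsetting returns a new data.table, so adding Event to res leaves DT unchanged.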

Now, one can use the frollmean function in the data.table package for this.

library(data.table)

# Data
set.seed(42)
DT <- data.table(x = rnorm(10), y = rlnorm(10), z = runif(10),
                 g = c("a", "b"), key = "g")

xy <- c("x", "y")
DT[, (xy) := lapply(.SD, frollmean, n = 3, fill = NA, align = "center"),
   by = g, .SDcols = xy]

Here, I am replacing the x and y columns with their rolling averages.
kangaroo_cliff
  • This answer is suboptimal: there is no need for lapply, just use frollmean(.SD, ...) directly. It is vectorized and will be much faster. – jangorecki Oct 22 '22 at 20:03
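
The suggestion in this comment might be sketched as follows. frollmean accepts a list or data.table and returns a list with one vector per column, so .SD can be passed to it directly. Writing to new "ravg_"-prefixed columns instead of overwriting x and y is a choice made here for illustration, not part of the original answer.

```r
library(data.table)

# Same toy data as in the answer (g written out with rep() instead of recycling)
set.seed(42)
DT <- data.table(x = rnorm(10), y = rlnorm(10), z = runif(10),
                 g = rep(c("a", "b"), times = 5), key = "g")
xy <- c("x", "y")

# Pass .SD straight to frollmean: one call per group covers all selected columns
DT[, paste0("ravg_", xy) := frollmean(.SD, n = 3, fill = NA, align = "center"),
   by = g, .SDcols = xy]
```

The result should match the lapply version column for column; the difference is that the loop over columns happens inside frollmean rather than in R.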