apply function to rolling window in panel data in R

Question

I'm trying to apply a function (say standard deviation) in a rolling window, by category:

I have the following data:

cat = c("A", "A", "A", "A", "B", "B", "B", "B") 
year = c(1990, 1991, 1992, 1993, 1990, 1991, 1992, 1993) 
value = c(2, 3, 5, 6, 8, 9, 4, 5) 
df = data.frame(cat, year, value)

I would like to create a new column (say sd) that estimates the standard deviation over two year window by cat.

Here's the result I'm thinking of:

enter image description here

Any advice on how to achieve this?

score 2 · Accepted Answer · edited May 23 '17 at 12:11

It can be done by using rollapply from the zoo package:

library(zoo)

cat = c("A", "A", "A", "A", "B", "B", "B", "B") 
year = c(1990, 1991, 1992, 1993, 1990, 1991, 1992, 1993) 
value = c(2, 3, 5, 6, 8, 9, 4, 5) 
df = data.frame(cat, year, value)

df$stdev <- unlist(by(df, df$cat, function(x) {
  c(NA, rollapply(x$value, width=2, sd))
}), use.names=FALSE)

print(df)
##   cat year value     stdev
## 1   A 1990     2        NA
## 2   A 1991     3 0.7071068
## 3   A 1992     5 1.4142136
## 4   A 1993     6 0.7071068
## 5   B 1990     8        NA
## 6   B 1991     9 0.7071068
## 7   B 1992     4 3.5355339
## 8   B 1993     5 0.7071068

You can also do it with ddply if you'd rather use plyr functions than by:

df$stdev <- ddply(df, .(cat), summarise, 
                  stdev=c(NA, rollapply(value, width=2, sd)))$stdev

As a lark, I did a system.time (multiple times) comparison of the above two methods and also the ave method pointed out by @thelatemail in the comment thread below this answer (starting with a "fresh" copy of the data frame).

df <- data.frame(cat, year, value)
system.time(df$stdev <- with(df, ave(value, cat, FUN=function(x) c(NA, rollapply(x, width=2, sd)))))

df <- data.frame(cat, year, value)
system.time(df$stdev <- unlist(by(df, df$cat, function(x) c(NA, rollapply(x$value, width=2, sd))), use.names=FALSE))

df <- data.frame(cat, year, value)
system.time(df$stdev <- ddply(df, .(cat), summarise, stdev=c(NA, rollapply(value, width=2, sd)))$stdev)

Both the ave and by methods take:

   user  system elapsed 
  0.002   0.000   0.002

and the ddply version takes:

   user  system elapsed 
  0.004   0.000   0.004

Not that speed is really an issue here, but it looks like the ave and by versions are the most efficient ways to do this.

thank you, but it's not quite right - i want those standard deviations to be grouped by "cat". can i use ddply or something similar here? — ec0n0micus, Apr 01 '14 at 02:00
No need for `unlist(by(...))` - `ave` is probably more appropriate: `with(df, ave(value, cat, FUN=function(x) c(NA, rollapply(x, width=2, sd)) ))` — thelatemail, Apr 01 '14 at 02:40
Nice. Didn't think of `ave`. It does `interaction(...); split(x, g) <- lapply(split(x, g), FUN)` under the covers so it does alot under the covers. — hrbrmstr, Apr 01 '14 at 02:46
A slight simplification could be had by using `rollapplyr` with the `fill=NA` argument: `ave(df$value, df$cat, FUN = function(v) rollapplyr(v, 2, sd, fill = NA))` — G. Grothendieck, Apr 01 '14 at 03:23
[@G](http://stackoverflow.com/users/516548/g-grothendieck) Now I want to go back and add that to the time tests! That's prbly the best all around solution. — hrbrmstr, Apr 01 '14 at 03:33

apply function to rolling window in panel data in R

1 Answers1