It can be done by using `rollapply` from the zoo package:
library(zoo)
cat = c("A", "A", "A", "A", "B", "B", "B", "B")
year = c(1990, 1991, 1992, 1993, 1990, 1991, 1992, 1993)
value = c(2, 3, 5, 6, 8, 9, 4, 5)
df = data.frame(cat, year, value)
df$stdev <- unlist(by(df, df$cat, function(x) {
c(NA, rollapply(x$value, width=2, sd))
}), use.names=FALSE)
print(df)
## cat year value stdev
## 1 A 1990 2 NA
## 2 A 1991 3 0.7071068
## 3 A 1992 5 1.4142136
## 4 A 1993 6 0.7071068
## 5 B 1990 8 NA
## 6 B 1991 9 0.7071068
## 7 B 1992 4 3.5355339
## 8 B 1993 5 0.7071068
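As a side note, the manual `c(NA, ...)` padding can probably be dropped by letting `rollapply` pad the result itself through its `fill` and `align` arguments. The sketch below is not part of the original solution; it uses a throwaway vector `x` (group A's values) and also checks the numbers by hand, since for a window of two values the standard deviation reduces to the absolute difference divided by `sqrt(2)`.

```r
library(zoo)

x <- c(2, 3, 5, 6)  # group A's values from the example above

# Manual padding, as in the answer above
manual <- c(NA, rollapply(x, width = 2, FUN = sd))

# Let rollapply pad the incomplete leading window with NA itself
padded <- rollapply(x, width = 2, FUN = sd, fill = NA, align = "right")

# For two values, sd(c(a, b)) equals abs(a - b) / sqrt(2)
by_hand <- c(NA, abs(diff(x)) / sqrt(2))

all.equal(manual, padded)
all.equal(manual, by_hand)
```

With `align = "right"` each result lines up with the last observation in its window, which is the same alignment the manual padding produces.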
You can also do it with `ddply` if you'd rather use plyr functions than `by`:
library(plyr)
df$stdev <- ddply(df, .(cat), summarise,
                  stdev=c(NA, rollapply(value, width=2, sd)))$stdev
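One caveat worth keeping in mind: both versions above paste the per-group results back in group order, so they assume the rows of `df` are already sorted by `cat`. If that is not guaranteed, a variant built on base R's `ave` should be safer, because `ave` writes each group's result back into that group's original row positions. The following is a minimal sketch on a deliberately shuffled copy of the data; `df2` is just an illustrative name, and the rows within each group are still assumed to be in year order.

```r
library(zoo)

# Same data as above, but with the two groups interleaved (illustrative)
df2 <- data.frame(
  cat   = c("B", "A", "B", "A", "A", "B", "A", "B"),
  year  = c(1990, 1990, 1991, 1991, 1992, 1992, 1993, 1993),
  value = c(8, 2, 9, 3, 5, 4, 6, 5)
)

# ave() returns each group's result in that group's original row positions
df2$stdev <- ave(df2$value, df2$cat,
                 FUN = function(x) c(NA, rollapply(x, width = 2, FUN = sd)))

# Sort only for display, to compare with the output shown earlier
df2[order(df2$cat, df2$year), ]
```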
As a lark, I ran `system.time` several times to compare the two methods above with the `ave` method pointed out by @thelatemail in the comment thread below this answer, starting each run with a "fresh" copy of the data frame:
df <- data.frame(cat, year, value)
system.time(df$stdev <- with(df, ave(value, cat, FUN=function(x) c(NA, rollapply(x, width=2, sd)))))
df <- data.frame(cat, year, value)
system.time(df$stdev <- unlist(by(df, df$cat, function(x) c(NA, rollapply(x$value, width=2, sd))), use.names=FALSE))
df <- data.frame(cat, year, value)
system.time(df$stdev <- ddply(df, .(cat), summarise, stdev=c(NA, rollapply(value, width=2, sd)))$stdev)
Both the `ave` and `by` methods take:
user system elapsed
0.002 0.000 0.002
and the `ddply` version takes:
user system elapsed
0.004 0.000 0.004
Not that speed is really an issue here, and differences of a couple of milliseconds on eight rows are close to measurement noise, but it looks like the `ave` and `by` versions are the faster ways to do this.
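If timing does matter for your data, a single `system.time` call on eight rows is too short to say much. A rough sketch of a slightly more meaningful comparison is below, using only base R and zoo; the data frame `big`, its size, the seed, and the helper names `roll_sd`, `via_ave` and `via_by` are all made up for illustration, and the timings will vary from machine to machine.

```r
library(zoo)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical larger input: 100 groups of 20 ordered observations each
big <- data.frame(cat = rep(seq_len(100), each = 20),
                  value = rnorm(100 * 20))

# Rolling sd over windows of two, padded with a leading NA
roll_sd <- function(x) c(NA, rollapply(x, width = 2, FUN = sd))

via_ave <- function(d) with(d, ave(value, cat, FUN = roll_sd))
via_by  <- function(d) unlist(by(d, d$cat, function(g) roll_sd(g$value)),
                              use.names = FALSE)

# Check that the two approaches agree before timing them
stopifnot(isTRUE(all.equal(via_ave(big), via_by(big))))

# Repeat each call a few times so the elapsed time is less dominated by noise
system.time(for (i in 1:5) via_ave(big))
system.time(for (i in 1:5) via_by(big))
```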