0

I have a long dataset in the following format:

Date         Country     Score
1995-01-01   Australia    100
1995-01-02   Australia     99
1995-01-03   Australia     85
:            :             :
:            :             :
2019-06-30   Australia     57
1995-01-01   Austria       67
1995-01-02   Austria       12
1995-01-03   Austria       10
:            :             :
:            :             :
2019-06-30   Austria       21  

I want to calculate a 90-day period rolling standard deviation of the Score for each country. I have tried using the rollapply function (Package:zoo) and roll_sd (Package:RcppRoll) but they are not working for groupwise standard deviation. Can anyone please suggest a possible way to calculate the rolling standard deviation.

Thanks!

Richard Telford
  • 9,558
  • 6
  • 38
  • 51
Arsh
  • 11
  • 1
    Checkout [runner](https://gogonzo.github.io/runner/articles/the_runner_package.html#any-r-function-with-runner) package and apply any function on rolling windows. Put date into `idx` argument and specify `k = 90` (90-days). Even if you have a gaps in dates it still computer on 90-days instead of 90-elements - More examples are in [other vignette](https://gogonzo.github.io/runner/articles/runner_examples.html) – GoGonzo Nov 25 '19 at 14:34

1 Answers1

0

In general grouping is done separately from the base operation in R so it is not that those functions can't be used for grouped data. It is just that you need to embed them within a grouping operation. Here we use ave to do the grouping and rollapplyr to perform the rolling sd.

Now, at each point can we assume that the last 90 days are the last 90 rows? Assuming yes and taking rolling standard deviations of 2 so that we can use the selected rows of the posted data shown reproducibly in the Note at the end:

library(zoo)

roll <- function(x) rollapplyr(x, 2, sd, fill = NA)
transform(DF, roll = ave(Score, Country, FUN = roll))

giving:

        Date   Country Score       roll
1 1995-01-01 Australia   100         NA
2 1995-01-02 Australia    99  0.7071068
3 1995-01-03 Australia    85  9.8994949
4 1995-01-01   Austria    67         NA
5 1995-01-02   Austria    12 38.8908730
6 1995-01-03   Austria    10  1.4142136

Wide form approach

Another approach is to convert the data to wide form and then perform the rolling operation:

library(zoo)
z <- read.zoo(DF, split = "Country")
zr <- rollapplyr(z, 2, sd, fill = NA)
zr

giving this zoo series:

           Australia   Austria
1995-01-01        NA        NA
1995-01-02 0.7071068 38.890873
1995-01-03 9.8994949  1.414214

You can then just leave it as a zoo series in order to take advantage of the other time series functions in that package or can convert it back to a data frame using fortify.zoo(zr) or fortify.zoo(zr, melt = TRUE, names = names(DF)) depending on what you need.

Note

The input used in reproducible form.

Lines <- "Date         Country     Score
1995-01-01   Australia    100
1995-01-02   Australia     99
1995-01-03   Australia     85
1995-01-01   Austria       67
1995-01-02   Austria       12
1995-01-03   Austria       10"
DF <- read.table(text = Lines, header = TRUE)
DF$Date <- as.Date(DF$Date)
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341