0

I'm stuck with a little R-problem. Let's assume I have a zoo data set that is structured as follows:

library(zoo)
df<-data.frame(r1=rnorm(25), r2=rnorm(25))
# 25 monthly dates (Jan. 1980 to Jan. 1982), matching the 25 observations
df<-zoo(df, seq(from=as.Date("1980-01-01"), to=as.Date("1982-01-01"), by="1 month"))

For each month, I would like to calculate the quarter-to-date average of both r1 and r2, in the following way (in pseudo-code):

rx (Jan. 1980) = rx (Jan. 1980)
rx (Feb. 1980) = average [ rx (Jan. 1980), rx (Feb. 1980) ]
rx (Mar. 1980) = average [ rx (Jan. 1980), rx (Feb. 1980), rx (Mar. 1980)]

rx (Apr. 1980) = rx (Apr. 1980)
rx (May  1980) = average [ rx (Apr. 1980), rx (May  1980) ]
rx (Jun. 1980) = average [ rx (Apr. 1980), rx (May  1980), rx (Jun. 1980)]

etc. - that is, I would like to replace the value of each month by the average of the realized observations in the quarter up to and including that month.
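To make the target concrete, here is a small base-R sketch of the arithmetic on made-up numbers. The values in `x` and the names `qtr`, `pos`, and `qtd` are purely illustrative, and the sketch assumes the series starts in the first month of a quarter with no missing months.

```r
# Toy values for six consecutive months (two full quarters); purely illustrative.
x <- c(1, 3, 8, 10, 20, 60)

# Quarter id for each month, and the position (1, 2, 3) of the month in its quarter.
qtr <- rep(1:2, each = 3)
pos <- rep(1:3, length.out = length(x))

# Quarter-to-date mean: cumulative sum within the quarter divided by the position.
qtd <- ave(x, qtr, FUN = cumsum) / pos
qtd
# 1 2 4 10 15 30
```

The first month of each quarter is unchanged, the second is the mean of two values, and the third is the mean of all three.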

I've experimented with rollapply (for month 2 with parameters width=2, align = "right"; for month 3 with width=3), but I feel that either I can't figure out the smartest way to do it, or there's a better/faster way to do this... any suggestions would be highly appreciated!

Thanks, Philipp

PMaier
  • Just so you know your `r1` and `r2` vectors are only 25 in length, but your dates are 38 in length, so the last 13 rows of `df` are duplicates. Was this your intention? – Nick Nov 21 '13 at 19:57
  • @Nick: My mistake. Should all be 25 obs. – PMaier Nov 21 '13 at 20:09

2 Answers

2

Don't know if it's the best way, either. But it works! I experimented with rollapply, and finally got it working using a variable width parameter.

rollapply(df, ((month(index(df)) - 1) %% 3) + 1, mean, align="right")
celiomsj
  • Genius! I had to modify month(index(df)) into as.numeric(format(index(df), "%m")) but otherwise perfect! – PMaier Nov 21 '13 at 20:22
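For reference, a self-contained sketch of this variable-width approach that uses the base-R month extraction mentioned in the comment, so no package beyond zoo is needed. The object names `z`, `m`, `w`, and `qtd` and the seed are illustrative choices, not from the original post. It assumes a monthly series without gaps that starts in the first month of a quarter, so every window fits inside the data.

```r
library(zoo)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dates <- seq(as.Date("1980-01-01"), by = "1 month", length.out = 25)
z <- zoo(cbind(r1 = rnorm(25), r2 = rnorm(25)), dates)

# Month number (1-12) via base R's format(); no extra package required.
m <- as.numeric(format(index(z), "%m"))

# Window width per observation: 1, 2, 3 for the first, second, third month of a quarter.
w <- (m - 1) %% 3 + 1

# A right-aligned window of width w[i] ends at observation i,
# so the mean covers the quarter up to and including that month.
qtd <- rollapply(z, width = w, FUN = mean, align = "right")
head(qtd, 6)
```

If the series started in the middle of a quarter, the first windows would reach before the start of the data, so that case would need separate handling.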
0

Unless I'm missing something, one way is this:

# quarter id for every date in the index
fac <- as.numeric(cut(index(df), "quarter"))
# one zoo object per quarter
split_df <- split(df, fac)
# running mean within each quarter, then stack the pieces back together
newdf <- do.call(rbind, lapply(split_df,
                          function(x) {
                               x$r1 <- cumsum(x$r1) / seq_along(x$r1)
                               x$r2 <- cumsum(x$r2) / seq_along(x$r2)
                               x}))
#newdf                                  #df
#                     r1           r2   #                    r1          r2
#1980-01-01 -0.056649139 -0.816007382   #1980-01-01 -0.05664914 -0.81600738
#1980-02-01  0.008543219 -0.423027620   #1980-02-01  0.07373558 -0.03004786
#1980-03-01  0.395468481 -0.755660995   #1980-03-01  1.16931901 -1.42092775
#1980-04-01 -0.375906206 -1.011203256   #1980-04-01 -0.37590621 -1.01120326
#1980-05-01 -0.131085288 -0.876251192   #1980-05-01  0.11373563 -0.74129913
#1980-06-01  0.025572095 -0.347781855   #1980-06-01  0.33888686  0.70915682
alexis_laz
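The split-and-running-mean idea above can also be sketched without naming each column, by grouping on zoo's `as.yearqtr()` and using base R's `ave()`. This is a variant under stated assumptions, not the answerer's code: the helper `running_mean` and the objects `z`, `qtr`, and `qtd` are hypothetical names introduced for illustration.

```r
library(zoo)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dates <- seq(as.Date("1980-01-01"), by = "1 month", length.out = 25)
z <- zoo(cbind(r1 = rnorm(25), r2 = rnorm(25)), dates)

# Quarter label for each date; format() turns it into a plain character key.
qtr <- format(as.yearqtr(index(z)))

# Running mean of a vector: cumulative sum divided by the number of values so far.
running_mean <- function(v) cumsum(v) / seq_along(v)

# Apply the running mean within each quarter, column by column,
# and rebuild a zoo object on the original index.
qtd <- zoo(apply(coredata(z), 2, function(col) ave(col, qtr, FUN = running_mean)),
           index(z))
head(qtd, 6)
```

Because the grouping comes from the dates rather than from a fixed window width, this version should also behave sensibly when the series begins partway through a quarter.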