
I'm practicing time-series analysis on the Red Sox seasons dataset. I need to split the dataset year by year and run a calculation on each year, so I'm fairly sure I need the split, lapply, rbind paradigm. I'm feeding a binary (win/loss) xts column to the split function. So far so good: it returns a list of xts objects correctly split by year.

Then I ran lapply on this list to calculate the cumulative mean of win/loss within each year. The numeric result is correct, but the xts objects are converted to numeric vectors, so I lose my Date index.

What might be the source of this issue?

thank you!

Head of red_sox_xts$win:

            win
2010-04-04   1
2010-04-06   0
2010-04-07   0
2010-04-09   0
2010-04-10   1
2010-04-11   1

1 - First I feed it to the split function to split it by year.

red_sox_seasons <- split(red_sox_xts$win, f = 'years')

output:

[[1]]
            win
2010-04-04   1
2010-04-06   0
     .       .
     .       .
     .       .
[[2]]
            win
2011-04-01   0
2011-04-02   0
     .       .
     .       .
     .       .

2 - Next I feed this output to the lapply function.

red_sox_ytd <- lapply(red_sox_seasons, cummean)

output: (This is where the strange behavior begins)

1.   A.1
     B.0.5
      .
      .
      .
2.   A.0
     B.0.5
      .
      .
      .

class(red_sox_ytd) is list, and class(red_sox_ytd[[1]]) is numeric, while it should be xts.

This makes me unable to perform the next step correctly:

do.call(rbind, red_sox_ytd)
Omar Omeiri

1 Answer

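Assuming x is as shown in the Note at the end, we can calculate the cumulative mean by year using ave (cummean is not part of base R or xts; it is assumed here to come from a package such as dplyr, which must be loaded):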

transform(x, cummean = ave(win, format(time(x), "%Y"), FUN = cummean))
##            win   cummean
## 2010-04-04   1 1.0000000
## 2010-04-06   0 0.5000000
## 2010-04-07   0 0.3333333
## 2010-04-09   0 0.2500000
## 2010-04-10   1 0.4000000
## 2010-04-11   1 0.5000000
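To see what ave is doing in that call, here is a minimal base-R sketch. The vectors `win` and `year` are illustrative stand-ins for the xts column and for `format(time(x), "%Y")`, and `cumsum(v) / seq_along(v)` is used as a base-R substitute for cummean so that no extra package is needed. ave applies FUN separately within each group and returns the results in the original order, at the original length:

```r
# Illustrative stand-ins: a win/loss vector and the year each game belongs to.
win  <- c(1, 0, 0, 0, 0)
year <- c("2010", "2010", "2010", "2011", "2011")

# Base-R cumulative mean (assumed equivalent to cummean for this purpose).
cum_mean <- function(v) cumsum(v) / seq_along(v)

# ave() runs cum_mean within each year and puts the results back in place,
# so the output lines up row by row with the input.
ytd <- ave(win, year, FUN = cum_mean)
print(ytd)
```

Because the output has the same length and order as the input, it can be attached as a new column next to the original data, which is why the Date index survives in the transform call above.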

Another approach (but longer) is:

do.call("rbind", lapply(split(x, "years"), transform, cummean = cummean(win)))
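If the original split, lapply, rbind structure is to be kept, one possible sketch is to rebuild an xts object inside the function passed to lapply, since a function that returns a plain numeric vector carries no index. The helper `cummean_xts` below is hypothetical (not part of xts), it uses `cumsum(v) / seq_along(v)` as a base-R stand-in for cummean, and the sample rows are taken from the output shown in the question:

```r
library(xts)

# Sample rows taken from the question's output (three from 2010, two from 2011).
dates <- as.Date(c("2010-04-04", "2010-04-06", "2010-04-07",
                   "2011-04-01", "2011-04-02"))
x <- xts(cbind(win = c(1, 0, 0, 0, 0)), order.by = dates)

# Hypothetical helper: compute the cumulative mean on the raw values,
# then wrap the result in a new xts object on the original index.
cummean_xts <- function(s) {
  v <- as.numeric(coredata(s))
  xts(cumsum(v) / seq_along(v), order.by = index(s))
}

seasons <- split(x, f = "years")                       # list of xts, one per year
ytd <- do.call(rbind, lapply(seasons, cummean_xts))    # xts again, dates kept
print(class(ytd))
```

Each list element is then still an xts object, so do.call(rbind, ...) can stack them back into a single series.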

Note

Lines <- "date win
2010-04-04   1
2010-04-06   0
2010-04-07   0
2010-04-09   0
2010-04-10   1
2010-04-11   1"
library(xts)
x <- as.xts(read.zoo(text = Lines, header = TRUE, drop = FALSE))
G. Grothendieck
  • The first one is a pretty elegant solution; I'm breaking it down to see how it works. I'll accept it as an answer in just a minute. But I'm still wondering what the problem is with the approach I took. I've used this method a million times and never had a problem with it until now. – Omar Omeiri Sep 19 '19 at 16:26
  • It is unclear what you are asking since you didn't provide something reproducible. If we use `x` from the Note in the answer then `lapply(split(x, "years"), cummean)` produces a list of plain vectors. – G. Grothendieck Sep 19 '19 at 17:15