0

The following data table contains returns (RET) of 5 portfolios (portfolio numbers are factors NOT integers) for two dates.

set.seed(123)
DT <- data.table(date = rep(as.Date(c("2005-05-02", "2005-05-03")), each = 5), portfolio = factor(rep(1:5, 2), levels = c(1:5, "diff", "avg")), RET = rnorm(n = 10))

           date portfolio         RET
  1: 2005-05-02         1 -0.56047565
  2: 2005-05-02         2 -0.23017749
  3: 2005-05-02         3  1.55870831
  4: 2005-05-02         4  0.07050839
  5: 2005-05-02         5  0.12928774
  6: 2005-05-03         1  1.71506499
  7: 2005-05-03         2  0.46091621
  8: 2005-05-03         3 -1.26506123
  9: 2005-05-03         4 -0.68685285
 10: 2005-05-03         5 -0.44566197

For each date, I want to add to the data table the return of the difference portfolio, i.e. the difference between the return of the 5th portfolio and the return of the 1st portfolio, and the return of the average portfolio, i.e. the average return of the five portfolios. In particular, I want to create the following data.table

          date portfolio         RET
 1: 2005-05-02         1 -0.56047565
 2: 2005-05-02         2 -0.23017749
 3: 2005-05-02         3  1.55870831
 4: 2005-05-02         4  0.07050839
 5: 2005-05-02         5  0.12928774
 6: 2005-05-02       avg  0.19357026
 7: 2005-05-02      diff  0.68976338
 8: 2005-05-03         1  1.71506499
 9: 2005-05-03         2  0.46091621
10: 2005-05-03         3 -1.26506123
11: 2005-05-03         4 -0.68685285
12: 2005-05-03         5 -0.44566197
13: 2005-05-03       avg -0.04431897
14: 2005-05-03      diff -2.16072696

One way to do this (based on this post) is

DT = DT[, .SD[1:(.N+1)], date][, .(portfolio = replace(portfolio, is.na(portfolio), "avg"), RET = replace(RET, is.na(portfolio), mean(RET[!is.na(RET)]) ) ), date]
DT = DT[, .SD[1:(.N+1)], date][, .(portfolio = replace(portfolio, is.na(portfolio), "diff"), RET = replace(RET, is.na(portfolio), RET[portfolio == "5"] - RET[portfolio == "1"]) ), date]

Another way would be to create new data tables for the difference and average portfolio and then rbindlist them all.

DT = rbindlist(
  l = list(DT, 
           DT[, .(portfolio = "diff", RET = RET[portfolio == "5"] -   RET[portfolio == "1"]), by = date],
           DT[, .(portfolio = "avg", RET = mean(RET)), by = date]
))
DT[order(date, portfolio)]

Is there a better way?

Community
  • 1
  • 1
conighion
  • 180
  • 7
  • 2
    I think the `rbindlist` approach would be better. – akrun Jul 31 '16 at 16:55
  • 1
    You can also `rbind` within the DT[...] call like `DT[, rbind(.SD, other_stuff), by=date]`, but I don't think it's a good idea to store summary data in the same table as the full data. – Frank Jul 31 '16 at 18:43

0 Answers0