1

I found some previous questions on this topic especially this R: Grouped rolling window linear regression with rollapply and ddply and R: Rolling / moving avg by group , however, both questions did not provide an exact solution for the problem that I am facing. I am currently trying to estimate CAPM beta over panel data using a linear regression. So I have different funds (in the example below I used 3 fund groups) for which I would like to calculate the betas separately and per row. To put this more abstract: I am trying to do a linear regression with a moving window by group to estimate the coefficient for every row based on the data in the window.

install.packages("zoo","dplyr")
library(zoo);library(dplyr)

# Create dataframe
fund <- as.numeric(c(1,1,1,1,1,1,1,1,3,3,3,3,3,3,2,2,2,2,2,2,2))
return<- as.numeric(c(1:21))
benchmark <- as.numeric(c(1,13,14,20,14,32,4,1,5,7,1,0,7,1,-2,1,6,-7,9,10,9))
riskfree<-as.numeric(c(1,5,1,2,1,6,4,7,5,-5,10,0,3,1,2,1,6,7,8,9,10))
date <- as.Date(c("2010-07-30","2010-08-31","2010-09-30","2010-10-31","2010-11-30","2010-12-31","2011-01-30",
              "2011-02-28","2010-07-31","2010-09-30","2010-10-31","2010-11-30","2010-12-31","2011-01-30",
              "2010-07-30","2010-08-31","2010-09-30","2010-10-31","2010-11-30","2010-12-31","2011-01-30"))
funddata<-data.frame(date,fund,return,benchmark,riskfree)

# Creating variables of interest
funddata["ret_riskfree"]<-as.numeric(funddata$return-funddata$riskfree)
funddata["benchmark_riskfree"]<-as.numeric(funddata$benchmark-funddata$riskfree)

I want to do a rolling regression over two columns df[6:7] for every group indicated by the column "fund". The calculation should be done separately so the first two rows in the beta column for every fund group will always show "NA". In the end I want to have a full dataframe with all fund groups and all beta values combined. I managed to come up with a new code that works but is pretty messy and it requires to order the data by fund & date before executing. I would welcome any suggestions on how to make it better.

 funddata <- funddata[order(funddata$fund, funddata$date),]   
 beta_func <- function(x, benchmark_riskfree, ret_riskfree) {
  a <- coef(lm(as.formula(paste(ret_riskfree, "~", benchmark_riskfree,-1)), 
           data = x))
  return(a)
}
beta_list<-list()
for (i in c(1:3)){beta_list[[paste(i, sep="_")]]<-    (rollapplyr(funddata[(funddata$fund==i),6:7], width = 3,
           FUN = function(x) beta_func(as.data.frame(x), "benchmark_riskfree" ,   "ret_riskfree"),
           by.column = FALSE,fill=NA))}
beta_list<-unlist(beta_list, recursive=FALSE)
funddata$beta<-beta_list
ruki ami
  • 73
  • 6

1 Answers1

0

As I mentioned in the comment above, this solution might be a bit off since I'm not able to reproduce your desired output 100%. Still, the functionality of what you're trying to accomplish is there. Have a look at it and let me know if this is something you could use or I could develop further.

EDIT: The code below does not reproduce the desired output as specified above, but turned out to be what the OP was looking for after all.

Here goes:

# Datasource
fund <- as.numeric(c(1,1,1,1,1,1,1,1,3,3,3,3,3,3,2,2,2,2,2,2,2))
return<- as.numeric(c(1:21))
benchmark <- as.numeric(c(1,13,14,20,14,32,4,1,5,7,1,0,7,1,-2,1,6,-7,9,10,9))
riskfree<-as.numeric(c(1,5,1,2,1,6,4,7,5,-5,10,0,3,1,2,1,6,7,8,9,10))
date <- as.Date(c("2010-07-30","2010-08-31","2010-09-30","2010-10-31","2010-11-30","2010-12-31","2011-01-30",
                  "2011-02-28","2010-07-31","2010-09-30","2010-10-31","2010-11-30","2010-12-31","2011-01-30",
                  "2010-07-30","2010-08-31","2010-09-30","2010-10-31","2010-11-30","2010-12-31","2011-01-30"))
funddata<-data.frame(date,fund,return,benchmark,riskfree)

# Creating variables of interest
funddata["ret_riskfree"]<-as.numeric(funddata$return-funddata$riskfree)
funddata["benchmark_riskfree"]<-as.numeric(funddata$benchmark-funddata$riskfree)

# Target check #################################################################
# Subset last three rows in original dataframe
df_check <- funddata[funddata$fund == 1,]
df_check <- tail(df_check,3)

# Run regression check
mod_check <- lm(df_check$ret_riskfree~df_check$benchmark_riskfree)
coef(mod_check)

# My suggestion ################################################################
# The following function takes three arguments:
# 1. a dataframe, myDf
# 2. a column that you'd like to myDf on
# 3. a window length for a sliding window, myWin

fun_rollreg <- function(myDf, subCol, varY, varX, myWin){
  df_main <- myDf
  
  # Make an empty data frame to store results in
  df_data <- data.frame()
  
  # Identify unique funds
  unFunds <- unique(unlist(df_main[subCol]))
  
  # Loop through your subset
  for (fundx in unFunds){
    
    # Subset
    df <- df_main
    df <- df[df$fund == fundx,]
    
    # Keep a copy of the original until later
    df_new <- df
    
    # Specify a container for your beta estimates
    betas <- c()
    
    # Specify window length
    wlength <- myWin
 
    # Retrieve some data dimensions to loop on
    rows = dim(df)[1]
    periods <- rows - wlength
    
    # Loop through each subset of the data
    # and run regression
    for (i in rows:(rows - periods)){
      
        # Split dataframe in subsets
        # according to the window length
        df1 <- df[(i-(wlength-1)):i,]
    
        # Run regression
        beta <- coef(lm(df1[[varY]]~df1[[varX]]))[2]
        
        # Keep regression ressults
        betas[[i]] <- beta
    }
    # Add regression data to dataframe
    df_new <- data.frame(df, betas)
    
    # Keep the new dataset for later concatenation
    df_data <- rbind(df_data, df_new)
  }   
  return(df_data)
}

# Run the function:
df_roll <- fun_rollreg(myDf = funddata, subCol = 'fund',
                      varY <- 'ret_riskfree', varX <- 'benchmark_riskfree',
                      myWin = 3)
# Show the results
print(head(df_roll,8))

For the first 8 rows in the new dataframe (fund = 1), this is the result:

  date         fund return benchmark riskfree ret_riskfree benchmark_riskfree       betas
1 2010-07-30    1      1         1        1            0                  0          NA
2 2010-08-31    1      2        13        5           -3                  8          NA
3 2010-09-30    1      3        14        1            2                 13  0.10465116
4 2010-10-31    1      4        20        2            2                 18  0.50000000
5 2010-11-30    1      5        14        1            4                 13 -0.20000000
6 2010-12-31    1      6        32        6            0                 26 -0.30232558
7 2011-01-30    1      7         4        4            3                  0 -0.11538462
8 2011-02-28    1      8         1        7            1                 -6 -0.05645161
vestland
  • 55,229
  • 37
  • 187
  • 305
  • varX and varY should definitely be arguments and not hard-coded in the function. I'll sort that out tomorrow. – vestland Aug 23 '17 at 16:40
  • 1
    thank you! My mistake was to write coef(lm(as.formula(paste(ret_riskfree, "~", benchmark_riskfree,-1)), instead of coef(lm(as.formula(paste(ret_riskfree, "~", benchmark_riskfree))[2] sorry that this caused confusion. – ruki ami Aug 23 '17 at 18:35
  • I made function arguments out of the hard-coded variables. Eventually, someone may come along and suggest a more elegant solution, but this does the job. Your question was interesting, well explained with a reproducible dataset, so I guess you know your way around SO already, but I'd appreciate if you'd consider having a look at [this link](https://stackoverflow.com/help/someone-answers) if the suggestion was useful. – vestland Aug 24 '17 at 06:31