2

I am trying to do the following rolling regression: stock_return = α + β market_return.

My dataset "data" looks as follows:

Company Date        stock_return    market_return       Alpha   Beta
AAPL    01.01.2014  3%              4%
…
AAPL    31.12.2019  5%              1%
MSFT    01.01.2014  2%              4%
…
MSFT    31.12.2019  6%              1%

Moreover, I have a dataframe with events at which the regression should be performed:

Company Date        
AAPL    05.02.2015
…   
MSFT    04.08.2018

I am using the runner package for the rolling regression:

running_regression <- function(z) {
  coef(lm(stock_return ~ market_return , data = as.data.frame(z)))
}

output <- runner(seq_along(data$market_return), 
                         k = 180, 
                         lag = 5,
                         at = ?
                         f = running_regression)

However, I struggle with my function and what to put into "at" to just do a regression for the events in my second dataframe.

Please see below a minimal example:

    Date <- seq(from = as.Date("2014-01-01"), to = as.Date("2019-12-31"), by = 'day')
    Date <- format(Date, format="%d.%m.%Y")

    Company.name <- c(replicate(2191, ""))
    AAPL <- data.frame(Company.name = "AAPL",Date)
    market_return <- c(rnorm(2191))
    AAPL <- cbind(AAPL, market_return)
    MSFT <- c(replicate(2191, "MSFT"))
    MSFT <- data.frame(Company.name = "MSFT",Date)
    MSFT <- cbind(MSFT, market_return)
    df <- rbind(AAPL, MSFT)
    stock_return <- c(rnorm(4382))
    df <- cbind(df,stock_return)
    df[,"Alpha"] <- NA
    df[,"Beta"] <- NA

    Company.name2 <- c(replicate(450, "AAPL"),replicate(450, "MSFT"))
    Event_date <- sample(seq(as.Date('2015/01/01'), as.Date('2019/12/31'), by="day"), 900)
    Event_date <- format(Event_date, format="%d.%m.%Y")
    df2 <- data.frame(Company.name2, Event_date)

    #____

    running_regression <- function(z) {
      coef(lmList(stock_return ~ market_return | Company.name, data=z))
    }


runner(df$market_return, 
       k = 180, 
       lag = 5,
       f = running_regression)
Ben
  • 183
  • 8
  • Could you do the rolling regression for all dates in the first data frame, then subset for the dates in the second? – Mark Neal Apr 04 '20 at 10:09
  • Yes this would work as well. However, I have to incorporate something like a "group_by" company component so that the 180 days sample is not taken from the previous company. Do you have any suggestions? – Ben Apr 04 '20 at 10:23
  • 1
    Group_by sounds right, though I’m not sure whether the runner command will play Nicely with a tidyverse pipe in front. Maybe think about making a small dataset so we have A reproducible example to work with? – Mark Neal Apr 04 '20 at 10:31
  • Hi Mark. I've now added a minimal example. Hope that helps. Thanks for your help on my problem! – Ben Apr 04 '20 at 10:54

2 Answers2

1

Since runner version 0.3.5 you can specify x = df and do running regression on windows from data.frame. Because you mutate df2 you have to subset relevant part of df by x = df[df$Company.name == Company.name2,] - you have to do the same with idx.

running_regression_intercept <- function(x) {
  coef(lm(stock_return ~ market_return, data = x))[1]
}

running_regression_slope <- function(x) {
  coef(lm(stock_return ~ market_return, data = x))[2]
}
library(dplyr)
library(runner)
df2 %>%
  group_by(Company.name2) %>%
  mutate(
    intercept = runner(
      x = df[df$Company.name ==  Company.name2[1], ],
      k = "180 days",
      lag = "5 days",
      idx = df$Date[df$Company.name == Company.name2[1]],
      at = Event_date,
      f = running_regression_intercept,
    ),
    slope = runner(
      x = df[df$Company.name == Company.name2[1], ],
      k = "180 days",
      lag = "5 days",
      idx = df$Date[df$Company.name == Company.name2[1]],
      at = Event_date,
      f = running_regression_slope
    )
  )

# Company.name2 Event_date    alpha     beta
#     <fct>         <date>        <dbl>    <dbl>
#   1 AAPL          2017-01-12  0.0114   0.00488
#   2 AAPL          2017-07-31 -0.0654   0.00574
#   3 AAPL          2019-02-27 -0.0861   0.0310 
#   4 AAPL          2018-09-06  0.0405  -0.0630 
#   5 AAPL          2015-09-03 -0.121   -0.0246 
#   6 AAPL          2018-11-20 -0.0283  -0.0254 
#   7 AAPL          2015-07-03 -0.116   -0.0186 
#   8 AAPL          2015-02-03  0.102    0.0409 
#   9 AAPL          2017-03-16 -0.0157   0.0124 
#   10 AAPL          2019-06-08 -0.00302  0.0532 

I needed to modify your data a bit because format changed Event.Date from Date to character.

Date <- seq(from = as.Date("2014-01-01"), 
            to = as.Date("2019-12-31"), 
            by = 'day')
market_return <- c(rnorm(2191))

AAPL <- data.frame(
  Company.name = "AAPL", 
  Date =  Date,
  market_return = market_return
)

MSFT <- data.frame(
  Company.name = "MSFT", 
  Date = Date,
  market_return = market_return
)

df <- rbind(AAPL, MSFT)
df$stock_return <- c(rnorm(4382))
df <- df[order(df$Date),]

df2 <- data.frame(
  Company.name2 = c(replicate(450, "AAPL"), replicate(450, "MSFT")), 
  Event_date = sample(
    seq(as.Date('2015/01/01'), 
        as.Date('2019/12/31'), 
        by="day"),
    size =  900)
  )
GoGonzo
  • 2,637
  • 1
  • 18
  • 25
  • Thanks very much for your answer. I checked the data and found that the coefficients of two different firms at the same date are the same. Is that maybe because we ordered df by date and it mixed the dates from the different firms? – Ben Apr 06 '20 at 08:50
  • Aaaa yes, you are right! I will try to find better solution in the evening. – GoGonzo Apr 06 '20 at 09:07
  • Thank you. you are a lifesaver! – Ben Apr 06 '20 at 09:24
  • I've updated code above. Supporting `data.frame` should be available soon - if interested in this feature please follow this [issue](https://github.com/gogonzo/runner/issues/49) – GoGonzo Apr 06 '20 at 20:06
  • @Ben I've just made a new CRAN release and user can apply runner also on `data.frame`. So the example is bit easier - already changed the answer – GoGonzo Apr 11 '20 at 13:57
  • Thank you so much! – Ben Apr 22 '20 at 12:49
0

Hope I got you correct,

If you want to only run the regression on the specific dates in df2, a quick solution is merge them using the relevant names:

head(merge(df2,df,by.x=c("Company.name2","Event_date"),by.y=c("Company.name","Date")))
  Company.name2 Event_date market_return stock_return Alpha Beta
1          AAPL 01.01.2016    -0.4814525   -0.9864268    NA   NA
2          AAPL 01.01.2019    -1.3093155    0.8357707    NA   NA
3          AAPL 01.02.2017     1.5059626   -1.3181592    NA   NA
4          AAPL 01.03.2015    -1.3226590    1.4786993    NA   NA
5          AAPL 01.03.2016     2.1394338   -0.8900286    NA   NA
6          AAPL 01.03.2019     0.6035526    0.5453212    NA   NA

You can see the dates are messed up, so it makes sense to sort your df2 by company and date first:

df2 = df2[order(df2$Company.name2,
as.Date(as.character(df2$Event_date),format="%d.%m.%Y")),]

df_merged = merge(df2,df,
by.x=c("Company.name2","Event_date"),
by.y=c("Company.name","Date"),sort=FALSE)

To do the rolling regression you can use lmList here but it will get a bit more complicated. Below I use roll_lm from roll to calculate the coefficient.. Don't quite understand your lag, maybe you can elaborate on that, but there's no way you can do a regression on runner:

library(roll)
result = by(df_merged,df_merged$Company.name2,function(z){
roll_lm(z$market_return,z$stock_return,180)})

tail(result$AAPL$coefficients)
       (Intercept)         x1
[445,] -0.07817682 0.10662762
[446,] -0.06440454 0.09257577
[447,] -0.07007445 0.09461642
[448,] -0.05917523 0.09582312
[449,] -0.05292590 0.10025369
[450,] -0.04930798 0.09911921
StupidWolf
  • 45,075
  • 17
  • 40
  • 72
  • Thanks for the reply. Merging the two dataframes will not work, because I also need the other stock and market return data for the 180 days. The lag looks as follows: For each event date t=0 make a regression with data from t=-186 to t=-6. – Ben Apr 05 '20 at 11:45
  • ok if i get you correct, you don't need a rolling regression but rather a regression at the specific points but over the whole data? basically a subset of the rolling regression – StupidWolf Apr 05 '20 at 11:48