I have a huge dataset and would like to perform a rolling linear regression over a window of 60. However, I want that only the 60 previous values are considered for the linear regression.
My Dataframe DF consists of following Columns:
Date Company Y X1 X2
01.01.2015 Mill 0.13 -1 -3
01.02.2015 Mill 0.16 1 5
01.03.2015 Mill 0.83 3 4
01.04.2015 Mill -0.83 23 4
01.01.1988 Hall 0.23 1 3
01.02.1988 Hall 0.24 23 2
01.03.1988 Hall 0.78 19 -9
01.04.1988 Hall 0.73 4 12
01.05.1988 Hall 0.72 5 12
01.11.2008 Jopo 0.12 0.9 32
01.12.2008 Jopo 0.13 10 32
01.01.2009 Jopo 0.32 0.2 10
01.02.2009 Jopo 0.32 2 -1
I have several thousand companies and data for several months for each company. The regression has to be done for every month of a company, with the rolling window of 60 previous months of this specific company.
In the given example, assuming only a rolling window of 3, I want for company Mill a regression on 01.04.2015 with the data from 01.01-01.03-2015. For company Hall I want regressions on 01.04 and 01.05.1988, and for Jopo I want a regression on 01.02.2009.
Ideally, the results will be pasted together with Company and Date in a new data frame, as I have to keep working with this data and have to analyse it more.
Following code should do the trick for the rolling regression, however it does not use the previous 60 dates, but 59 and includes the current date too:
library(zoo)
rolled <- function(df) {
rollapply(df, width = 60,
FUN = function(z) coef(lm(Y ~ X1+X2, data = as.data.frame(z))),
by.column = FALSE, align = "right"
)
}
Following code does the regression dependent on the company name, as I want to make regressions for each individual company, independend from the other companies.
Test <- do.call("rbind", by(DF[c("Y", "X1", "X2")], DF[c( "Name")], rolled))
How do I incorporate, that only the 60 previous values are used for the regression? And maybe someone knows how to show also "Company" and "Date" in the results? Thanks for your help!