I have checked existing solutions but could not find an example that integrates a lag of several days into a rolling regression. I want to run the regression over a window reaching back 60 days but lagged, say rows (-60, -11) relative to the current row, i.e. using row positions (not calendar dates) as the index reference.
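To make the intended window concrete: for row i, the offsets (-60, -11) select rows i - 60 through i - 11, i.e. 50 observations that stop 11 rows before the current one. A small base-R sketch of that indexing (window_rows is a hypothetical helper, not part of any package):

```r
# Hypothetical helper: row indices of the lagged window for row i,
# using offsets from -60 to -11 relative to the current row
window_rows <- function(i, from = -60L, to = -11L) {
  idx <- (i + from):(i + to)
  # Return an empty window when the offsets reach before the first row
  if (min(idx) < 1L) return(integer(0))
  idx
}

range(window_rows(100L))   # 40 89
length(window_rows(100L))  # 50
length(window_rows(30L))   # 0: not enough history yet
```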
I have a data frame with 3 columns: Date, y (return_ake) and x (return_cac_40). Here is a sample:
Date return_ake return_cac_40
2014-01-02 NA NA
2014-01-03 -0.0091245 -0.0027052
2014-01-06 0.0083230 -0.0005976
2014-01-07 0.0041348 0.0105484
2014-01-08 0.0234186 0.0070954
2014-01-09 0.0057054 0.0017447
2014-01-10 -0.0141937 0.0012804
2014-01-13 -0.0165286 -0.0087191
2014-01-14 -0.0052142 -0.0001558
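For a reproducible example, the sample above can be rebuilt as an R data frame. Column names follow the sample; note that the code further down uses daily_return_ake and daily_return_cac instead.

```r
# The 9 sample rows above as a data frame (first row is NA in both series)
DF <- data.frame(
  Date = as.Date(c("2014-01-02", "2014-01-03", "2014-01-06", "2014-01-07",
                   "2014-01-08", "2014-01-09", "2014-01-10", "2014-01-13",
                   "2014-01-14")),
  return_ake = c(NA, -0.0091245, 0.0083230, 0.0041348, 0.0234186,
                 0.0057054, -0.0141937, -0.0165286, -0.0052142),
  return_cac_40 = c(NA, -0.0027052, -0.0005976, 0.0105484, 0.0070954,
                    0.0017447, 0.0012804, -0.0087191, -0.0001558)
)
str(DF)
```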
I would like to add columns to my existing data frame holding the coefficients (intercept and slope) and the standard deviation.
From previous answers, roll_regres (from the rollRegres package) seems to be the best available function for running a rolling regression and appending the above parameters to the data frame. However, it does not support lags.
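For a single window, the three quantities could be obtained with lm(). A minimal sketch, assuming that "std deviation" means the residual standard deviation (sigma) of the fit; window_fit is a hypothetical helper and the toy numbers are only for illustration:

```r
# Hypothetical helper: fit y ~ x on one window and return intercept,
# slope and residual standard deviation (sigma is an assumed reading of "sd")
window_fit <- function(y, x) {
  fit <- lm(y ~ x)  # rows with NA are dropped by lm's default na.omit
  c(intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    sd        = summary(fit)$sigma)
}

r <- window_fit(y = c(1, 3, 2, 5), x = c(1, 2, 3, 4))
r  # intercept 0, slope 1.1, sd sqrt(1.35)
```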
library(rollRegres)
temp <- roll_regres(daily_return_ake ~ daily_return_cac, DF, width = 60L, na.action = na.exclude)
tail(temp$coefs)
Problem: I want to leave out the first 10 observations of each window, i.e. the most recent ones preceding the current row.
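Since rows (-60, -11) are 50 consecutive rows, the regression for row i is the same as a trailing 50-row regression ending at row i - 11. The base-R sketch below (trailing_slope and lag_n are hypothetical helpers, the data are simulated) checks this for one row by shifting trailing 50-row slopes down by 11 rows:

```r
# Slope of y ~ x over the trailing `width` rows ending at (and including) row j
trailing_slope <- function(y, x, width) {
  out <- rep(NA_real_, length(y))
  for (j in seq_along(y)) {
    if (j < width) next                       # not enough rows yet
    idx <- (j - width + 1L):j
    out[j] <- coef(lm(y[idx] ~ x[idx]))[[2]]
  }
  out
}
# Shift a vector down by n rows, padding the top with NA
lag_n <- function(v, n) c(rep(NA, n), v)[seq_along(v)]

set.seed(3)
x <- rnorm(120)
y <- 0.8 * x + rnorm(120)
shifted <- lag_n(trailing_slope(y, x, width = 50L), 11L)
direct  <- coef(lm(y[40:89] ~ x[40:89]))[[2]]  # window (-60, -11) for row 100
isTRUE(all.equal(shifted[100], direct))        # TRUE
```

If roll_regres() does include the current row in its window (question 1 below), the same idea would apply to it: run it with width = 50L on the unlagged columns and shift temp$coefs down by 11 rows. This is an untested suggestion.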
I have attempted the following:
a- Create a column holding x lagged by 11 rows (lag11) and run roll_regres with this lagged column as the explanatory variable and a window of 50:
library(dplyr)
DF_2 <- DF %>%
  select(date, daily_return_ake, daily_return_cac) %>%
  # lag11 at row i holds daily_return_cac from row i - 11 (first 11 rows are NA)
  mutate(lag11 = dplyr::lag(daily_return_cac, n = 11, default = NA))
and then applied roll_regres:
temp <- roll_regres(daily_return_ake ~ lag11, DF_2, width = 50L, na.action = na.exclude)
tail(temp$coefs)
However, this returns an "unused argument" error for na.action or, when I remove na.action, a missing-values error. My earlier call with daily_return_cac accepted na.action and did not raise the missing-values error.
b- Rework the existing solution "Linear regression with only previous values in moving window": I changed the sequence definition and adapted it to my own data frame.
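library(zoo)    # rollapplyr()
library(broom)  # tidy()
# regress column 1 (y) on column 2 (x) and return all tidy() statistics as one vector
fun <- function(x) unlist(tidy(lm(as.data.frame(x)))[, -1])
new_DF <- do.call("rbind", by(DF, function(x)
  cbind(x, rollapplyr(x[2:3], list(seq(from = -11, to = -60)), FUN = fun, fill = NA, by.column = FALSE))))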
Here again I get an error: Error in unique.default(x, nmax = nmax) : unique() applies only to vectors
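Regarding attempt b-, the error most likely comes from by(): its second argument is INDICES, so the function is taken as the grouping variable, and converting it to a factor fails inside unique(). With a single series, no grouping is needed. Regarding attempt a-, dplyr::lag(n = 11) pads the first 11 rows of lag11 with NA, which is a likely source of the missing-value error; it also pairs each daily_return_ake with the daily_return_cac from 11 rows earlier, rather than taking both series from the same rows (-60, -11). A base-R sketch of both points on toy data (lag_n is a hypothetical stand-in for dplyr::lag()):

```r
# Attempt b-: by() treats the function as INDICES and tries to turn it into a factor
msg <- tryCatch(by(data.frame(a = 1:3), function(x) x),
                error = function(e) conditionMessage(e))
msg

# Attempt a-: base-R stand-in for dplyr::lag(x, n), shifting values down by
# n rows, padding the top with NA and keeping the original length
lag_n <- function(x, n) c(rep(NA, n), x)[seq_along(x)]

x  <- as.numeric(1:15)
lx <- lag_n(x, 11)
sum(is.na(lx))     # 11 leading NAs
lx[12:15]          # 1 2 3 4: row i holds x[i - 11]
which(!is.na(lx))  # 12 13 14 15: rows usable in y ~ lag11
```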
Expected outcome:
Date return_ake return_cac_40 intercept X1 sd
2014-01-02 NA NA ....
2014-01-03 -0.0091245 -0.0027052
2014-01-06 0.0083230 -0.0005976
2014-01-07 0.0041348 0.0105484
2014-01-08 0.0234186 0.0070954
....
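A base-R sketch that fills columns like those in the expected outcome (intercept, slope, sd) using the (-60, -11) window, without by(), zoo or broom. Assumptions: "sd" is the residual standard deviation, roll_lagged is a hypothetical name, and the data are simulated for illustration.

```r
# For each row i, regress y on x over rows (i - 60):(i - 11) and store
# intercept, slope and residual sd; rows without enough history stay NA
roll_lagged <- function(y, x, from = -60L, to = -11L) {
  n <- length(y)
  out <- matrix(NA_real_, n, 3,
                dimnames = list(NULL, c("intercept", "slope", "sd")))
  for (i in seq_len(n)) {
    if (i + from < 1L) next                    # window would start before row 1
    idx <- (i + from):(i + to)
    ok <- idx[!is.na(y[idx]) & !is.na(x[idx])] # drop NA rows inside the window
    if (length(ok) < 3L) next                  # need >= 3 points for a residual sd
    fit <- lm(y[ok] ~ x[ok])
    out[i, ] <- c(coef(fit), summary(fit)$sigma)
  }
  out
}

set.seed(1)
x <- rnorm(200, sd = 0.01)
y <- 0.5 * x + rnorm(200, sd = 0.005)
res <- roll_lagged(y, x)
sum(!is.na(res[, "slope"]))  # 140: rows 61 to 200 have a full window
```

With the column names used in the code above, cbind(DF, roll_lagged(DF$daily_return_ake, DF$daily_return_cac)) would append the three columns. The loop refits lm() once per row, which is simple but slower than roll_regres() on long series.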
1- Could anyone confirm that roll_regres with, say, width = 60L uses the last 60 rows, including the current row? The documentation is not entirely clear on this.
2- Can anyone help me with the lag (see my attempts above)?
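Regarding question 1, one way to check the convention empirically is to refit a single window by hand, once ending at and including row i and once ending at row i - 1, and compare both with temp$coefs[i, ] from roll_regres() run on the same data. A base-R sketch (trailing_fit is a hypothetical helper; the data are simulated):

```r
# Hypothetical check: coefficients of the `width`-row window ending at row i
# (include_current = TRUE) or at row i - 1 (include_current = FALSE)
trailing_fit <- function(y, x, i, width, include_current = TRUE) {
  last <- if (include_current) i else i - 1L
  idx <- (last - width + 1L):last
  coef(lm(y[idx] ~ x[idx]))
}

set.seed(2)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
with_current    <- trailing_fit(y, x, i = 80, width = 60)  # rows 21 to 80
without_current <- trailing_fit(y, x, i = 80, width = 60,
                                include_current = FALSE)   # rows 20 to 79
# e.g. all.equal(unname(temp$coefs[80, ]), unname(with_current))
```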
Thanks