
I have checked existing solutions and could not find an example that integrates a lag (of several days) into a rolling regression over a 60-day window, say rows (-60, -11) relative to the current row, using row positions as the index reference.

I have a data frame with 3 columns: dates, y (return_ake) and x (return_cac_40). Here is a sample:

    Date             return_ake    return_cac_40
    2014-01-02         NA             NA
    2014-01-03       -0.0091245     -0.0027052
    2014-01-06        0.0083230     -0.0005976
    2014-01-07        0.0041348      0.0105484
    2014-01-08        0.0234186      0.0070954
    2014-01-09        0.0057054      0.0017447
    2014-01-10       -0.0141937      0.0012804
    2014-01-13       -0.0165286     -0.0087191
    2014-01-14       -0.0052142     -0.0001558
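A reproducible version of the sample above may make it easier for others to run the attempts below. This sketch copies the values from the table and uses the table's column names (`Date`, `return_ake`, `return_cac_40`); note that the code further down uses different names (`date`, `daily_return_ake`, `daily_return_cac`), so one set would need renaming to match the other.

```r
# Rebuild the sample table as a data frame (values copied from the question)
DF <- data.frame(
  Date = as.Date(c("2014-01-02", "2014-01-03", "2014-01-06", "2014-01-07",
                   "2014-01-08", "2014-01-09", "2014-01-10", "2014-01-13",
                   "2014-01-14")),
  return_ake = c(NA, -0.0091245, 0.0083230, 0.0041348, 0.0234186,
                 0.0057054, -0.0141937, -0.0165286, -0.0052142),
  return_cac_40 = c(NA, -0.0027052, -0.0005976, 0.0105484, 0.0070954,
                    0.0017447, 0.0012804, -0.0087191, -0.0001558)
)
str(DF)
```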

I would like to add columns to my existing data frame with the coefficients (intercept and slope) and the standard deviation.

From previous solutions, I understood that roll_regres (from the rollRegres package) was the best available function for running a rolling regression and appending the parameters above to the data frame. However, it doesn't support lags.

    library(rollRegres)

    temp <- roll_regres(daily_return_ake ~ daily_return_cac, DF, width = 60L, na.action = na.exclude)
    tail(temp$coefs)

Problem: I want to leave out the 10 observations immediately preceding the current date from each window.
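One way to express the window (-60, -11) directly is to fit `lm()` on rows `t - 60` to `t - 11` for every row `t`, which is 50 rows that exclude the current row and the 10 before it. The sketch below is a plain base-R loop, not `roll_regres`; `roll_lagged_lm()` is a hypothetical helper, and "sd" is assumed to mean the residual standard error (`sigma()`), since the question does not say which standard deviation is wanted. Rows with missing values inside a window are dropped before fitting.

```r
# Rolling OLS of y on x where the window for row t is rows (t + from):(t + to).
# With from = -60 and to = -11 the window holds 50 rows and excludes row t.
roll_lagged_lm <- function(y, x, from = -60L, to = -11L, min_obs = 3L) {
  stopifnot(length(y) == length(x), from <= to)
  n <- length(y)
  out <- matrix(NA_real_, nrow = n, ncol = 3,
                dimnames = list(NULL, c("intercept", "slope", "sd")))
  for (t in seq_len(n)) {
    rows <- (t + from):(t + to)
    # Skip rows whose window is not fully inside the data
    if (rows[1] < 1L || rows[length(rows)] > n) next
    # Drop rows with missing y or x (e.g. the NA on the first date)
    rows <- rows[!is.na(y[rows]) & !is.na(x[rows])]
    if (length(rows) < min_obs) next
    fit <- lm(y[rows] ~ x[rows])
    # Intercept, slope, residual standard error (assumed meaning of "sd")
    out[t, ] <- c(coef(fit), sigma(fit))
  }
  out
}
```

On the question's data, `DF <- cbind(DF, roll_lagged_lm(DF$return_ake, DF$return_cac_40))` would append `intercept`, `slope` and `sd` columns, all NA for the first 60 rows. A loop is slower than `roll_regres`, but at daily frequency it is usually fast enough, and the window boundaries are explicit.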

I have attempted the following:

a) Create an 11-row lag of x (lag11) and run roll_regres with the lagged value as the explanatory variable and a window of 50.

    library(dplyr)

    # Keep the three columns and add an 11-row lag of the CAC 40 return
    DF_2 <- DF %>%
      select(date, daily_return_ake, daily_return_cac) %>%
      mutate(lag11 = dplyr::lag(daily_return_cac, n = 11, default = NA))

and then apply roll_regres:

    temp <- roll_regres(daily_return_ake  ~ lag11 , DF_2, width = 50L, na.action=na.exclude)
    tail(temp$coefs)

But it returns an error about na.action (unused argument), or a missing-values error when na.action is removed. The earlier attempt with return_cac accepted na.action and avoided the missing-values error.
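Note that a lag11 column pairs y on date t with x from 11 rows earlier, so it estimates a lagged-regressor model rather than the same-day regression over an earlier window. If the goal is the latter, another route is to run a trailing width-50 regression on the unlagged columns and shift its results down by 11 rows: the estimate from the window ending at row t - 11 then lands on row t and covers rows (t - 60):(t - 11). The sketch below assumes the trailing window includes the current row (which is what question 1 asks to confirm for `roll_regres`) and uses a plain `lm()` loop in place of `roll_regres`; `trailing_lm()` and `shift_rows()` are hypothetical helpers, and the data are simulated. Separately, lag11 has at least 11 leading NAs, which may be what triggers the missing-values error; dropping incomplete rows first (e.g. `na.omit()`) is an untested assumption.

```r
# Trailing-window OLS: the window for row t is rows (t - width + 1):t,
# so it includes row t. Rows without a full window stay NA.
trailing_lm <- function(y, x, width) {
  n <- length(y)
  out <- matrix(NA_real_, n, 2, dimnames = list(NULL, c("intercept", "slope")))
  for (t in seq_len(n)) {
    if (t < width) next
    rows <- (t - width + 1):t
    out[t, ] <- coef(lm(y[rows] ~ x[rows]))
  }
  out
}

# Shift a coefficient matrix down by k rows, padding the top with NA,
# so row t receives the estimate that was computed at row t - k.
shift_rows <- function(m, k) {
  rbind(matrix(NA_real_, k, ncol(m), dimnames = list(NULL, colnames(m))),
        m[seq_len(nrow(m) - k), , drop = FALSE])
}

set.seed(42)
x <- rnorm(120)
y <- 0.001 + 0.8 * x + rnorm(120, sd = 0.01)

# Width 50 ending at t - 11 covers rows (t - 60):(t - 11)
lagged <- shift_rows(trailing_lm(y, x, width = 50), k = 11)
```

With `roll_regres`, the equivalent would be to shift `temp$coefs` by 11 rows in the same way, provided its rows line up one-to-one with the data rows.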

b) Rework the existing solution "Linear regression with only previous values in moving window": I changed the sequence definition and adapted it to my own data frame.

    library(zoo)    # rollapplyr()
    library(broom)  # tidy()

    fun <- function(x) unlist(tidy(lm(as.data.frame(x)))[, -1])
    new_DF <- do.call("rbind", by(DF, function(x)
      cbind(x, rollapplyr(x[2:3], list(seq(from = -11, to = -60)), FUN = fun,
                          fill = NA, by.column = FALSE))))

Here again, I get an error: `Error in unique.default(x, nmax = nmax) : unique() applies only to vectors`
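`by()` expects a grouping argument (`INDICES`) before the function, and in attempt b) the function sits in that position, which is most likely what produces the `unique()` error. If no grouping is needed, `by()` can be dropped and `zoo::rollapplyr()` called on the two return columns directly, with the window given as a list of offsets. This is a sketch assuming the zoo package; `fit_window()` is a hypothetical helper that replaces the `broom::tidy()` step, "sd" is assumed to be the residual standard error, and random numbers stand in for the real returns.

```r
library(zoo)

# Intercept, slope and residual standard error of y ~ x for one window.
# `w` is the window matrix passed by rollapplyr (column 1 = y, column 2 = x).
fit_window <- function(w) {
  fit <- lm(w[, 1] ~ w[, 2])
  c(unname(coef(fit)), sigma(fit))
}

set.seed(7)
DF <- data.frame(return_ake = rnorm(100), return_cac_40 = rnorm(100))

# list(-(60:11)): for row t use rows t - 60, ..., t - 11 (row t excluded);
# rows without a complete window are filled with NA.
res <- rollapplyr(as.matrix(DF[c("return_ake", "return_cac_40")]),
                  list(-(60:11)), fit_window,
                  by.column = FALSE, fill = NA)
colnames(res) <- c("intercept", "slope", "sd")
DF <- cbind(DF, res)
```

Writing the offsets in ascending order (`-(60:11)`, oldest row first) keeps them easy to read; the original `seq(from = -11, to = -60)` lists the same 50 offsets in descending order.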

Expected outcome:

    Date             return_ake    return_cac_40    intercept    X1    sd  
    2014-01-02         NA             NA             ....
    2014-01-03       -0.0091245     -0.0027052
    2014-01-06        0.0083230     -0.0005976
    2014-01-07        0.0041348      0.0105484
    2014-01-08        0.0234186      0.0070954
       ....

1- Could anyone confirm that roll_regres with a width of, say, 60L uses the last 60 records, including the current date? This is not entirely clear from the documentation.

2- Can anyone help me with the lag? (attempts described above)
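For question 1, one empirical check is to compare a row of `temp$coefs` against `lm()` fits on the two candidate windows: the `width` rows ending at the current row, or the `width` rows ending one row earlier. `window_alignment()` is a hypothetical helper written for this check; it assumes the rows of `temp$coefs` line up one-to-one with the data rows and that `t` is chosen well past any NA rows.

```r
# Compare a reported coefficient vector (e.g. one row of roll_regres output)
# with OLS fits on the two candidate windows that end at or just before row t.
window_alignment <- function(coefs_t, y, x, t, width, tol = 1e-8) {
  incl <- (t - width + 1):t        # window that includes row t
  excl <- (t - width):(t - 1)      # window that ends at row t - 1
  fit_incl <- coef(lm(y[incl] ~ x[incl]))
  fit_excl <- coef(lm(y[excl] ~ x[excl]))
  if (isTRUE(all.equal(unname(coefs_t), unname(fit_incl), tolerance = tol)))
    return("includes current row")
  if (isTRUE(all.equal(unname(coefs_t), unname(fit_excl), tolerance = tol)))
    return("ends at previous row")
  "neither"
}

# Possible usage (not run), with y and x the two model columns of DF:
# n <- nrow(DF)
# window_alignment(temp$coefs[n, ], y, x, t = n, width = 60L)
```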

Thanks

Jules
  • Is there a reason why you do not wish to calculate this in a `for` loop? – nya Mar 02 '20 at 10:03
  • You're asking an awful lot for a single question. Per [Stack Overflow's guidance](https://stackoverflow.com/help/on-topic), try to narrow this to a single question and ask it with a [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). – ulfelder Mar 02 '20 at 10:22
