1

My dataframe looks like this:

Date = c(rep(as.Date(seq(15000,15012)),2))
Group = c(rep("a",13),rep("b",13))
y = c(seq(1,26,1))
x1 = c(seq(0.01,0.26,0.01))
x2 = c(seq(0.02,0.26*2,0.02))
df = data.frame(Group,Date,y,x1,x2)

head(df,3)
Group Date y x1 x2
a 2011-01-26 1 0.01 0.02
a 2011-01-27 2 0.02 0.04
a 2011-01-28 3 0.03 0.06

And I would like to do multiple regression by group (y as the dependent variable and x1, x2 as the independent variables) in a rolling window i.e. 3.

I have tried to achieve this using packages tidyverse and zoo with following codes but failed.

  ## define multi-var-linear regression function and get the residual
  rsd <- function(df){
    lm(formula = y~x1+x2, data = as.data.frame(df), na.action = na.omit) %>%
      resid() %>%
      return()
  }
  ## apply it by group with rolling window
  x <- df %>% group_by(Group) %>%
    rollapplyr(. , width = 3, FUN = rsd)

The output of this code is not what I acutually want.

Does anyone know how to do multiple regression by group in a rolling window? Thanks in advance, Giselle

Thank Grothendieck and Marcus for your codes! It really helped me a lot:) I now appened them here:

# Grothendieck method
rsd <- function(df){
  lm(formula = y~x1+x2, data = as.data.frame(df), na.action = na.omit) %>%
    resid() %>%
    return()
}

width <- 5
df_m2 <-
  df %>% 
  group_by(Group) %>%
  group_modify(~ {
    cbind(., rollapplyr(.[c("y", "x1", "x2")], width, rsd, fill = NA,
                        by.column = FALSE))
  }) %>%
  ungroup %>%
  select(c("Group","Date","5")) %>%
  dplyr::rename(residual_m2 = "5")
# Marcus method
output <- data.frame()
for (i in unique(df$Group)) {
  a = df%>% subset(Group==i)
  a[,"residual"] = NA
  max = nrow(a)
  if(max<5){
    next
  }
  for (j in seq(5,max,by=1)) {
    b = a %>% slice((j-4):j)
    lm_ = lm(y~x1+x2, data = b)
    a[j,]$residual = residuals(lm_)[5]
  }
  output <-
    output %>%
    rbind(a)
}
Giselle J
  • 13
  • 3
  • I could be wrong, but wouldn't you want to add your group variable in your actual regression equation? Or are you saying you want to take the rolling window average of the residuals and apply them to each group? – Hansel Palencia Apr 25 '22 at 13:36
  • Hi @HanselPalencia, thanks for your question! What I want is regression for each group in a rolling window of 3 days. – Giselle J Apr 25 '22 at 13:56

2 Answers2

1

A good old-fashioned for-loop here could be:

for (i in unique(df$Group)){
  for (j in (seq(15000,15012, 3))){
      lm_ <- lm(formula = df[df$Group== i & df$Date %in% c(j, j+1, j+2), 3] ~ df[df$Group== i & df$Date %in% c(j, j+1, j+2), 4] + df[df$Group== i & df$Date %in% c(j, j+1, j+2), 5], na.action = na.omit)
      print(paste('Group', i, 'Dates from', j, 'to', j+3, residuals(lm_)))
  }
}
user438383
  • 5,716
  • 8
  • 28
  • 43
Marcuswas
  • 98
  • 6
  • Hi @Marcuswas, thank you for your inspiring suggest! I have edited your code a bit and the output is the same as the one of the code from G.Grothendieck. I have updated it into the body context of my question. – Giselle J Apr 26 '22 at 05:42
1

Use group_modify and use rollapplyr with the by.column = FALSE argument so that rsd is applied to all columns at once rather than one at a time.

Note that if you use width 3 with two predictors and an intercept the residuals will necessarily be all zero so we changed the width to 5.

library(dplyr, exclude = c("lag", "filter"))
library(zoo)

width <- 5

df %>% 
  group_by(Group) %>%
  group_modify(~ {
      cbind(., rollapplyr(.[c("y", "x1", "x2")], width, rsd, fill = NA,
          by.column = FALSE))
  }) %>%
  ungroup
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • Thank you Grothendieck! The code works quite well on both the sample dataset and the actual dataset. I slightly edited your code to be adapted into my actual data and append it into the body context of this question. – Giselle J Apr 26 '22 at 06:08
  • If you only want Group, Date and the 5th residual it would be more direct just to return those from the function. `df %>% group_by(Group) %>% group_modify(~ { data.frame(Date = .$Date, resid5 = rollapplyr(.[c("y", "x1", "x2")], width, rsd, fill = NA, by.column = FALSE)[, width]) }) %>% ungroup` – G. Grothendieck Apr 26 '22 at 11:53