Reproducible data set:

library(xts)

set.seed(55)
data <- rnorm(8)

dates <- as.POSIXct("2019-03-18 10:30:00", tz = "CET") + 0:7*60
dataset <- xts(x = data, order.by = dates)

colnames(dataset) <- "R"
dataset$Timestep <- 1:8
dataset$Label <- 1
dataset$Label[4:8,] <- 2 

I am trying to fit a linear regression model separately for each label, with "R" as the dependent variable and "Timestep" as the predictor, and return all the slopes (two in this case).

My initial thought was to use split and lapply, but I could not get it to work because I don't know how to access a list of lists with lapply.

As the dataset is really large, I want to avoid a for loop. Can you help? I would really appreciate it.

Rel_Ai
  • Like this? `sp <- split(dataset, dataset$Label); lapply(sp, function(x) coef(lm(R ~ Timestep, x))[2])` I'm not sure why you would need to access a list of a list here, but in the example, you can do `x[[1]]` to get the first part of each element of `x` in each loop – rawr Dec 28 '19 at 16:22
  • Yes, exactly like this. I guess I just did not understand how to write the function inside lapply. Thank you – Rel_Ai Dec 28 '19 at 16:29

1 Answer

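1) formula Use a single lm formula that nests Timestep within Label, which fits a separate intercept and slope for each Label, and then extract the slope coefficients: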

co <- coef(lm(R ~ factor(Label) / (Timestep + 1) + 0, dataset))
co[grep("Timestep", names(co))]
## factor(Label)1:Timestep factor(Label)2:Timestep 
##              0.01572195              0.15327212 

2) split/lapply Alternately use split/lapply as shown:

slope <- function(x) coef(lm(R ~ Timestep, x))[2]
sapply(split(dataset, dataset$Label), slope)
## 1.Timestep 2.Timestep 
## 0.01572195 0.15327212 

2a) Alternately we can use the same last line of code but replace the slope function with a calculation that directly computes the slope without lm:

slope <- function(x) with(x, cov(R, Timestep)  / var(Timestep))
sapply(split(dataset, dataset$Label), slope)  # same as sapply line in (2)
##          1          2 
## 0.01572195 0.15327212 
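
As a brief justification for (2a), here is a sketch of why the covariance/variance ratio matches the lm slope within each group. The symbols are introduced only for illustration: T_i is Timestep, R_i is the response, and a and b are the intercept and slope of a simple regression with an intercept.

```latex
\[
(\hat a, \hat b) = \arg\min_{a,\,b} \sum_{i=1}^{n} \left(R_i - a - b\,T_i\right)^2
\]
\[
\hat b
= \frac{\sum_{i=1}^{n} (T_i - \bar T)(R_i - \bar R)}{\sum_{i=1}^{n} (T_i - \bar T)^2}
= \frac{\tfrac{1}{n-1}\sum_{i=1}^{n} (T_i - \bar T)(R_i - \bar R)}{\tfrac{1}{n-1}\sum_{i=1}^{n} (T_i - \bar T)^2}
= \frac{\operatorname{cov}(R, T)}{\operatorname{var}(T)}
\]
```

Both cov and var in R use the n - 1 divisor, so it cancels in the ratio. This holds only for a single predictor with an intercept, which is the model fitted in (2).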

3) nlme lmList fits a separate lm for each Label. The nlme package comes with R, so it does not have to be installed.

library(nlme)
coef(lmList(R ~ Timestep | Label, dataset))[, "Timestep"]
## [1] 0.01572195 0.15327212
G. Grothendieck