
Hi, I'm starting to use R and am stuck on analyzing my data. I have a data frame with 80 columns. Column 1 is the dependent variable, and columns 2 to 80 are the independent variables. I want to fit 78 multiple linear regressions. Each one keeps the first independent variable (column 2) fixed and adds one of the remaining columns (3 to 80). I want to save all the regressions in a list so I can later compare the models using AIC scores. How can I do it?

Here is my loop (my data frame is called data.frame):

for(i in 2:80) {
  Regressions <- lm(data.frame$column1 ~ data.frame$column2 + data.frame[, i])
}
Pablo

2 Answers


With a for loop, we can first initialize a list to store the output and then fill it one model at a time:

# df1 is your data frame; column2 is the fixed predictor, columns 3 to 80 vary
nm1 <- names(df1)[3:80]
Regressions <- vector('list', length(nm1))
for(i in seq_along(Regressions)) {
  Regressions[[i]] <- lm(reformulate(c("column2", nm1[i]), "column1"), data = df1)
}
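You can check the loop at the scale described in the question without real data. The sketch below uses a simulated stand-in for df1; the data and its column names column1 to column80 are hypothetical. It also names each list element after the predictor it adds, so models are easy to identify later.

```r
# Simulated stand-in for df1 (hypothetical): 80 numeric columns named column1..column80
set.seed(42)
n <- 50
df1 <- as.data.frame(matrix(rnorm(n * 80), nrow = n))
names(df1) <- paste0("column", 1:80)

# column1 is the response, column2 is always included, columns 3..80 vary
nm1 <- names(df1)[3:80]
Regressions <- vector("list", length(nm1))
names(Regressions) <- nm1   # label each model by the predictor it adds
for (i in seq_along(Regressions)) {
  Regressions[[i]] <- lm(reformulate(c("column2", nm1[i]), "column1"), data = df1)
}

length(Regressions)
# [1] 78
```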

Or build the formula with paste0() and as.formula() instead of reformulate():

for(i in seq_along(Regressions)) {
  Regressions[[i]] <- lm(as.formula(paste0("column1 ~ column2 + ", nm1[i])),
                         data = df1)
}
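The two approaches should build the same model formula. A quick way to confirm this is to compare their text forms. The names column1, column2 and column3 below are placeholders.

```r
# Placeholder names for illustration only
f_reform <- reformulate(c("column2", "column3"), "column1")
f_paste  <- as.formula(paste0("column1 ~ column2 + ", "column3"))

# Both print as: column1 ~ column2 + column3
print(f_reform)
print(f_paste)
identical(deparse(f_reform), deparse(f_paste))
# [1] TRUE
```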

Using a reproducible example with the built-in iris dataset:

nm2 <- names(iris)[3:5]
Regressions2 <- vector('list', length(nm2))
for(i in seq_along(Regressions2)) {
  Regressions2[[i]] <- lm(reformulate(c("Sepal.Width", nm2[i]), "Sepal.Length"), data = iris)
}

Regressions2[[1]]

#Call:
#lm(formula = reformulate(c("Sepal.Width", nm2[i]), "Sepal.Length"), 
#    data = iris)

#Coefficients:
# (Intercept)   Sepal.Width  Petal.Length  
#      2.2491        0.5955        0.4719  
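The question's end goal is to compare the models by AIC. Here is a minimal sketch on the iris list. It assumes all models are fit to the same rows. That holds for iris, which has no missing values. With missing data, lm() drops incomplete rows by default, so different models may be fit to different subsets, and their AICs would not be comparable.

```r
# Rebuild the list of iris models from the example above, labelled by added predictor
nm2 <- names(iris)[3:5]
Regressions2 <- lapply(nm2, function(v)
  lm(reformulate(c("Sepal.Width", v), "Sepal.Length"), data = iris))
names(Regressions2) <- nm2

# AIC of each model; lower is better when models share the same response and data
aic <- sapply(Regressions2, AIC)
aic_sorted <- sort(aic)
aic_sorted

# Difference from the best model (delta AIC)
aic_sorted - min(aic_sorted)

# Name of the best-scoring model
names(which.min(aic))
```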
akrun
  • I ran the code and got the following message: `Error in terms.formula(formula, data = data) : invalid model formula in ExtractVars`. My data frame is February, column1=PPNA, column2=Acum1: `nm1 <- names(February)[2:80] Regressions <- vector('list', length(nm1)) for(i in seq_along(Regressions)) { Regressions[[i]] <- lm(reformulate(c("Acum1", nm1[i]), "PPNA"), data=February) }` I don't know what I'm doing wrong. – Pablo Apr 06 '20 at 23:31
  • @Pablo Can you just print out the `reformulate` output `print(reformulate(c("Acum1", nm1[i]), "PPNA"))` within the loop – akrun Apr 06 '20 at 23:34
  • @Pablo For me `lm(reformulate(c("Sepal.Length", "Sepal.Width"), "Petal.Length"), data = iris)` works – akrun Apr 06 '20 at 23:35
  • @Pablo Can you please tell me the output? – akrun Apr 06 '20 at 23:39
  • The same thing: `Error in terms.formula(formula, data = data) : invalid model formula in ExtractVars` – Pablo Apr 06 '20 at 23:53
  • @Pablo I can't reproduce that. Please check my updated example with iris. – akrun Apr 06 '20 at 23:56
  • @Pablo Can you tell me your `R` version? – akrun Apr 06 '20 at 23:58
  • R version 3.6.1 – Pablo Apr 07 '20 at 00:08
  • @Pablo that version is fine, I am on 3.6.2 – akrun Apr 07 '20 at 00:17
  • @Pablo Are you getting the error even with the iris dataset in my reproducible for-loop example? – akrun Apr 07 '20 at 00:18
  • I can fit multiple regressions using the lapply function, but I can't compare the AIC scores, so I tried the loop to see if I could solve it. I asked another question on Stack Overflow with my original code. Maybe you can help me? [link](https://stackoverflow.com/questions/61070936/it-is-possible-to-compare-the-multiple-regression-models-using-aic-scores) – Pablo Apr 07 '20 at 00:30
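The thread does not settle what caused the ExtractVars error. One possible cause, which is an assumption and was not confirmed by Pablo, is column names that are not syntactic R names. Examples are names with spaces, hyphens, or a leading digit, which the formula parser cannot read as single variables. The sketch below wraps every name in backquotes before building the formula. The data and column names are made up for illustration, and bq() is a hypothetical helper.

```r
# Hypothetical data with non-syntactic column names (made up for illustration)
set.seed(1)
df_bad <- data.frame(PPNA = rnorm(20), Acum1 = rnorm(20),
                     `2 week` = rnorm(20), `rain-mm` = rnorm(20),
                     check.names = FALSE)

# Hypothetical helper: wrap a name in backquotes so the parser treats it as one variable
bq <- function(x) paste0("`", x, "`")

# PPNA is the response, Acum1 is always included, the remaining columns vary
nm_bad <- names(df_bad)[3:ncol(df_bad)]
fits_bad <- lapply(nm_bad, function(v) {
  f <- as.formula(paste(bq("PPNA"), "~", bq("Acum1"), "+", bq(v)))
  lm(f, data = df_bad)
})
names(fits_bad) <- nm_bad

# AIC of each model, labelled by the added column
sapply(fits_bad, AIC)
```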

Using the iris dataset as an example, you can do:

lapply(seq_along(iris)[-c(1:2)], function(x) lm(data = iris[,c(1:2, x)]))

[[1]]

Call:
lm(data = iris[, c(1:2, x)])

Coefficients:
 (Intercept)   Sepal.Width  Petal.Length  
      2.2491        0.5955        0.4719  


[[2]]

Call:
lm(data = iris[, c(1:2, x)])

Coefficients:
(Intercept)  Sepal.Width  Petal.Width  
     3.4573       0.3991       0.9721  


[[3]]

Call:
lm(data = iris[, c(1:2, x)])

Coefficients:
      (Intercept)        Sepal.Width  Speciesversicolor   Speciesvirginica  
           2.2514             0.8036             1.4587             1.9468  

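This works because when you pass a data frame to lm() without a formula, it is converted to a formula under the hood (via DF2formula()). The first column is treated as the response and all other columns as predictors, so here column 1 is the response, column 2 is always included, and column x is the varying predictor.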
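To see this conversion directly, you can call formula() on a data frame, which dispatches to the formula.data.frame() method. The sketch below also names each model by its added column and applies the AIC comparison the question asked about. The names f, idx and fits are illustrative.

```r
# Convert a three-column slice of iris to a formula, roughly what lm() does without one
f <- formula(iris[, c(1:2, 3)])
f
# Sepal.Length ~ Sepal.Width + Petal.Length

# Same models as in the lapply() call above, labelled by the added column
idx <- seq_along(iris)[-(1:2)]
fits <- lapply(idx, function(x) lm(data = iris[, c(1:2, x)]))
names(fits) <- names(iris)[idx]

# Compare by AIC (lower is better; all models use the same 150 rows)
sapply(fits, AIC)
```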

Ritchie Sacramento
  • How can I filter these models, say by the p-value from the summary, without converting the list of models to a data frame? – PesKchan Apr 27 '22 at 23:00
  • @PesKchan - You should ask a new question (you can reference this question if necessary) because while not difficult it's not the focus here. – Ritchie Sacramento Apr 27 '22 at 23:05
  • Okay, will do; that would be more helpful. – PesKchan Apr 27 '22 at 23:07
  • Here is my question: https://stackoverflow.com/questions/72020943/how-to-plot-regression-coefficients-using-ggcoefstats-from-ggstatsplot – PesKchan Apr 27 '22 at 23:18
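The filtering question in the comments can be handled on the list itself with Filter(). The sketch below is only one possible criterion. It uses the sequential F-test p-value of the added term, taken from row 2 of anova(), because that term enters after Sepal.Width. This works for factor predictors like Species too. The 0.05 cutoff is an arbitrary choice for illustration.

```r
# Models from the lapply() answer, labelled by the added predictor
idx <- seq_along(iris)[-(1:2)]
fits <- lapply(idx, function(x) lm(data = iris[, c(1:2, x)]))
names(fits) <- names(iris)[idx]

# Sequential F-test p-value for the added term (row 2 of anova(); it enters after Sepal.Width)
p_added <- sapply(fits, function(m) anova(m)[2, "Pr(>F)"])
p_added

# Keep only models whose added term is significant at an illustrative 0.05 cutoff
kept <- Filter(function(m) anova(m)[2, "Pr(>F)"] < 0.05, fits)
names(kept)
```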