I have the following regression model:

models <- lapply(1:25, function(x) lm(Y_df[,x] ~ X1))

This runs one regression for each of the 25 columns in the Y_df data frame.

One of the outputs looks like this:

models[15] # Gives me the coefficients for model 15

Call:
lm(formula = Y_df[, x] ~ X1)

Coefficients:
(Intercept)         X1 
  0.1296812    1.0585835  

I can store these in a separate data frame. The problem I am running into concerns the Std. Error, R2, residuals, etc.

I would like to store these in a separate data frame as well.

I can summarise an individual model and extract values from the usual R regression output:

ls_1 <- summary(models[[1]])
ls_1
ls_1$sigma

However, I am hoping to take the values directly from the line of code that runs the 25 regressions.

This code works:

> (models[[15]]$coefficients)
  (Intercept)          X1 
-0.3643446787  1.0789369642

However, this code does not:

> (models[[15]]$sigma)
NULL

I have tried a variety of combinations to extract these results, with no luck.

The following did exactly what I wanted for the coefficients. I had hoped there was a way to replace coef with something for the Std. Error, R2, etc., but this does not work.

models <- lapply(1:25, function(x) lm(Y_df[,x] ~ X1))
# extract just coefficients
coefficients <- sapply(models, coef)

Ideally, I would like to store the Std. Error from the above models.

user113156
  • Possible duplicate of [pull out p-values and r-squared from a linear regression](https://stackoverflow.com/questions/5587676/pull-out-p-values-and-r-squared-from-a-linear-regression) – dww Dec 01 '17 at 19:12
  • Very simple. You need to calculate the summary in order to get the additional statistics. – Roman Luštrik Dec 01 '17 at 19:44
  • Not a duplicate. I knew how to extract the coefficients, R squared, etc. one model at a time, but for this particular task I wanted to extract the values for 25 regressions and store them in a df – user113156 Dec 01 '17 at 19:55
  • Use `summary` in your `sapply` (don't forget `simplify = FALSE`) and you should have summary statistics available to fetch using the `sapply`-way. – Roman Luštrik Dec 02 '17 at 10:05
  • Hi Ben I can live with that I just observe that most new users who would be searching check answers long before they read **all** the comments. There are a plethora of related but unlinked answers about how to get `lm` results with variations on the # of IV or DV and really just can be via loop, apply family or purrr. They may not be direct duplicates because of some nuance but they are all related – Chuck P Sep 18 '20 at 19:38

1 Answer


If a model is named mod, you can get to all of the residuals in the same way as the coefficients:

mod$residuals

There are also functions that extract the coefficients and residuals:

coef(mod)
resid(mod)

You can extract the other outputs via summary:

summary(mod)$coefficients[,"Std. Error"]  # standard errors
summary(mod)$r.squared            # r squared
summary(mod)$adj.r.squared        # adjusted r squared
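The residual standard error that the question tried to read with models[[15]]$sigma is stored on the summary object too, not on the lm fit itself. A minimal sketch, using simulated data as a stand-in (the real Y_df and X1 are not shown in the question, so the numbers here are made up):

```r
# Simulated stand-in data; not the asker's data
set.seed(1)
X1 <- rnorm(50)
y  <- 0.5 + 1.2 * X1 + rnorm(50)

mod <- lm(y ~ X1)
s   <- summary(mod)            # compute the summary once and reuse it

mod$sigma                      # NULL: an lm object has no sigma component
s$sigma                        # residual standard error
s$coefficients[, "Pr(>|t|)"]   # p-values, one per coefficient
sigma(mod)                     # stats::sigma() returns the same value as s$sigma
```

Storing the summary in a variable avoids refitting the summary for every statistic you pull out.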

So you can either create a list containing each of these results for each model:

outputList <- lapply(models, function(mod){
  s <- summary(mod)  # compute the summary once per model
  coefs <- coef(mod)
  stdErr <- s$coefficients[,"Std. Error"]
  rsq <- s$r.squared
  rsq_adj <- s$adj.r.squared
  rsd <- resid(mod)
  list(coefs = coefs, 
       stdErr = stdErr, 
       rsq = rsq, 
       rsq_adj = rsq_adj, 
       rsd = rsd)
})

You can then get the rsq for the first model via outputList[[1]]$rsq (or outputList$mod1$rsq if the elements of models are named), for example.
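Access by name only works once the list has names, which lapply(1:25, ...) does not add on its own. A small sketch of one way to add them, assuming simulated data and the hypothetical names mod1, mod2, mod3:

```r
# Simulated stand-in for Y_df and X1 (three response columns instead of 25)
set.seed(1)
X1   <- rnorm(50)
Y_df <- as.data.frame(replicate(3, 1 + 2 * X1 + rnorm(50)))

models <- lapply(seq_along(Y_df), function(i) lm(Y_df[, i] ~ X1))
names(models) <- paste0("mod", seq_along(models))  # hypothetical names

# lapply() carries the names of its input over to the result
rsqList <- lapply(models, function(mod) summary(mod)$r.squared)
rsqList$mod1
rsqList[["mod3"]]
```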

Or you can create separate dataframes for each:

library(tidyverse)

# coefficients
coefs <- lapply(models, coef) %>%
  do.call(rbind, .) %>%
  as.data.frame() %>% # convert from matrix to dataframe
  rownames_to_column("model") # add original model name as a column in the dataframe

# standard errors
stdErr <- lapply(models, function(mod){
  summary(mod)$coefficients[,"Std. Error"]
}) %>%
  do.call(rbind, .) %>%
  as.data.frame() %>% 
  rownames_to_column("model") 

# r squareds
rsq <- sapply(models, function(mod){
  summary(mod)$r.squared
}) %>%
  as.data.frame() %>% 
  rownames_to_column("model")

# adjusted r squareds
rsq_adj <- sapply(models, function(mod){
  summary(mod)$adj.r.squared
})%>%
  as.data.frame() %>% 
  rownames_to_column("model")

# residuals
rsd <- lapply(models, resid) %>%
  do.call(rbind, .) %>%
  as.data.frame() %>% 
  rownames_to_column("model") 
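If a single table is more convenient than one data frame per statistic, the same extractors can be combined in base R. A sketch with simulated data; the column names (intercept, slope, se_slope and so on) are my own choice, not something lm produces:

```r
# Simulated stand-in for Y_df and X1 (three response columns instead of 25)
set.seed(1)
X1   <- rnorm(50)
Y_df <- as.data.frame(replicate(3, 1 + 2 * X1 + rnorm(50)))
models <- lapply(seq_along(Y_df), function(i) lm(Y_df[, i] ~ X1))

# Build a one-row data frame per model, then stack the rows
results <- do.call(rbind, lapply(seq_along(models), function(i) {
  s  <- summary(models[[i]])
  cf <- s$coefficients
  data.frame(
    model     = names(Y_df)[i],
    intercept = cf["(Intercept)", "Estimate"],
    slope     = cf["X1", "Estimate"],
    se_slope  = cf["X1", "Std. Error"],
    sigma     = s$sigma,
    rsq       = s$r.squared,
    rsq_adj   = s$adj.r.squared
  )
}))
results
```

Residuals are left out here because they have one value per observation rather than one per model, so they fit better in their own data frame.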

It is worth noting that, if you are in RStudio and you assign the summary to something (i.e. temp <- summary(mod)), you can type the name of the object followed by "$", and a dropdown appears listing every component that can be extracted from the summary.
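Outside RStudio, names() gives the same list of components. A quick sketch on R's built-in cars dataset, which is used here only as a convenient example:

```r
mod  <- lm(dist ~ speed, data = cars)
temp <- summary(mod)

names(temp)            # every component that can be pulled out with $
str(temp$fstatistic)   # inspect the structure of a single component
```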

James