
This question is basically an extension of something I asked before: How to only print (adjusted) R-squared of regression model?

I want to build a linear regression model that predicts concentrations from 150 potential predictors, using a manual forward stepwise procedure. The dataset looks more or less like this:

df = data.frame(
Site = c("A", "B", "C", "D"),
Concentration = c(2983, 9848, 2894, 8384),
Var1 = c(12, 23, 34, 45),
Var2 = c(23, 34, 45, 56))

I use the following code to fit a single-predictor model for every possible predictor and check its adjusted R-squared.

for (j in names(df)) {
  model <- lm(Concentration ~ df[[j]], data = df)
  print(j)
  print(summary(model)$adj.r.squared)
}

[1] "site"
  r.squared adj.r.squared
1 0.02132635    -0.9573473

It is, however, a lot of work to check the adjusted R-squared by hand for 150 variables.

Is it possible to make a data frame with all adjusted R-squared values and the corresponding variable names?

Or to rank the adjusted R-squared values so that the highest comes first, printed together with the corresponding variable name?

I am very curious to hear whether something like this is possible. It would help me enormously.

Thanks in advance!

Qeshet

1 Answer


You can save your results in a matrix and then print that matrix. First, create an empty matrix:

adj.r.mat <- matrix(NA, nrow = ncol(df), ncol = 2)  # one row per column of df
colnames(adj.r.mat) <- c("Name Var", "Adj.R")

Then store the values you are interested in:

for (j in seq_along(df)) {
  # simple regression of Concentration on column j
  model <- lm(Concentration ~ df[[j]], data = df)
  adj.r.mat[j, 1] <- names(df)[j]
  adj.r.mat[j, 2] <- summary(model)$adj.r.squared
}

Finally, print it:

print(adj.r.mat)
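One caveat with the matrix: a matrix holds a single type, so once the variable names are stored in it, the Adj.R column becomes character as well, and sorting it would be alphabetical rather than numeric. A minimal sketch of the data frame and ranking asked for in the question, assuming the example `df` and that `Site` and `Concentration` are the only non-predictor columns (the names `predictors`, `adj_r` and `adj.r.df` are just illustrative), could look like this:

```r
# Example data from the question
df <- data.frame(
  Site = c("A", "B", "C", "D"),
  Concentration = c(2983, 9848, 2894, 8384),
  Var1 = c(12, 23, 34, 45),
  Var2 = c(23, 34, 45, 56)
)

# Candidate predictors: every column except the site label and the response
predictors <- setdiff(names(df), c("Site", "Concentration"))

# Fit one simple regression per predictor and keep its adjusted R-squared;
# reformulate() builds the formula Concentration ~ <predictor> from a string
adj_r <- vapply(predictors, function(v) {
  fit <- lm(reformulate(v, response = "Concentration"), data = df)
  summary(fit)$adj.r.squared
}, numeric(1))

# A data frame keeps the names as character and Adj.R as numeric
adj.r.df <- data.frame(Var = predictors, Adj.R = unname(adj_r),
                       stringsAsFactors = FALSE)

# Rank from highest to lowest adjusted R-squared
adj.r.df <- adj.r.df[order(adj.r.df$Adj.R, decreasing = TRUE), ]
rownames(adj.r.df) <- NULL
print(adj.r.df)
```

With 150 predictors, `head(adj.r.df, 10)` would then show the ten best single-predictor models. In this toy data `Var2` is just `Var1 + 11`, so both rows get the same adjusted R-squared.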

If you don't want the first two columns (Site and Concentration), you can start the loop at 3:

for (j in 3:ncol(df)) {
  model <- lm(Concentration ~ df[[j]], data = df)
  adj.r.mat[j, 1] <- names(df)[j]
  adj.r.mat[j, 2] <- summary(model)$adj.r.squared
}

Then exclude the first two rows when you print the matrix:

print(adj.r.mat[-c(1,2),])
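The loop above only covers the first step of the manual forward procedure described in the question. A possible sketch of the later steps, assuming adjusted R-squared stays the selection criterion, keeps the predictors chosen so far in the formula and tries adding each remaining one. The helper `rank_next_step` is hypothetical, and the stopping rule (for example, stop when the best adjusted R-squared no longer increases) is left to you:

```r
# Example data from the question
df <- data.frame(
  Site = c("A", "B", "C", "D"),
  Concentration = c(2983, 9848, 2894, 8384),
  Var1 = c(12, 23, 34, 45),
  Var2 = c(23, 34, 45, 56)
)

# Hypothetical helper: for each candidate not yet selected, fit
# response ~ selected + candidate and return the adjusted R-squared
# values in a data frame, highest first
rank_next_step <- function(data, response, selected, candidates) {
  remaining <- setdiff(candidates, selected)
  adj_r <- vapply(remaining, function(v) {
    fit <- lm(reformulate(c(selected, v), response = response), data = data)
    summary(fit)$adj.r.squared
  }, numeric(1))
  out <- data.frame(Added = remaining, Adj.R = unname(adj_r),
                    stringsAsFactors = FALSE)
  out[order(out$Adj.R, decreasing = TRUE), , drop = FALSE]
}

predictors <- setdiff(names(df), c("Site", "Concentration"))

# Step 1: nothing selected yet, so this ranks the single-predictor models
step1 <- rank_next_step(df, "Concentration", character(0), predictors)
print(step1)

# Step 2: keep the best predictor from step 1 and try adding each remaining one
step2 <- rank_next_step(df, "Concentration", step1$Added[1], predictors)
print(step2)
```

Because `Var2` and `Var1` are perfectly collinear in the toy data, `lm()` reports an `NA` coefficient for the second one in step 2 and the adjusted R-squared does not improve; with real data the steps would differ.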
TeYaP
  • I just edited the code. You need to correct your "df": one entry is missing for "Site". I changed it to c("A", "B", "C", "D"). – TeYaP Nov 02 '18 at 14:52
  • @Qeshet you can accept the answer if you find this solution satisfying – TeYaP Nov 04 '18 at 09:41