
I am trying to exclude correlated variables from a GLM. First, I calculate the correlation matrix. Afterwards, I would like to use it in the `combn` step to exclude the variables (column headers) that are correlated. This is where I fail: I cannot work out how to incorporate it into the `combn` loop so that it works and the correlated variables are excluded.

Here is the link for data I use: https://drive.google.com/open?id=0B5IgiR_svnKcZkxHeTJXTm9jUjQ

Here is the code I am trying to get working:

## rm(list = ls())  ## Edited out to prevent accidents

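mod_data <- read.csv("mod_data.csv", header = TRUE)

## Predictor names: columns 2 to ncol - 1 (3:ncol(mod_data)-1 parses as (3:ncol(mod_data)) - 1)
mod_headers <- names(mod_data)[2:(ncol(mod_data) - 1)]

## (row, col) index pairs among columns 1 to ncol - 1 whose |correlation| exceeds 0.5;
## each pair appears twice, as (i, j) and (j, i). Note this range includes column 1,
## which mod_headers leaves out
CM <- which(abs(cor(mod_data[, 1:(ncol(mod_data) - 1)]) - diag(1, ncol(mod_data) - 1)) > 0.5, arr.ind = TRUE)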

## Exhaustive search: fit a logistic GLM for every subset of mod_headers
## and keep the model with the lowest AIC
f <- function(){

  null_model <- glm(newcol ~ 1, data = mod_data, family = binomial(link = "logit"), control = list(maxit = 50))
  best_model <- null_model
  best_aic <- AIC(null_model)

  for(i in seq_along(mod_headers)){
    tab <- combn(mod_headers, i)          # each column is one subset of size i
    for(j in seq_len(ncol(tab))){
      tab_new <- tab[, j]
      mod_tab_new <- c(tab_new, "newcol") # predictors plus the response
      model <- glm(newcol ~ ., data = mod_data[mod_tab_new], family = binomial(link = "logit"), control = list(maxit = 50000))
      if(AIC(model) < best_aic){
        best_model <- model
        best_aic <- AIC(model)
      }
    }
  }
  return(best_model)
}

f()
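One possible way to do the exclusion, sketched under assumptions rather than offered as a definitive fix: compute the correlation matrix once, then inside the `combn` loop skip any combination whose correlation submatrix has an off-diagonal |r| above the cutoff. The data below are simulated stand-ins for `mod_data.csv` (with `newcol` as the binary response), and `is_uncorrelated` and `best_uncorrelated_model` are hypothetical helper names; the 0.5 cutoff mirrors the threshold used for `CM` above.

```r
## Sketch: best-AIC logistic model over all predictor subsets, skipping any
## subset that contains a pair of predictors with |r| > cutoff.
## Simulated data and helper names are hypothetical stand-ins.

set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)   # strongly correlated with x1
x3 <- rnorm(n)
x4 <- rnorm(n)
newcol <- rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x3))
mod_data <- data.frame(x1, x2, x3, x4, newcol)

mod_headers <- setdiff(names(mod_data), "newcol")

## TRUE if no pair of variables in `vars` has |correlation| above `cutoff`
is_uncorrelated <- function(vars, cor_mat, cutoff) {
  if (length(vars) < 2) return(TRUE)
  sub <- abs(cor_mat[vars, vars])
  all(sub[upper.tri(sub)] <= cutoff)
}

best_uncorrelated_model <- function(data, predictors, response, cutoff = 0.5) {
  cor_mat <- cor(data[predictors])          # computed once, reused per subset
  best_model <- glm(reformulate("1", response), data = data,
                    family = binomial(link = "logit"))
  best_aic <- AIC(best_model)
  for (i in seq_along(predictors)) {
    tab <- combn(predictors, i)             # each column is one subset
    for (j in seq_len(ncol(tab))) {
      vars <- tab[, j]
      if (!is_uncorrelated(vars, cor_mat, cutoff)) next  # skip correlated sets
      model <- glm(reformulate(vars, response), data = data,
                   family = binomial(link = "logit"))
      if (AIC(model) < best_aic) {
        best_model <- model
        best_aic <- AIC(model)
      }
    }
  }
  best_model
}

best <- best_uncorrelated_model(mod_data, mod_headers, "newcol", cutoff = 0.5)
print(formula(best))
```

This filters combinations rather than dropping variables outright, so each member of a correlated pair can still be chosen on its own. The search remains exhaustive (up to 2^p - 1 subsets), so it is only practical for a small number of predictors.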

Thanks for your tips!

Edited by ekstroem; asked by New2coding.

  • Please don't post code like `rm(list = ls())` unless it is crucial for your example. No one wants to accidentally copy/paste your example and run that line. – Gregor Thomas Aug 02 '17 at 16:43
  • You may want to look at the `findCorrelation` function in the `caret` package. Documentation to reference: https://topepo.github.io/caret/pre-processing.html#corr – jmuhlenkamp Dec 09 '17 at 04:21
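Per its documentation, caret's `findCorrelation` works on a correlation matrix and, for each highly correlated pair, flags the variable with the larger mean absolute correlation for removal. The sketch below is a hypothetical base-R approximation in the same spirit, not caret's exact algorithm: it repeatedly drops the variable with the largest mean absolute correlation until no pair exceeds the cutoff. `drop_correlated` is an illustrative name and the toy data are simulated.

```r
## Hypothetical base-R sketch in the spirit of caret::findCorrelation.
## Greedily drops variables until all pairwise |r| <= cutoff.
drop_correlated <- function(data, cutoff = 0.5) {
  keep <- names(data)
  repeat {
    if (length(keep) < 2) break
    cm <- abs(cor(data[keep]))
    diag(cm) <- 0                          # ignore self-correlation
    if (max(cm) <= cutoff) break
    ## among variables in at least one too-high pair, drop the one with the
    ## largest mean absolute correlation to the others
    offenders <- which(apply(cm, 2, max) > cutoff)
    mean_abs <- colMeans(cm[, offenders, drop = FALSE])
    keep <- setdiff(keep, names(mean_abs)[which.max(mean_abs)])
  }
  keep
}

set.seed(42)
n <- 100
a <- rnorm(n)
b <- a + rnorm(n, sd = 0.2)   # near-duplicate of a
d <- rnorm(n)
vars <- data.frame(a, b, d)
kept <- drop_correlated(vars, cutoff = 0.5)
print(kept)
```

Applied to the question, something like `drop_correlated(mod_data[mod_headers], cutoff = 0.5)` could replace `mod_headers` before the `combn` loop. Unlike a per-combination filter, this shrinks the candidate set up front, which also makes the exhaustive search cheaper.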
