I have a dataset of 8,100 observations of 118 variables, which are used to determine which one of 4 groups each respondent falls into. I am interested in which variables are the most important for predicting group membership. My data are a combination of ordinal and binary variables.

I initially ran a discriminant function analysis, but then read that it does not handle binary data well. Next I tried a multinomial logistic regression, but I am struggling to work out from it which variables are the most important. I also tried an rpart decision tree, but read that these are not very stable; indeed, each time I ran it on a random half of my data I got different results. Now I am trying a dominance analysis. I can get it working for a linear model (lm), but for both the multinomial logistic regression and the discriminant function analysis I get the error:

Error in daRawResults(x = x, constants = constants, terms = terms, fit.functions = fit.functions,  : 
  Not implemented method to retrieve data from model

Does anyone have any advice for what else I can try? Only 4 of the 118 variables are binary, so I can remove them if needed and will still have a good analysis.

Here is a reproducible example including a much smaller example dataset:

set.seed(1)  ## for reproducibility

remotes::install_github("clbustos/dominanceAnalysis") # If you don't have the dominance analysis package
library(dominanceanalysis)
library(MASS)
library(nnet)

mydata <- data.frame(Segments=sample(1:4, 15, replace=TRUE),
                     var1=sample(1:7, 15, replace=TRUE),
                     var2=sample(1:7, 15, replace=TRUE),
                     var3=sample(1:6, 15, replace=TRUE),
                     var4=sample(1:2, 15, replace=TRUE))

# Show that it works for a linear model
LM<-lm(Segments ~., mydata)
da.LM<-dominanceAnalysis(LM);da.LM
#var1 is the most important, followed by var4

# Try the discriminant function analysis
DFA <- lda(Segments~., data=mydata)
da.DFA <- dominanceAnalysis(DFA)
# Error

# Try multinomial logistic regression
MLR <- multinom(Segments ~ ., data = mydata, maxit=500)
da.MLR <- dominanceAnalysis(MLR)
# Error
thestral
  • Two people have voted to close my question. May I please know why? Have I posted it in the wrong place? Is there other information that I should have included? – thestral Jan 31 '23 at 22:09

1 Answer

I've discovered a partial answer.

The dominanceanalysis package can only be used on these models: Ordinary Least Squares, Generalized Linear Models, Dynamic Linear Models and Hierarchical Linear Models.

Source: https://github.com/clbustos/dominanceAnalysis

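This explains why it didn't work for my data: neither lda() from MASS nor multinom() from nnet returns one of those model types, so dominanceAnalysis() has no method for retrieving the data from them.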
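Since generalized linear models are on the supported list, one possible workaround is to recode the outcome as "this segment versus the rest" and fit a binary logistic regression with glm(), which dominanceAnalysis() does accept. The sketch below is only that: the one-vs-rest recoding, the column name is_seg1 and the larger simulated sample (n = 200) are assumptions made for illustration, and it answers a slightly different question (importance for separating one segment from the others) than the full four-group problem.

```r
library(dominanceanalysis)

set.seed(1)

# Simulated data in the same shape as the question's example, but larger
# (assumption: n = 200, so the logistic fits are not degenerate)
n <- 200
mydata <- data.frame(Segments = sample(1:4, n, replace = TRUE),
                     var1 = sample(1:7, n, replace = TRUE),
                     var2 = sample(1:7, n, replace = TRUE),
                     var3 = sample(1:6, n, replace = TRUE),
                     var4 = sample(1:2, n, replace = TRUE))

# Hypothetical one-vs-rest recoding: 1 if the respondent is in segment 1
seg1_data <- mydata
seg1_data$is_seg1 <- as.integer(seg1_data$Segments == 1)
seg1_data$Segments <- NULL

# Binary logistic regression is a GLM, which the package supports
fit_seg1 <- glm(is_seg1 ~ var1 + var2 + var3 + var4,
                data = seg1_data, family = binomial)

da_seg1 <- dominanceAnalysis(fit_seg1)

# Average contribution of each predictor to the pseudo-R2 fit indices
averageContribution(da_seg1)
```

Repeating this for segments 2 to 4 would give one importance ranking per segment rather than a single overall ranking.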

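I have decided to pursue the decision-tree route to variable importance by using a random forest, which averages over many trees and so should be less sensitive to the particular sample than a single rpart tree.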
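A minimal sketch of what the random forest step could look like follows. The answer does not say which implementation is used, so the randomForest package, the ntree value and the simulated data (n = 200) are all assumptions here. The outcome is converted to a factor so that the forest is fitted as a classifier rather than a regression.

```r
library(randomForest)  # assumption: the randomForest package is the one used

set.seed(1)

# Simulated data in the same shape as the question's example, but larger
n <- 200
mydata <- data.frame(Segments = sample(1:4, n, replace = TRUE),
                     var1 = sample(1:7, n, replace = TRUE),
                     var2 = sample(1:7, n, replace = TRUE),
                     var3 = sample(1:6, n, replace = TRUE),
                     var4 = sample(1:2, n, replace = TRUE))

# A factor outcome makes randomForest() fit a classification forest
mydata$Segments <- factor(mydata$Segments)

rf <- randomForest(Segments ~ ., data = mydata,
                   ntree = 500, importance = TRUE)

# type = 1: permutation importance (mean decrease in accuracy)
imp <- importance(rf, type = 1)

# Rank the predictors from most to least important
imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE]
```

Because the simulated predictors are pure noise, the ranking here is meaningless; on the real data, refitting with a few different seeds is a quick way to check whether the top variables stay the same.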

thestral