Say I am working on a machine learning model in R using naive Bayes. I would build a model using the naiveBayes function from the e1071 package (with the HouseVotes84 data from mlbench) as follows:

library(e1071); data(HouseVotes84, package = "mlbench")
model <- naiveBayes(Class ~ ., data = HouseVotes84)

I can also inspect the model's parameters (the a-priori and conditional probabilities) by just printing the model.

And I do the prediction as follows; with type = "raw" this returns the posterior probability of each class (the default, type = "class", would return the predicted class itself):

predict(model, HouseVotes84[1:10,], type = "raw")

However, my question is: is there a way to see which of the columns affected this prediction the most? For example, if the response variable were whether a student fails a class, I would want to know which of the predictor columns were the most important contributing factors to that prediction.

My question applies to any package in R; naiveBayes above is just an example.


1 Answer

The answer depends on how you want to do the feature selection.

If it is part of the model-building process and not some post-hoc analysis, you could use caret with its feature-selection wrapper methods to determine the best subset of features (via recursive feature elimination, genetic algorithms, etc.), or use filtering based on univariate analysis, as in the sketch below.
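
For illustration, here is a minimal sketch of wrapper-based selection with caret's rfe(); this is not from the original answer, rfFuncs wraps a random forest rather than naive Bayes, and the fold count and subset sizes are arbitrary illustrative choices:

library(caret)
library(randomForest)
data(HouseVotes84, package = "mlbench")
votes <- na.omit(HouseVotes84)   # random forests cannot handle the NAs here

set.seed(1)
ctrl <- rfeControl(functions = rfFuncs, method = "cv", number = 5)
profile <- rfe(x = votes[, -1], y = votes$Class,
               sizes = c(4, 8, 12), rfeControl = ctrl)
predictors(profile)              # the best subset found by RFE

The same rfeControl() interface accepts other wrapped learners (e.g. treebagFuncs), so the random forest here is a design choice rather than a requirement.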


If it is part of your post-hoc analysis based solely on your predictions, then it depends on the type of model you have used. caret also supports this functionality, though only for compatible models!

For svm, with the exception of linear kernels, determining the importance of the coefficients is highly non-trivial. I'm unaware of any general attempt at feature ranking for svm, regardless of language (please tell me if one exists!).

With rpart (as it's tagged in the question) you can just look at the tree visually: the higher up a node appears, the more important it is. Variable importance can also be computed with the caret package:

library(rpart)
library(caret)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
caret::varImp(fit)  # importance aggregated across the tree's splits
#        Overall
#Age    5.896114
#Number 3.411081
#Start  8.865279

With naiveBayes you can see it from your model output. You just have to stare really hard:

library(e1071)
data(HouseVotes84, package = "mlbench")
model <- naiveBayes(Class ~ ., data = HouseVotes84)
model
#
#Naive Bayes Classifier for Discrete Predictors
#
#Call:
#naiveBayes.default(x = X, y = Y, laplace = laplace)
#
#A-priori probabilities:
#Y
#  democrat republican 
# 0.6137931  0.3862069 
#
#Conditional probabilities:
#            V1
#Y                    n         y
#  democrat   0.3953488 0.6046512
#  republican 0.8121212 0.1878788
#
#            V2
#Y                    n         y
#  democrat   0.4979079 0.5020921
#  republican 0.4932432 0.5067568

A very brief glance shows that at least V1 looks like a better variable than V2: the probability of a "y" vote differs sharply between the classes for V1 (0.60 for democrats vs. 0.19 for republicans), while for V2 the two classes are almost identical (here n and y are the recorded no/yes votes).
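
To make the per-row version of this concrete, here is a hedged sketch, not part of the original answer: because naive Bayes assumes the predictors are independent, the log-posterior is the log-prior plus one additive term per predictor, so each column's contribution to a single prediction can be read off model$tables:

row <- HouseVotes84[1, -1]          # predictors for the first observation
contrib <- sapply(names(row), function(v) {
  tab <- model$tables[[v]]          # P(vote | class) for predictor v
  lev <- as.character(row[[v]])
  if (is.na(lev)) return(NA_real_)  # naiveBayes skips missing votes
  log(tab["democrat", lev]) - log(tab["republican", lev])
})
sort(contrib, decreasing = TRUE)    # largest pro-democrat contributions first

Positive values push this row's prediction toward democrat and negative values toward republican; together with the log prior ratio they reproduce the model's own posterior odds for that row.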

  • thanks. But all of these are for the model as a whole. I want to see which columns affect the prediction for a particular row of the test data set. I mean, say I use the predict function against the model for a row in the test data set, and it gives me a certain prediction; I would like to see which columns contributed the most to that being the prediction. The output (the contributing factors) should be different for each row. – saltandwater Dec 06 '15 at 22:43
  • This is definitely model dependent. For example, with a model like naive Bayes you can figure it out by looking at the model information again, since there is an assumption that each variable is independent; for decision trees you would have to figure it out on a case-by-case basis, as decision tree predictions are highly dependent on the "path" taken. Unfortunately I am not aware of a generic way to do this, unless you're referring to simple univariate analysis. – chappers Dec 06 '15 at 23:03
  • Hi @chappers, I hope you are doing well. If you don't mind, how do we know that V1 is a better variable than V2 in the naive Bayes classifier? Is it because 0.3953488 + 0.8121212 > 0.4979079 + 0.4932432? And if so, what do n and y stand for here? Thanks in advance. – Blg Khalil Mar 30 '20 at 04:59
  • It appears that V1 discriminates better than V2. – chappers Apr 05 '20 at 10:44