4

I have a Gaussian Naive Bayes algorithm running against a dataset. What I need is to get the feature importance (how impactful each feature is) on the target class.

Here's my code:

from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(inputs, target, test_size=0.2)

gaussian_nb = GaussianNB()
gaussian_nb.fit(X_train, y_train)
gaussian_nb.score(X_test, y_test)*100

And I tried:

importance = gaussian_nb.coefs_ # and even tried coef_

and it gives an error:

AttributeError: 'GaussianNB' object has no attribute 'coefs_'

Can someone please help me?

user13456401

2 Answers

7

GaussianNB does not offer an intrinsic method to evaluate feature importances. Naïve Bayes methods work by determining the conditional and unconditional probabilities associated with the features and predicting the class with the highest probability. Thus, no coefficients are computed or associated with the features you used to train the model (compare with its documentation).
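For completeness: after fitting, the model does expose the per-class statistics it estimated, just not coefficients. These are not importances, but you can inspect them directly. A small sketch (note that the variance attribute is named var_ in recent scikit-learn versions and sigma_ in older ones):

print(gaussian_nb.classes_)       # class labels
print(gaussian_nb.class_prior_)   # learned prior probability of each class
print(gaussian_nb.theta_)         # shape (n_classes, n_features): per-class feature means
variances = getattr(gaussian_nb, "var_", None)
if variances is None:             # older scikit-learn versions call it sigma_
    variances = gaussian_nb.sigma_
print(variances)                  # shape (n_classes, n_features): per-class feature variances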

That being said, there are methods that you can apply post-hoc to analyze the model after it has been trained. One of these methods is Permutation Importance, which, conveniently, is also implemented in scikit-learn. With the code you provided as a base, you would use permutation_importance in the following way:

from sklearn.inspection import permutation_importance
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test = train_test_split(inputs, target, test_size=0.2)

gaussian_nb = GaussianNB()
gaussian_nb.fit(X_train, y_train)

# shuffle each feature in turn and measure how much the test score drops
imps = permutation_importance(gaussian_nb, X_test, y_test)
print(imps.importances_mean)

Observe that Permutation Importance is dataset dependent, and you have to pass a dataset to obtain the values. This can be either the same data you used to train the model, i.e. X_train and y_train, or a hold-out set that you saved for evaluation, like X_test and y_test. The latter is the better choice with regard to generalization.
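If it helps with interpretation, you can also pair the values with the feature names and rank them. A minimal sketch, assuming inputs is a pandas DataFrame whose columns are the feature names (otherwise substitute your own list of names):

import numpy as np

feature_names = np.array(inputs.columns)
order = imps.importances_mean.argsort()[::-1]   # most important first
for name, mean, std in zip(feature_names[order],
                           imps.importances_mean[order],
                           imps.importances_std[order]):
    print(f"{name}: {mean:.4f} +/- {std:.4f}")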

If you want to know more about Permutation Importance as a method and how it works, then the user guide provided by scikit-learn is definitely a good start.

afsharov
  • Thank you for this workaround. And the values I get all seem to be negative. Can you describe what happens here? The target variable is binary (either `1` or `0`). It outputs the importances like this: `[-0.00714286 -0.00571429 -0.00714286 -0.00428571 0. -0.00428571 -0.00571429 -0.00857143 -0.00428571 -0.00285714 0. 0.20428571 -0.00428571 0. -0.00428571 -0.00142857 -0.00571429 -0.00571429 0. -0.00571429 -0.00142857 -0.01142857 0. -0.00714286 -0.00142857 -0.00428571 -0.00285714 0. ]` What does this mean? – user13456401 Jul 16 '20 at 15:38
  • It is a convention in `scikit-learn` that **higher return values are better than lower return values**. The numbers here represent the mean difference in the score (here: accuracy) the algorithm determined when the values of a particular feature are randomly shuffled before obtaining the score. So for example, a value of `0.20` means that shuffling this feature resulted in a drop of 0.20 in accuracy. Hence, this feature is very important. The negative numbers of course mean the opposite: the accuracy actually increased when shuffling the corresponding feature, so they are not that important. – afsharov Jul 16 '20 at 16:26
  • Keep in mind though that these numbers are the calculated mean values of five iterations and that the negative numbers are quite close to 0. While it is not guaranteed that the model actually performs better without the features with negative values, it is fair to say that these features are not important in terms of Permutation Importance. – afsharov Jul 16 '20 at 16:31
  • Hi, you said `... feature resulted in a drop of 0.20 in accuracy. Hence, this feature is very important.`. So how come a feature that drops the accuracy is more important? And you also said `actually increased when shuffling the negative numbers`, so isn't it a good thing to have an increased accuracy? I am confused. – user13456401 Jul 17 '20 at 07:17
  • The idea behind Permutation Importance is that shuffling all values of a feature will break its relationship with the target variable. Thus, a model provided with a **shuffled feature**, which originally is indeed important, **should perform worse**. This makes sense, right? Now, the impact of this procedure is reported as a positive number in `scikit-learn` in order to conform to the above-mentioned convention. I hope this clarifies things. **TL;DR: If _shuffling_ a feature made the model perform worse, it means the feature was important and thus gets a positive value assigned to it.** – afsharov Jul 17 '20 at 09:27
0

If you have a look at the documentation, Naive Bayes does not have these attributes for feature importance. You can use the get_params method to see the priors, but not really anything about individual features. If you need to understand feature importance, a good solution would be to do that analysis on something like a decision tree and then fit GaussianNB using only the most important features.
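For example, one way to put this into practice could look like the following. This is only a rough sketch: the decision tree's hyperparameters are left at their defaults, and keeping the top 10 features is an arbitrary choice for illustration.

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# rank the features with a decision tree first ...
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

# ... keep the k highest-ranked features (k=10 is arbitrary here) ...
top_k = 10
top_features = np.argsort(tree.feature_importances_)[::-1][:top_k]

# ... and train GaussianNB on that subset only
# (indexing assumes arrays; with DataFrames use .iloc[:, top_features])
gaussian_nb = GaussianNB()
gaussian_nb.fit(np.asarray(X_train)[:, top_features], y_train)
print(gaussian_nb.score(np.asarray(X_test)[:, top_features], y_test))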

nickyfot
  • So it is not a problem to get the feature importance from a different model and predict values with a different model? – user13456401 Jul 16 '20 at 11:53
  • it will not be exactly the same obviously, and it depends on why you are using the importance to begin with; I am only proposing this for your understanding and more as a guideline for feature selection (which is a valid technique). – nickyfot Jul 16 '20 at 12:01
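As a follow-up on the feature-selection point in the comments above: this kind of tree-based selection can also be wired up as a single scikit-learn pipeline, so the selection is re-fit together with the classifier. A small sketch (the max_features=10 cut-off is again an arbitrary choice):

import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# select the 10 features the tree ranks highest, then fit GaussianNB on them
model = make_pipeline(
    SelectFromModel(DecisionTreeClassifier(random_state=0),
                    max_features=10, threshold=-np.inf),
    GaussianNB(),
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))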