I have an SGDClassifier model trained with scikit-learn. I extract the feature names with .get_feature_names() and the coefficients with .coef_.

I combine the two columns in a dataframe like this:

feature     value
hiroshima   3.918584
wildfire    3.287680
earthquake  3.256817
massacre    3.186762
storm       3.124809
...         ...
job         -1.696438
song        -1.736640   
as          -1.956571   
nowplaying  -2.028240   
write       -2.263968

How should I interpret these feature importances? What does a large positive value mean? What does a large negative value mean?

desertnaut
Ahmed K
    I’m voting to close this question because it is not about programming as defined in the [help] but about ML theory and/or methodology. – desertnaut Mar 11 '21 at 00:56

1 Answer

SGDClassifier fits a linear model, meaning that the decision is essentially based on

SUM_i w_i f_i + b

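where w_i is the weight attached to feature f_i and b is the intercept. You can therefore read these numbers quite literally as "votes" for the positive or negative class, with a strength proportional to their absolute value. A large positive weight pushes a sample towards the positive class (classes_[1] for a binary problem), and a large negative weight pushes it towards the negative class. All the classifier does is add up these weights, each multiplied by its feature value, add the intercept_ value from your model, and classify based on the sign of the result.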
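To make the "votes" reading concrete, here is a minimal sketch in plain Python. The weights are copied from the table in the question; the intercept value, the two example documents, and the helper names score and predict are made up for illustration and are not part of scikit-learn. Features missing from the small weights dict are treated as having weight 0, which is a simplification.

```python
# Weights copied from the table in the question.
weights = {
    "hiroshima": 3.918584,
    "wildfire": 3.287680,
    "earthquake": 3.256817,
    "song": -1.736640,
    "nowplaying": -2.028240,
}

# Hypothetical intercept; a fitted model stores the real one in intercept_.
intercept = -0.5


def score(counts):
    """Return SUM_i w_i * f_i + b for a dict mapping feature -> count."""
    return sum(weights.get(tok, 0.0) * n for tok, n in counts.items()) + intercept


def predict(counts):
    """Classify by the sign of the score: 1 if positive, otherwise 0."""
    return 1 if score(counts) > 0 else 0


# Two made-up documents, already reduced to feature counts.
doc_a = {"wildfire": 1, "song": 1}    # one strong positive vote, one negative vote
doc_b = {"nowplaying": 1, "song": 1}  # two negative votes

print(score(doc_a), predict(doc_a))
print(score(doc_b), predict(doc_b))
```

In doc_a the positive vote from "wildfire" outweighs the negative vote from "song" and the intercept, so the score is positive. In doc_b both features vote negative. On a fitted scikit-learn linear model, decision_function returns this same kind of score.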

lejlot