I have a purely categorical data set with a very imbalanced class ratio (1:99).
I would like to train a model that computes, for each feature and each value of that feature, how important it is to the prediction. In essence, I want to generate a dict-like object:
vocabulary = {
    'user=12345': 0,
    'user=67890': 1,
    'age=30': 2,
    'age=40': 3,
    'geo=UK': 4,
    'geo=DE': 5,
    'geo=US': 6,
    'geo=BR': 7,
}
I would then like to attach an importance weight to each entry:
weights = [.1, .2, .15, .25, .1, .1, .2, .2]
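To make the target output concrete, here is a minimal sketch of one way such a vocabulary and weight vector could be produced. It assumes scikit-learn's `DictVectorizer` and `LogisticRegression`, which are only one possible choice, and the toy rows and labels are invented purely for illustration.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Toy rows invented for illustration. Values are kept as strings so that
# DictVectorizer one-hot encodes each one as its own "feature=value" column.
rows = [
    {"user": "12345", "age": "30", "geo": "UK"},
    {"user": "67890", "age": "40", "geo": "DE"},
    {"user": "12345", "age": "40", "geo": "US"},
    {"user": "67890", "age": "30", "geo": "BR"},
    {"user": "12345", "age": "30", "geo": "DE"},
    {"user": "67890", "age": "40", "geo": "UK"},
]
labels = [0, 0, 1, 0, 0, 1]  # hypothetical binary target

# Build the sparse one-hot matrix and the "feature=value" -> column index map.
vectorizer = DictVectorizer(sparse=True)
X = vectorizer.fit_transform(rows)

# class_weight="balanced" reweights classes inversely to their frequency,
# which is one common way to handle an imbalanced target.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X, labels)

vocabulary = vectorizer.vocabulary_   # e.g. {'geo=UK': <index>, ...}
weights = model.coef_[0]              # one signed coefficient per column

for name, index in sorted(vocabulary.items(), key=lambda kv: kv[1]):
    print(name, index, round(float(weights[index]), 3))
```

Note that the coefficients here are signed and not normalized, so they are not directly comparable to the weights listed above; taking absolute values or rescaling them would be an additional assumption.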
Which Python-based machine learning library should I use, and which algorithms within that library would you recommend for extracting the above output?
So far I have tried the TensorFlow linear regressor, the scikit-learn linear regressor, and GraphLab boosted trees. The boosted trees seemed the most promising, but I would like to use an open source library if possible.
Thank you all very much in advance!
UPDATE:
GradientBoostingClassifier yields a score of 0.999137901985 due to the imbalanced classes.
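As a rough illustration of why a plain accuracy score says little at a 1:99 ratio, the sketch below compares accuracy with balanced accuracy for a trivial predictor that always outputs the majority class. The labels are synthetic, and `balanced_accuracy_score` is just one alternative metric, used here as an assumption rather than a recommendation.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Synthetic labels at a 1:99 ratio: 10 positives, 990 negatives.
y_true = np.array([1] * 10 + [0] * 990)

# A trivial "model" that always predicts the majority class.
y_pred = np.zeros_like(y_true)

# Fraction of correct predictions, dominated by the majority class.
accuracy = accuracy_score(y_true, y_pred)

# Mean of per-class recall, so the missed minority class pulls it down.
balanced = balanced_accuracy_score(y_true, y_pred)

print("accuracy:", accuracy)
print("balanced accuracy:", balanced)
```

The always-negative predictor is right on 990 of 1000 rows yet never finds a positive, which is what the balanced metric exposes.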