In my regression model (an XGBoost regression model), I created dummy variables for all binary variables in my data set. When I extract the feature importances from the model and plot them, every dummy variable gets its own importance (GENDER1, GENDER2, ADULT1, ADULT2, etc.).

What is the actual feature importance of the variables GENDER and ADULT in this example? Can I simply take the average of the two dummy importances, or is that mathematically wrong?

Peter Lawrence
  • I guess you're splitting by age groups... Why not look at the gain of each of them and treat them as separate features? If you must combine them, I would take their "gain" as the average of the individual gains. – Eran Moshe Mar 08 '18 at 12:55
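To make the averaging-versus-summing question concrete, here is a minimal sketch rather than a definitive answer. It assumes the importances are additive across columns, for example what XGBoost reports with `importance_type="total_gain"`; the default `"gain"` is an average per split and does not add up this way. Under that assumption, summing the dummy columns of one variable is a common convention, while averaging GENDER1 and GENDER2 would halve GENDER's share relative to non-dummy features. The numbers, the `AGE` column, and the helper `group_importances` are all hypothetical and only illustrate the arithmetic.

```python
from collections import defaultdict


def group_importances(importances, groups):
    """Sum per-column importances into one value per original variable."""
    totals = defaultdict(float)
    for column, value in importances.items():
        # Columns not listed in `groups` are kept under their own name.
        totals[groups.get(column, column)] += value
    return dict(totals)


# Hypothetical total-gain values, made up for illustration only.
total_gain = {
    "GENDER1": 12.0,
    "GENDER2": 4.0,
    "ADULT1": 6.0,
    "ADULT2": 2.0,
    "AGE": 16.0,
}

# Map each dummy column back to the variable it was created from.
groups = {
    "GENDER1": "GENDER",
    "GENDER2": "GENDER",
    "ADULT1": "ADULT",
    "ADULT2": "ADULT",
}

grouped = group_importances(total_gain, groups)

# Normalize so the grouped importances sum to 1.
overall = sum(grouped.values())
shares = {name: value / overall for name, value in grouped.items()}
```

With a real model, the input dictionary could come from `model.get_booster().get_score(importance_type="total_gain")`, assuming the model was trained with named feature columns. Note that for a truly binary variable the two dummy columns carry the same information (one is the complement of the other), which is another reason to report them as a single combined value.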

0 Answers