Box-Cox transformation with tree-based models (XGBoost to be specific)

Asked Aug 13 '20 at 09:14

Active Aug 13 '20 at 09:14

Viewed 507 times

I have a question about the Box-Cox transformation (or the log transformation). I am working on a dataset that has many skewed features. When I apply the Box-Cox transformation, I get quite a nice distribution, but the correlation with the target decreases. If I were working with linear models, I would simply use the correlation to decide whether or not to transform the feature. But I am working with tree-based models. Should I transform the feature to get a less skewed, more evenly spread distribution, or should I leave it as it is to avoid the drop in correlation?

I have attached a screenshot of the feature's distribution and its relationship with the target variable, for both the transformed and the untransformed feature (the left two plots show the original feature and the target).
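The drop in correlation can be illustrated with a minimal sketch in plain Python. It uses hypothetical data (a log-normal feature and a target that is linear in the raw feature), not the asker's dataset. Pearson correlation measures *linear* association, so a transform that bends a linear relationship lowers it. Spearman correlation depends only on ranks. The Box-Cox transform is strictly increasing for every λ, so the Spearman correlation does not change. The helpers `box_cox`, `pearson`, `ranks` and `spearman` are written out by hand for illustration. `box_cox` handles a fixed λ only, with λ = 0 meaning the log.

```python
import math
import random

def box_cox(x, lam):
    """Box-Cox transform for a fixed lambda; every value must be > 0."""
    if lam == 0:
        return [math.log(v) for v in x]
    return [(v ** lam - 1) / lam for v in x]

def pearson(a, b):
    """Pearson (linear) correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)

def ranks(v):
    """0-based rank of each element (ties are not expected for continuous data)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for pos, i in enumerate(order):
        r[i] = pos
    return r

def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

random.seed(0)
# Hypothetical skewed feature and a target that is linear in the raw feature plus noise.
x = [random.lognormvariate(0, 1) for _ in range(1000)]
y = [2.0 * v + random.gauss(0, 1) for v in x]

x_bc = box_cox(x, lam=0.0)  # lambda = 0 is the log transform

print("Pearson  raw:", round(pearson(x, y), 3), " transformed:", round(pearson(x_bc, y), 3))
print("Spearman raw:", round(spearman(x, y), 3), " transformed:", round(spearman(x_bc, y), 3))
```

In this setup the Pearson value drops after the transform, while the two Spearman values are identical. A tree split depends only on how the samples are ordered along a feature, so the rank-based measure is the more relevant one for tree models.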

PS: Judging from the plots, it seems to me that transforming the feature would make it easier for the tree to find a split on this particular feature.
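The PS can be checked directly by searching for the best single split on the raw feature and on the transformed feature, then comparing the resulting partitions. The sketch below does an exhaustive search for a two-leaf regression stump that minimises squared error. This follows the same idea as exact greedy split finding, but it is not XGBoost's actual code. `best_split` and the data are hypothetical. A strictly increasing transform preserves the order of the samples, so both searches visit the same candidate partitions and pick the same one.

```python
import math
import random

def best_split(x, y):
    """Exhaustive search for the threshold on x that minimises the total
    squared error of a two-leaf regression stump.
    Returns (threshold, frozenset of indices in the left leaf, sse)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    n = len(order)
    total_sum = sum(y)
    total_sq = sum(v * v for v in y)
    left_sum = 0.0
    left_sq = 0.0
    best_sse, best_thr, best_k = math.inf, None, None
    for k in range(1, n):
        i = order[k - 1]  # sample that has just moved into the left leaf
        left_sum += y[i]
        left_sq += y[i] * y[i]
        lo, hi = x[order[k - 1]], x[order[k]]
        if lo == hi:
            continue  # cannot place a threshold between equal values
        right_sum = total_sum - left_sum
        right_sq = total_sq - left_sq
        # SSE of each leaf = sum of squares - (sum)^2 / count
        sse = (left_sq - left_sum ** 2 / k) + (right_sq - right_sum ** 2 / (n - k))
        if sse < best_sse:
            best_sse, best_thr, best_k = sse, (lo + hi) / 2, k
    return best_thr, frozenset(order[:best_k]), best_sse

random.seed(1)
# Hypothetical skewed feature with a non-linear relationship to the target.
x = [random.lognormvariate(0, 1) for _ in range(500)]
y = [math.sqrt(v) + random.gauss(0, 0.3) for v in x]
x_log = [math.log(v) for v in x]

thr_raw, left_raw, sse_raw = best_split(x, y)
thr_log, left_log, sse_log = best_split(x_log, y)
print("same partition:", left_raw == left_log)
print("raw threshold:", round(thr_raw, 3), " log threshold:", round(thr_log, 3))
```

The two chosen thresholds are different numbers. One is a midpoint in raw units and the other is a midpoint in log units. Both still fall between the same two training samples, so the training data is split identically. Only an unseen value that lands between those two samples could be routed differently.

XGBoost's exact split finder behaves the same way. The histogram-based method (`tree_method="hist"`, the default in recent releases) takes its bin edges from quantiles, which are also rank-based. Results there should be very close, but they are not guaranteed to be bit-for-bit identical.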

Thanks a lot,


CheeseBurger

I'm curious to know this too – Elvin Aghammadzada Jan 05 '21 at 04:13
In theory, decision-tree-based models should be able to handle skewed variables. The best way forward is to train the random forest once with the skewed variable and once with the Box-Cox-transformed variable, and see which one gives the better-performing model. – Sole Galli Nov 29 '21 at 18:07


0 Answers