
I have a question about the Box-Cox transformation (or log transformation). I am working on a dataset with many skewed features. When I apply the Box-Cox transformation, I get quite a nice distribution, but the feature's correlation with the target decreases. If I were working with linear models, I would just use the correlation to decide whether or not to transform the feature. However, I am working with tree-based models, so should I transform the feature to get a more dispersed distribution, or leave the feature as it is to avoid the decrease in correlation?

I have added a screenshot of the distribution and its relationship with the target variable, for both the original and the transformed feature (the left two plots show the original feature and the target).

PS: Judging from the plots, it seems to me that transforming the feature would make it easier for a tree to find a split on this particular feature.
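One way to probe the intuition in the PS is a small synthetic check. The sketch below is not based on the dataset from the question: the lognormal feature, the linear target, and the helper `fit_predict` are all assumptions made up for illustration. It fits the same depth-limited decision tree on the raw feature and on its Box-Cox transform. Because Box-Cox is strictly increasing, it preserves the ordering of the values, so a threshold-based tree should be able to carve out the same groups of training points either way, even though the Pearson correlation with the target changes.

```python
import numpy as np
from scipy.stats import boxcox
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical data: a right-skewed, strictly positive feature and a
# target that is linear in the raw feature (so Box-Cox lowers Pearson r).
x = rng.lognormal(mean=0.0, sigma=1.0, size=500)
y = 2.0 * x + rng.normal(scale=1.0, size=500)

# Box-Cox needs strictly positive input; with no lambda given, scipy
# estimates it by maximum likelihood and returns (transformed, lambda).
x_bc, lam = boxcox(x)

def fit_predict(feature):
    """Fit a shallow tree on a single feature and predict the training set."""
    column = feature.reshape(-1, 1)
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    return tree.fit(column, y).predict(column)

pred_raw = fit_predict(x)
pred_bc = fit_predict(x_bc)

pearson_raw = np.corrcoef(x, y)[0, 1]
pearson_bc = np.corrcoef(x_bc, y)[0, 1]
same_fit = np.allclose(pred_raw, pred_bc)

print(f"estimated lambda:        {lam:.3f}")
print(f"Pearson r (raw):         {pearson_raw:.3f}")
print(f"Pearson r (Box-Cox):     {pearson_bc:.3f}")
print(f"identical training fit:  {same_fit}")
```

If the training fits match, that suggests the transformed feature is not easier for the tree to split on; the plots look different, but the candidate partitions of the training points are the same. Predictions on unseen points can still differ slightly, because the split thresholds are midpoints computed on different scales.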

Thanks a lot,

[Figure: Original Feature (left two plots) and Box-Cox Transformed Feature]

CheeseBurger
  • I'm curious to know this too. – Elvin Aghammadzada Jan 05 '21 at 04:13
  • In theory, decision-tree-based models should be able to handle skewed variables. The best way forward is to train the random forest once with the skewed variable and once with the Box-Cox-transformed variable, and see which one returns the better-performing model. – Sole Galli Nov 29 '21 at 18:07
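
The comparison suggested in the last comment could be set up roughly as follows. This is only a sketch under stated assumptions: the data are synthetic stand-ins (a lognormal feature and a linear target), and the names `raw_model` and `bc_model` are hypothetical. The Box-Cox step is placed inside a scikit-learn pipeline so that its lambda is estimated on the training folds only, and both variants are scored with the same cross-validation splits.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(42)

# Hypothetical stand-in data: one skewed, strictly positive feature.
X = rng.lognormal(mean=0.0, sigma=1.0, size=(400, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=1.0, size=400)

# Use identical folds for both variants so the scores are comparable.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Variant 1: random forest on the raw, skewed feature.
raw_model = RandomForestRegressor(n_estimators=100, random_state=0)

# Variant 2: Box-Cox (fitted per training fold) followed by the same forest.
bc_model = make_pipeline(
    PowerTransformer(method="box-cox"),
    RandomForestRegressor(n_estimators=100, random_state=0),
)

scores_raw = cross_val_score(raw_model, X, y, cv=cv, scoring="r2")
scores_bc = cross_val_score(bc_model, X, y, cv=cv, scoring="r2")

print(f"raw feature: mean R^2 = {scores_raw.mean():.3f} (std {scores_raw.std():.3f})")
print(f"Box-Cox:     mean R^2 = {scores_bc.mean():.3f} (std {scores_bc.std():.3f})")
```

On real data, `X` and `y` would be replaced by the actual features and target, and the metric by whatever the project evaluates on. A gap between the two means that is small relative to the fold-to-fold spread would suggest the transformation makes no practical difference for the forest.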

0 Answers