I have a dataset in which the dependent (target) variable has a skewed distribution, i.e., there are a few very large values and a long tail.
When I run the regression tree, one end node is created for the large-valued observations and one end node is created for the majority of the other observations.
Would it be OK to log-transform the dependent (target) variable and fit the regression tree to the transformed values? When I tried this, I got a different set of nodes and splits, with a more even distribution of observations in each end node. With the log transformation, the R-squared value for predicted vs. observed is also quite good. In other words, I seem to get better testing and validation performance with the log transformation. I just want to make sure that log transformation is an accepted way to run a regression tree when the dependent variable has a skewed distribution.
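To make the effect concrete, here is a minimal sketch in plain Python. It is not my actual data or software: the helper `best_split` and the toy values are made up for illustration, and it finds only a single split (the one minimizing the summed squared error of the two child nodes), which is how I understand a regression tree chooses its first split.

```python
import math

def best_split(xs, ys):
    """Return (threshold, n_left, n_right) for the single split on x
    that minimizes the summed squared error of the two child nodes."""
    def sse(vals):
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals)

    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(1, len(pairs)):
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            # Place the threshold halfway between neighbouring x values
            threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best = (cost, threshold, len(left), len(right))
    return best[1:]

# Hypothetical toy data: the target doubles with each step of x,
# so a few observations are far larger than the rest.
xs = list(range(1, 11))
ys = [2.0 ** x for x in xs]

raw_split = best_split(xs, ys)
log_split = best_split(xs, [math.log(y) for y in ys])
print("raw target:", raw_split)
print("log target:", log_split)
```

On this toy data the raw target is split 8 vs. 2, with the two largest values getting a node of their own, while the log-transformed target is split 5 vs. 5. This is the same pattern I see in my real data.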
Thanks!