I have read that normalization is not required when using gradient tree boosting (see e.g. the question "Should I need to normalize (or scale) the data for Random forest (drf) or Gradient Boosting Machine (GBM) in H2O or in general?" and https://github.com/dmlc/xgboost/issues/357).
And I think I understand why there is, in principle, no need for normalization when boosting regression trees.
Nevertheless, when using xgboost to boost regression trees, I see that scaling the target has a significant impact on the in-sample error of the predictions. What is the reason for this?
Example for the Boston Housing dataset:
import numpy as np
import xgboost as xgb
from sklearn.metrics import mean_squared_error
from sklearn.datasets import load_boston

boston = load_boston()
y = boston['target']
X = boston['data']

# Fit on the scaled target, then undo the scaling on the predictions
for scale in np.logspace(-6, 6, 7):
    xgb_model = xgb.XGBRegressor().fit(X, y / scale)
    y_predicted = xgb_model.predict(X) * scale
    print('{} (scale={})'.format(mean_squared_error(y, y_predicted), scale))
2.3432734454908335 (scale=1e-06)
2.343273977065266 (scale=0.0001)
2.3432793874455315 (scale=0.01)
2.290595204136888 (scale=1.0)
2.528513393507719 (scale=100.0)
7.228978353091473 (scale=10000.0)
272.29640759874474 (scale=1000000.0)
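My first suspicion (just a guess, not something the run above demonstrates) is that some of xgboost's fixed defaults are expressed on the scale of the raw target: as far as I understand, base_score defaults to 0.5, and the regularization defaults reg_lambda=1 and min_child_weight=1 act on gradient/hessian statistics whose size depends on the target's units. A sketch of how I would check the base_score part, reusing the setup above and only changing the initial prediction to the mean of the scaled target:

# Same experiment, but with base_score set to the mean of the (scaled) target
# instead of the fixed default, to see how much of the scale sensitivity
# comes from the initial prediction.
for scale in np.logspace(-6, 6, 7):
    y_scaled = y / scale
    xgb_model = xgb.XGBRegressor(base_score=float(np.mean(y_scaled))).fit(X, y_scaled)
    y_predicted = xgb_model.predict(X) * scale
    print('{} (scale={})'.format(mean_squared_error(y, y_predicted), scale))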
The impact of scaling y becomes much larger when using 'reg:gamma' as the objective function (instead of the default 'reg:linear'):
for scale in np.logspace(-6, 6, 7):
    xgb_model = xgb.XGBRegressor(objective='reg:gamma').fit(X, y / scale)
    y_predicted = xgb_model.predict(X) * scale
    print('{} (scale={})'.format(mean_squared_error(y, y_predicted), scale))
591.6509503519147 (scale=1e-06)
545.8298971540023 (scale=0.0001)
37.68688286293508 (scale=0.01)
4.039819858716935 (scale=1.0)
2.505477263590776 (scale=100.0)
198.94093800190453 (scale=10000.0)
592.1469169959003 (scale=1000000.0)
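For 'reg:gamma' I would additionally expect the regularization defaults to matter, since (if I read the documentation correctly) reg_lambda=1 and min_child_weight=1 are compared against per-leaf gradient/hessian sums that shrink or grow with the scaled target. Below is a variant of the experiment that neutralizes them together with base_score; this is only a sketch of what I plan to try, not an explanation:

# reg:gamma with the scale-dependent defaults neutralized: base_score matched
# to the scaled target's mean, and reg_lambda / min_child_weight set to zero.
for scale in np.logspace(-6, 6, 7):
    y_scaled = y / scale
    xgb_model = xgb.XGBRegressor(objective='reg:gamma',
                                 base_score=float(np.mean(y_scaled)),
                                 reg_lambda=0.0,
                                 min_child_weight=0.0).fit(X, y_scaled)
    y_predicted = xgb_model.predict(X) * scale
    print('{} (scale={})'.format(mean_squared_error(y, y_predicted), scale))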