Errors using customized loss function for quantile regression with XGBoost

Question

I am trying out the customized loss function for quantile regression with XGBoost from https://gist.github.com/Nikolay-Lysenko/06769d701c1d9c9acb9a66f2f9d7a6c7 which is as follows:

import numpy as np


def xgb_quantile_eval(preds, dmatrix, quantile=0.2):
    """
    Custom evaluation metric equal to the
    quantile regression loss (also known as
    pinball loss).
    Quantile regression is regression that
    estimates a specified quantile of the target's
    distribution conditional on given features.
    @type preds: numpy.ndarray
    @type dmatrix: xgboost.DMatrix
    @type quantile: float
    @rtype: tuple(str, float)
    """
    labels = dmatrix.get_label()
    return ('q{}_loss'.format(quantile),
            np.nanmean((preds >= labels) * (1 - quantile) * (preds - labels) +
                       (preds < labels) * quantile * (labels - preds)))


def xgb_quantile_obj(preds, dmatrix, quantile=0.2):
    """
    Computes first-order derivative of quantile
    regression loss and a non-degenerate
    substitute for second-order derivative.
    Substitute is returned instead of zeros,
    because XGBoost requires non-zero
    second-order derivatives. See this page:
    https://github.com/dmlc/xgboost/issues/1825
    to see why it is possible to use this trick.
    However, be sure that hyperparameter named
    `max_delta_step` is small enough to satisfy:
    ```0.5 * max_delta_step <=
       min(quantile, 1 - quantile)```.
    @type preds: numpy.ndarray
    @type dmatrix: xgboost.DMatrix
    @type quantile: float
    @rtype: tuple(numpy.ndarray)
    """
    # Validate explicitly: a bare assert is skipped when Python runs with -O.
    if not 0 <= quantile <= 1:
        raise ValueError("Quantile value must be float between 0 and 1.")

    labels = dmatrix.get_label()
    errors = preds - labels

    left_mask = errors < 0
    right_mask = errors > 0

    grad = -quantile * left_mask + (1 - quantile) * right_mask
    hess = np.ones_like(preds)

    return grad, hess
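
As a quick sanity check of what the two functions compute, here is a minimal sketch that applies the same formulas to plain NumPy arrays instead of an xgboost.DMatrix. The helper names pinball_loss and pinball_grad are hypothetical and are not part of the gist.

```python
import numpy as np


def pinball_loss(preds, labels, quantile=0.2):
    # Same expression as in xgb_quantile_eval, without the DMatrix wrapper.
    return np.nanmean((preds >= labels) * (1 - quantile) * (preds - labels) +
                      (preds < labels) * quantile * (labels - preds))


def pinball_grad(preds, labels, quantile=0.2):
    # Same first-order derivative as in xgb_quantile_obj.
    errors = preds - labels
    return -quantile * (errors < 0) + (1 - quantile) * (errors > 0)


preds = np.array([1.0, 3.0])
labels = np.array([2.0, 2.0])
# With quantile=0.2, under-predicting by 1 costs 0.2 and over-predicting by 1 costs 0.8.
loss = pinball_loss(preds, labels)
grad = pinball_grad(preds, labels)
```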

I have been getting errors when trying to fit the model (after running xgb_r.fit(train_X, train_y)).

If I assign the variables as follows: X = df[['var1','var2', 'var3','var4','var5']]

I get this error: AttributeError: 'numpy.ndarray' object has no attribute 'get_label'

If the variables are assigned like this: X = pd.DataFrame(np.c_[df['var1'], df['var2'], df['var3'], df['var4'], df['var5']], columns=['var1','var2', 'var3','var4','var5'])

Then I get this: ValueError: DataFrame.dtypes for data must be int, float, bool or category. When categorical type is supplied, The experimental DMatrix parameter 'enable_categorical' must be set to 'True'. Invalid columns:var1: object, var2: object, var3: object, var4: object, var5: object

In any case, df.dtypes shows that all the variables I am using are either int64 or float64, so perhaps yet another way of assigning the variables is needed. Any advice on how to fix this would be great.
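
One possible cause of the first error, assuming xgb_r is an xgboost.XGBRegressor created with objective=xgb_quantile_obj (the question does not show how it is constructed): as far as I understand, the scikit-learn wrapper calls a custom objective as objective(y_true, y_pred) with two NumPy arrays, whereas the gist targets the native xgboost.train interface, which passes (preds, dmatrix). The second argument would then be an array, which has no get_label method. Below is a minimal sketch of an objective adapted to that assumed signature; the name sklearn_quantile_obj is hypothetical.

```python
import numpy as np


def sklearn_quantile_obj(y_true, y_pred, quantile=0.2):
    # Assumed call convention of the scikit-learn wrapper: (y_true, y_pred),
    # both plain NumPy arrays, so no DMatrix.get_label() call is needed.
    if not 0 <= quantile <= 1:
        raise ValueError("Quantile value must be float between 0 and 1.")
    errors = y_pred - y_true
    grad = -quantile * (errors < 0) + (1 - quantile) * (errors > 0)
    hess = np.ones_like(y_pred)  # same constant substitute as in the gist
    return grad, hess


# Labels first, predictions second (the reverse of the gist's argument order).
grad, hess = sklearn_quantile_obj(np.array([2.0, 2.0]), np.array([1.0, 3.0]))
```

Under the same assumption it would be passed as xgboost.XGBRegressor(objective=sklearn_quantile_obj). This has not been run against the data in the question, and it does not address the second (dtype) error.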
