
I am trying out a custom loss function for quantile regression with XGBoost, taken from https://gist.github.com/Nikolay-Lysenko/06769d701c1d9c9acb9a66f2f9d7a6c7. The code is as follows:

import numpy as np


def xgb_quantile_eval(preds, dmatrix, quantile=0.2):
    """
    Custom evaluation metric equal to the
    quantile regression loss (also known as
    pinball loss).
    Quantile regression is regression that
    estimates a specified quantile of target's
    distribution conditional on given features.
    @type preds: numpy.ndarray
    @type dmatrix: xgboost.DMatrix
    @type quantile: float
    @rtype: float
    """
    labels = dmatrix.get_label()
    return ('q{}_loss'.format(quantile),
            np.nanmean((preds >= labels) * (1 - quantile) * (preds - labels) +
                       (preds < labels) * quantile * (labels - preds)))


def xgb_quantile_obj(preds, dmatrix, quantile=0.2):
    """
    Computes first-order derivative of quantile
    regression loss and a non-degenerate
    substitute for second-order derivative.
    Substitute is returned instead of zeros,
    because XGBoost requires non-zero
    second-order derivatives. See this page:
    https://github.com/dmlc/xgboost/issues/1825
    to see why it is possible to use this trick.
    However, be sure that hyperparameter named
    `max_delta_step` is small enough to satisfy:
    ```0.5 * max_delta_step <=
       min(quantile, 1 - quantile)```.
    @type preds: numpy.ndarray
    @type dmatrix: xgboost.DMatrix
    @type quantile: float
    @rtype: tuple(numpy.ndarray)
    """
    # Validate explicitly: an assert would be skipped under `python -O`.
    if not 0 <= quantile <= 1:
        raise ValueError("Quantile value must be a float between 0 and 1.")

    labels = dmatrix.get_label()
    errors = preds - labels

    left_mask = errors < 0
    right_mask = errors > 0

    grad = -quantile * left_mask + (1 - quantile) * right_mask
    hess = np.ones_like(preds)

    return grad, hess

Fitting the model with xgb_r.fit(train_X, train_y) raises an error, and which error I get depends on how I build X.

If I assign the variables as follows: X = df[['var1','var2', 'var3','var4','var5']]

I get this error: AttributeError: 'numpy.ndarray' object has no attribute 'get_label'
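For context, here is a minimal sketch of what may be going on, assuming xgb_r is an xgboost.XGBRegressor (the post does not show how it is created). The scikit-learn wrapper calls a custom objective as objective(y_true, y_pred) with two NumPy arrays, whereas the gist's functions follow the native xgb.train convention of (preds, dmatrix). That would explain why get_label is being looked up on an ndarray. The names quantile_obj_sklearn and objective_q20 are hypothetical and not part of the gist.

```python
from functools import partial

import numpy as np


def quantile_obj_sklearn(y_true, y_pred, quantile=0.2):
    """Pinball-loss gradient using the (y_true, y_pred) argument order."""
    if not 0 <= quantile <= 1:
        raise ValueError("Quantile value must be a float between 0 and 1.")
    errors = y_pred - y_true
    # Same gradient as xgb_quantile_obj: -quantile where the prediction is
    # too low, (1 - quantile) where it is too high, 0 where it is exact.
    grad = -quantile * (errors < 0) + (1 - quantile) * (errors > 0)
    # Constant substitute for the zero second derivative, as in the gist.
    hess = np.ones_like(y_pred)
    return grad, hess


# Quick check on plain arrays; XGBoost itself is not needed for this part.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([0.5, 2.0, 4.0])
grad, hess = quantile_obj_sklearn(y_true, y_pred, quantile=0.2)

# Bind the quantile so the resulting callable takes exactly two arguments.
objective_q20 = partial(quantile_obj_sklearn, quantile=0.2)
```

Under the same assumption, objective_q20 could then be passed as XGBRegressor(objective=objective_q20). The original (preds, dmatrix) versions would remain the ones to use with xgb.train and an xgboost.DMatrix.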

If the variables are assigned like this: X = pd.DataFrame(np.c_[df['var1'], df['var2'], df['var3'], df['var4'], df['var5']], columns=['var1','var2', 'var3','var4','var5'])

Then I get this: ValueError: DataFrame.dtypes for data must be int, float, bool or category. When categorical type is supplied, The experimental DMatrix parameter 'enable_categorical' must be set to 'True'. Invalid columns:var1: object, var2: object, var3: object, var4: object, var5: object

In both cases, df.dtypes shows that all the variables I am using are either int64 or float64, so perhaps yet another way of assigning the variables is needed.

Any advice on how to fix this would be great.
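On the second error, one possible mechanism (an assumption, since the post does not show the actual data) is that np.c_ falls back to the object dtype as soon as any of the stacked columns is stored as object, and a DataFrame built from such an array has object columns. The sketch below reproduces that with two hypothetical columns and shows an explicit cast back to float64.

```python
import numpy as np

# Hypothetical columns: one stored with object dtype, one with float64.
var1 = np.array([1, 2, 3], dtype=object)
var2 = np.array([0.5, 1.5, 2.5])

# np.c_ places the 1-D arrays side by side as columns; the common dtype
# of an object array and a float array is object.
stacked = np.c_[var1, var2]
print(stacked.dtype)

# Casting explicitly restores a numeric dtype that XGBoost accepts.
numeric = stacked.astype(np.float64)
print(numeric.dtype)
```

If this is the cause, calling .astype(float) on the rebuilt DataFrame, or skipping the np.c_ round trip and keeping df[['var1', ...]] as in the first attempt, should avoid the object columns.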

Nata