
I got into the same situation as the OP of this post. I would really prefer documentation on how to extract the data from an xgb model and how exactly to code up its forward propagation myself, but m2cgen sounded like a good alternative. I used the following code:

import xgboost as xgb
import seaborn as sns
import m2cgen as m2c

# Load the diamonds dataset and keep only the numeric columns.
df = sns.load_dataset("diamonds")

X = df.drop(['cut', 'color', 'clarity', 'price'], axis=1)
y = df.price

# (Not used below; the model is fit on all rows.)
n = X.shape[0]
n_split = int(n * 0.75)

# A deliberately tiny model: a single tree of depth 2.
model = xgb.XGBRegressor(objective='reg:squarederror',
                         max_depth=2,
                         n_estimators=1,
                         eval_metric="rmse")
model.fit(X, y)

# Transpile the fitted model to C source.
with open('./diamonds_model.c', 'w') as f:
    code = m2c.export_to_c(model)
    f.write(code)

and as a result I see:

double score(double * input) {
    double var0;
    if (input[0] >= 0.995) {
        if (input[4] >= 7.1949997) {
            var0 = 3696.243;
        } else {
            var0 = 1841.0602;
        }
    } else {
        if (input[4] >= 5.535) {
            var0 = 922.34973;
        } else {
            var0 = 317.401;
        }
    }
    return nan + var0;
}
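
For reference, I understand the same splits and leaf values can also be inspected straight from the booster (as far as I know, get_dump and trees_to_dataframe are the standard Booster APIs for this):

booster = model.get_booster()

# Text dump of the learned tree(s); the thresholds and leaf values
# should match the generated C code above.
for tree in booster.get_dump(dump_format="text"):
    print(tree)

# The same information as a pandas DataFrame.
print(booster.trees_to_dataframe())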

So I wonder what I am doing wrong and where this nan comes from. I'm on Python 3.8.5 and xgb reports version 1.7.3.

SBF

1 Answer


The NaN in the nan + var0 expression likely corresponds to XGBoost's base_score parameter (also known as the "global bias" or intercept). It appears to be unset for your XGBRegressor instance, so it is emitted with its default value, NaN.

Try setting the base_score value manually to 0:

model = xgb.XGBRegressor(base_score = 0, ...)

This issue seems to affect newer XGBoost versions. See also m2cgen-581.
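
To confirm what the trained booster actually ended up with, one option (a sketch; save_config is part of the Booster API, though the exact JSON layout may vary between versions) is to inspect its configuration:

import json

# The booster's internal configuration records the effective base_score.
config = json.loads(model.get_booster().save_config())
print(config["learner"]["learner_model_param"]["base_score"])

After re-fitting with base_score = 0 and re-exporting, the generated C code should no longer contain the nan term.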

user1808924