
I'm using an XGBoost model and I'm having trouble getting it to generalize. I tried to visualize the learning curves of my train and test sets, but both are exactly the same. That looks like an error to me, but I don't know the cause.

The code:

import xgboost as xgb
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

X, y = creation_X_y()
data_dmatrix = xgb.DMatrix(data=X, label=y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42, stratify=y)

# Weight the positive class by the negative:positive ratio to handle imbalance
model = xgb.XGBClassifier(n_estimators=50, max_depth=6,
                          scale_pos_weight=y_train.value_counts().loc[0] / y_train.value_counts().loc[1])

eval_set = [(X_train, y_train), (X_test, y_test)]

model.fit(X_train, 
          y_train, 
          eval_metric="logloss", 
          eval_set=eval_set, 
          verbose=False)

results = model.evals_result()

epochs = len(results['validation_0']['logloss'])
x_axis = range(epochs)

fig, ax = plt.subplots()
ax.plot(x_axis, results['validation_0']['logloss'], label='Train')
ax.plot(x_axis, results['validation_1']['logloss'], label='Test')
ax.legend()
plt.xlabel('\nEpochs', fontsize=14, fontweight='semibold', color='white')
plt.ylabel('Error\n', fontsize=14, fontweight='semibold', color='white')
plt.title('XGBoost learning curve\n', fontsize=20, fontweight='semibold', color='white')
plt.xticks(color='white')
plt.yticks(color='white')
plt.show()

The output:

[Plot: "XGBoost learning curve" with the train and test log-loss curves overlapping exactly]

  • Are they just close enough to be indistinguishable on the plot, or actually the same; can you compare the results lists directly? How big is your dataset? – Ben Reiniger Jun 24 '22 at 14:48
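A minimal sketch of the check the comment suggests, assuming the results dictionary returned by evals_result() in the code above (the variable names are just illustrative):

import numpy as np

train_loss = np.asarray(results['validation_0']['logloss'])
test_loss = np.asarray(results['validation_1']['logloss'])

# Distinguish exact equality from curves that merely overlap visually
print("identical:", np.array_equal(train_loss, test_loss))
print("max abs difference:", np.max(np.abs(train_loss - test_loss)))

If the maximum difference is nonzero but tiny, the curves are simply indistinguishable at the plot's scale; if it is exactly zero, the same data is most likely being evaluated twice (e.g. the same set passed for both entries of eval_set).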

0 Answers