
[Plot: training vs. validation NDCG curves per boosting round]

I'm using XGBoost and have tuned my hyperparameters with Hyperopt. However, when I plot the training and validation curves after fitting my model, I notice that the two lines intersect each other. What does that mean? Also, the validation line doesn't start near the training line.

I'm using `early_stopping_rounds = 20` when I fit my model before plotting this graph; a simplified sketch of the fit and plotting code follows.
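This is roughly how I fit the model and plot the two curves. It's a simplified sketch: `X_train`, `y_train`, `X_val`, `y_val`, and the `group_*` arrays stand in for my real data (`rank:pairwise` needs per-query group sizes), and `params` is the tuned dict shown further down.

```python
import xgboost as xgb
import matplotlib.pyplot as plt

# params is the dict Hyperopt returned (see below); n_estimators also
# came out of the tuning run.
model = xgb.XGBRanker(n_estimators=527, **params)
model.fit(
    X_train, y_train,
    group=group_train,                               # query group sizes
    eval_set=[(X_train, y_train), (X_val, y_val)],
    eval_group=[group_train, group_val],
    early_stopping_rounds=20,
    verbose=False,
)

# Plot train vs. validation ndcg per boosting round
history = model.evals_result()
plt.plot(history["validation_0"]["ndcg"], label="train")
plt.plot(history["validation_1"]["ndcg"], label="validation")
plt.xlabel("boosting round")
plt.ylabel("ndcg")
plt.legend()
plt.show()
```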

The hyperparameters I got from HyperOpt are as follows:

```python
{'booster': 'gbtree',
 'colsample_bytree': 0.8814444518931106,
 'eta': 0.0712456143241873,
 'eval_metric': 'ndcg',
 'gamma': 0.8925113465433823,
 'max_depth': 8,
 'min_child_weight': 5,
 'objective': 'rank:pairwise',
 'reg_alpha': 2.2193560083517383,
 'reg_lambda': 1.8600142721064354,
 'seed': 0,
 'subsample': 0.9818535865621624}
```
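For completeness, the search itself followed the standard Hyperopt pattern. This is a simplified sketch; the search-space ranges and `max_evals` here are illustrative rather than my exact settings, and the data variables are the same placeholders as above.

```python
from hyperopt import STATUS_OK, Trials, fmin, hp, tpe
import xgboost as xgb

# Illustrative search space; the actual ranges I searched differ.
space = {
    "booster": "gbtree",
    "objective": "rank:pairwise",
    "eval_metric": "ndcg",
    "seed": 0,
    "eta": hp.uniform("eta", 0.01, 0.3),
    "max_depth": hp.quniform("max_depth", 3, 10, 1),
    "min_child_weight": hp.quniform("min_child_weight", 1, 10, 1),
    "gamma": hp.uniform("gamma", 0.0, 2.0),
    "subsample": hp.uniform("subsample", 0.5, 1.0),
    "colsample_bytree": hp.uniform("colsample_bytree", 0.5, 1.0),
    "reg_alpha": hp.uniform("reg_alpha", 0.0, 5.0),
    "reg_lambda": hp.uniform("reg_lambda", 0.0, 5.0),
    "n_estimators": hp.quniform("n_estimators", 100, 1000, 1),
}

def objective(params):
    # hp.quniform samples floats, so integer params need casting
    for key in ("max_depth", "min_child_weight", "n_estimators"):
        params[key] = int(params[key])
    model = xgb.XGBRanker(**params)
    model.fit(
        X_train, y_train, group=group_train,
        eval_set=[(X_val, y_val)], eval_group=[group_val],
        early_stopping_rounds=20, verbose=False,
    )
    # fmin minimizes, so return the negative of the best validation ndcg
    return {"loss": -model.best_score, "status": STATUS_OK}

best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=100, trials=Trials())
```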

I thought Hyperopt was supposed to give me the best parameters. What can I change to improve this?

Edit

I changed `n_estimators` from 527 to 160, and it now gives me the graph below. But I'm not sure if this graph is okay. Any advice is much appreciated!

[Plot: training vs. validation NDCG curves with n_estimators = 160]
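For what it's worth, since the fit above uses early stopping, I believe the best round can also be read off the fitted model rather than hand-picked (this assumes the `model` from the sketch near the top):

```python
# With early_stopping_rounds set, the fitted sklearn wrapper records the
# boosting round where validation ndcg peaked and the score it reached.
print(model.best_iteration)  # round with the best validation ndcg
print(model.best_score)      # the ndcg value at that round
```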

    Looks like the model is overfitting to the training data past the cross-over point. If this is as good as it gets after hyperparameter tuning, you probably need to stop training after 170-180 epochs. – NotAName Sep 30 '21 at 06:53
  • What is your `eval_metric` that `early_stopping` uses? – NotAName Sep 30 '21 at 06:56
  • @pavel Hi, thanks for responding to my post. `eval_metric` is `ndcg` and `early_stopping_rounds` is 20; the `n_estimators` I initially got from Hyperopt was 527. I changed `n_estimators` to 160, and now the two lines touch each other at 160. For a good model, should both lines converge at the same point? Does it matter that my validation line doesn't start close to my training line? It looks like a very straight line. (I've also edited my post as per above.) – Hojiyama Sep 30 '21 at 07:02
  • Normally there should almost always be a crossover point. More info here: https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff – NotAName Sep 30 '21 at 07:31

0 Answers