Dealing with an overfitting problem
I'm trying to perform XGBoost classification with stratified cross-validation, SMOTE oversampling, and a randomized hyperparameter search. The cross-validation results are great, but when I evaluate on the held-out test set, the results are very bad.
import numpy as np
import xgboost as xgb
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import make_pipeline as imbalanced_make_pipeline
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

sss = StratifiedKFold(n_splits=5)  # stratified CV splitter (defined earlier in my notebook)

# Per-fold metric collectors
accuracy_lst_xgb = []
precision_lst_xgb = []
recall_lst_xgb = []
f1_lst_xgb = []
auc_lst_xgb = []

xgb_sm = xgb.XGBClassifier(random_state=42)
xgb_params = {
    'eta': [0.0001, 0.00001, 0.000001],  # learning rate
    'n_estimators': [10, 20, 30],
    'eval_metric': ['logloss'],
    'max_depth': [1, 2, 3, 4],
    'lambda': [2, 2.5],  # L2 regularization (higher values make the model more conservative)
    'alpha': [1, 1.5],   # L1 regularization (higher values make the model more conservative)
}
rand_xgb = RandomizedSearchCV(xgb_sm, xgb_params, n_iter=5)
pipeline_xgb = imbalanced_make_pipeline(SMOTE(sampling_strategy='minority'), rand_xgb)  # SMOTE happens during cross-validation, not before
for train, val in sss.split(X_train_balanced_sm, y_train_balanced_sm):
    # pipeline_xgb = imbalanced_make_pipeline(SMOTE(sampling_strategy='minority'), rand_xgb)  # SMOTE happens during cross-validation, not before
    pipeline_xgb.fit(X_train_balanced_sm[train], y_train_balanced_sm[train])
    best_est_xgb = rand_xgb.best_estimator_
    prediction_xgb = pipeline_xgb.predict(X_train_balanced_sm[val])
    accuracy_lst_xgb.append(pipeline_xgb.score(X_train_balanced_sm[val], y_train_balanced_sm[val]))
    precision_lst_xgb.append(precision_score(y_train_balanced_sm[val], prediction_xgb))
    recall_lst_xgb.append(recall_score(y_train_balanced_sm[val], prediction_xgb))
    f1_lst_xgb.append(f1_score(y_train_balanced_sm[val], prediction_xgb))
    auc_lst_xgb.append(roc_auc_score(y_train_balanced_sm[val], prediction_xgb))
print('Result of cross validation XGBoost on reduced dataset:')
print("accuracy: {}".format(np.mean(accuracy_lst_xgb)))
print("precision: {}".format(np.mean(precision_lst_xgb)))
print("recall: {}".format(np.mean(recall_lst_xgb)))
print("f1: {}".format(np.mean(f1_lst_xgb)))
print("ROC AUC: {}".format(np.mean(auc_lst_xgb)))
prediction = pipeline_xgb.predict(X_test_balanced)
accuracy_test = accuracy_score(y_test_balanced, prediction)
precision_test = precision_score(y_test_balanced, prediction)
recall_test = recall_score(y_test_balanced, prediction)
f1_test = f1_score(y_test_balanced, prediction)
auc_test = roc_auc_score(y_test_balanced, prediction)
print("Test Accuracy:", accuracy_test)
print("Test Precision:", precision_test)
print("Test Recall:", recall_test)
print("Test F1-score:", f1_test)
print("Test AUC:", auc_test)`
Result of cross validation XGBoost on reduced dataset:
accuracy: 0.8552028218694886
precision: 0.8452956959367612
recall: 0.8724952086097598
f1: 0.8584596570871824
ROC AUC: 0.8550763330018207
Test Accuracy: 0.492046077893582
Test Precision: 0.492046077893582
Test Recall: 1.0
Test F1-score: 0.6595588235294118
Test AUC: 0.5
The test metrics suggest the model predicts the positive class for every test sample: recall is 1.0, AUC is 0.5, and precision equals accuracy. I'm looking for a solution to deal with this overfitting problem; two ideas I'm weighing are sketched below.
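One restructuring I'm considering (a minimal sketch under assumptions, untested on my data, not a known fix): put the imblearn pipeline inside RandomizedSearchCV rather than the search inside the pipeline, so SMOTE is re-fit on each training fold and the validation folds stay untouched. Here X_train, y_train, X_test, y_test stand for the original, un-oversampled splits (unlike my X_train_balanced_sm, which already has SMOTE applied), the hyperparameter values are illustrative only, and reg_lambda/reg_alpha are the sklearn-wrapper spellings of lambda/alpha.

import xgboost as xgb
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Pipeline inside the search: SMOTE runs only on each CV training fold.
pipe = Pipeline([
    ('smote', SMOTE(sampling_strategy='minority', random_state=42)),
    ('clf', xgb.XGBClassifier(random_state=42, eval_metric='logloss')),
])

# Step-name prefixes route each parameter to the right pipeline stage.
param_dist = {
    'clf__learning_rate': [0.3, 0.1, 0.05],  # illustrative; much larger than my 1e-4..1e-6
    'clf__n_estimators': [100, 200, 300],
    'clf__max_depth': [2, 3, 4],
    'clf__reg_lambda': [2, 2.5],
    'clf__reg_alpha': [1, 1.5],
}

search = RandomizedSearchCV(
    pipe, param_dist, n_iter=5, scoring='roc_auc',
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
)

search.fit(X_train, y_train)  # original training split, no SMOTE applied beforehand
proba = search.predict_proba(X_test)[:, 1]
print('CV ROC AUC:', search.best_score_)
print('Test ROC AUC:', roc_auc_score(y_test, proba))

I'm also wondering whether early stopping would help against overfitting. My understanding of the xgboost sklearn API (version-dependent: around xgboost 2.0, early_stopping_rounds moved from fit() to the constructor) is roughly this, where X_tr, X_val, y_tr, y_val are a further split of the training data:

from sklearn.model_selection import train_test_split

X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, stratify=y_train, random_state=42)
clf = xgb.XGBClassifier(n_estimators=500, learning_rate=0.1, eval_metric='logloss',
                        early_stopping_rounds=20, random_state=42)
clf.fit(X_tr, y_tr, eval_set=[(X_val, y_val)])  # stops adding trees once validation logloss stalls

Would either of these address the gap I'm seeing between cross-validation and test performance?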