
I have developed two versions of my code: one incorporates stratified k-fold cross validation, while the other lacks any form of cross validation. To my surprise, the results achieved with stratified k-fold cross validation significantly outperform those obtained without it. I would appreciate guidance in identifying potential implementation errors or other factors contributing to this pronounced discrepancy.

With cross validation:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, accuracy_score, confusion_matrix, classification_report, roc_curve
from sklearn.model_selection import StratifiedKFold
from catboost import CatBoostClassifier

# Convert x_train and y_train to numpy arrays if they are not already
x_train = np.array(x_train)
y_train = np.array(y_train)

# Define the number of folds
n_splits = 5

# Initialize lists to store the evaluation metrics for each fold
auc_scores = []
accuracy_scores = []

# Initialize StratifiedKFold
skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)

# Perform stratified k-fold cross-validation
for train_index, test_index in skf.split(x_train, y_train):
    # Split the data into train and test sets for the current fold
    x_train_fold, x_test_fold = x_train[train_index], x_train[test_index]
    y_train_fold, y_test_fold = y_train[train_index], y_train[test_index]
    
    # Create the CatBoostClassifier
    catboost = CatBoostClassifier(n_estimators=300, random_state=42, silent=True, learning_rate=0.1, max_depth=7)

    # Fit the CatBoostClassifier
    catboost.fit(x_train_fold, y_train_fold)
    
    # Make predictions on the test data
    catboost_pred = catboost.predict_proba(x_test_fold)[:, 1]
    
    # Calculate AUC for the current fold
    auc = roc_auc_score(y_test_fold, catboost_pred)
    
    # Convert probabilities to binary predictions
    y_pred = (catboost_pred > 0.5).astype(int)

    # Store the metrics for the current fold
    auc_scores.append(auc)
    accuracy_scores.append(accuracy_score(y_test_fold, y_pred))

    # Generate classification report for the current fold
    classification_rep = classification_report(y_test_fold, y_pred)

    # Display the evaluation metrics for the current fold
    print("Fold AUC:", auc)
    print("Fold Classification Report:\n", classification_rep)
    print("--------------------")

              precision    recall  f1-score   support

           0       0.77      0.87      0.82     14125
           1       0.85      0.74      0.79     14125

    accuracy                           0.81     28250

Without cross validation:

from catboost import CatBoostClassifier
from sklearn.metrics import roc_auc_score, accuracy_score, confusion_matrix, classification_report

# Create the CatBoostClassifier
catboost = CatBoostClassifier(n_estimators=300, random_state=42, silent=True, learning_rate=0.1, max_depth=7)

# Fit the CatBoostClassifier
catboost.fit(x_train, y_train)

# Make predictions on the test data
catboost_pred = catboost.predict_proba(x_test)[:, 1]

# Calculate AUC
auc = roc_auc_score(y_test, catboost_pred)

# Convert probabilities to binary predictions
y_pred = (catboost_pred > 0.5).astype(int)


# Generate classification report
classification_rep = classification_report(y_test, y_pred)

# Display the AUC, precision, recall, specificity, sensitivity, accuracy, and classification report
print("AUC:", auc)
print("Classification Report:\n", classification_rep)

              precision    recall  f1-score   support

           0       0.83      0.87      0.85     17551
           1       0.39      0.33      0.36      4545

    accuracy                           0.76     22096

2 Answers


You should check your dataset's class balance; the performance delta most likely comes from there. Precision and recall for class 1 are much lower in your no-cross-validation run than in the stratified k-fold one, and the support columns hint at why: the per-fold reports are perfectly balanced (14125 samples of each class), while the held-out test set is heavily imbalanced (17551 vs 4545).
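
As a quick check, you can print the class counts of both label arrays (a minimal sketch, assuming the y_train and y_test arrays from the question):

import numpy as np

# Class counts of the labels used to build the CV folds
print("y_train counts:", np.bincount(np.asarray(y_train).astype(int)))

# Class counts of the held-out test labels used in the no-CV run
print("y_test counts:", np.bincount(np.asarray(y_test).astype(int)))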

JP Marcel

This is not surprising in the least: cross validation acts as a form of regularization, and stratified cross validation reduces the impact of class imbalance on your training. However, there are two issues in your code; you're not using cross validation correctly.
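
To see what stratification buys you, here is a small self-contained sketch (synthetic labels, not your data) comparing the class-1 ratio per test fold under plain KFold and StratifiedKFold:

import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Synthetic imbalanced labels: 80% class 0, 20% class 1
y = np.array([0] * 800 + [1] * 200)
X = np.zeros((len(y), 1))  # features don't matter for the split itself

for name, cv in [("KFold", KFold(n_splits=5, shuffle=True, random_state=42)),
                 ("StratifiedKFold", StratifiedKFold(n_splits=5, shuffle=True, random_state=42))]:
    ratios = [y[test_index].mean() for _, test_index in cv.split(X, y)]
    print(name, "class-1 ratio per fold:", np.round(ratios, 3))

StratifiedKFold holds the ratio at exactly 0.2 in every fold, while plain KFold lets it drift with the shuffle.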

Replace the line y_pred = (catboost_pred > 0.5).astype(int) with:

y_pred += (catboost_pred > 0.5).astype(int) / n_splits

Also, move the classification report out of the for loop, unless you want the individual performance of each model trained on each split. Note that, for the accumulated y_pred to make sense, every fold's model must predict on the same held-out test set (the x_test / y_test from your no-cross-validation snippet), not on its own validation fold:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, accuracy_score, confusion_matrix, classification_report, roc_curve
from sklearn.model_selection import StratifiedKFold
from catboost import CatBoostClassifier

# Convert x_train and y_train to numpy arrays if they are not already
x_train = np.array(x_train)
y_train = np.array(y_train)

# Define the number of folds
n_splits = 5

# Initialize a list to store the per-fold AUC scores
auc_scores = []

# Accumulator for the averaged votes of all fold models on the held-out test set
# (assumes the same x_test / y_test as in your no-cross-validation snippet)
y_pred = np.zeros(len(x_test))

# Initialize StratifiedKFold
skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)

# Perform stratified k-fold cross-validation
for train_index, val_index in skf.split(x_train, y_train):
    # Split the data into train and validation sets for the current fold
    x_train_fold, x_val_fold = x_train[train_index], x_train[val_index]
    y_train_fold, y_val_fold = y_train[train_index], y_train[val_index]

    # Create the CatBoostClassifier
    catboost = CatBoostClassifier(n_estimators=300, random_state=42, silent=True, learning_rate=0.1, max_depth=7)

    # Fit the CatBoostClassifier
    catboost.fit(x_train_fold, y_train_fold)

    # Calculate AUC on the current fold's validation split
    val_pred = catboost.predict_proba(x_val_fold)[:, 1]
    auc_scores.append(roc_auc_score(y_val_fold, val_pred))

    # Predict on the common held-out test set and accumulate the averaged vote
    catboost_pred = catboost.predict_proba(x_test)[:, 1]
    y_pred += (catboost_pred > 0.5).astype(int) / n_splits

# Majority vote across the n_splits models
y_pred = (y_pred > 0.5).astype(int)

# Generate the classification report for the ensemble on the held-out test set
classification_rep = classification_report(y_test, y_pred)

# Display the evaluation metrics
print("Mean fold AUC:", np.mean(auc_scores))
print("Ensemble Classification Report:\n", classification_rep)

This way you're creating n_splits models that all contribute to the prediction output, which acts as an ensemble technique. This might improve your metrics even more.
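
A common variant is soft voting: average the predicted probabilities from each fold's model and threshold once at the end, which also gives you a single ensemble AUC. A minimal sketch, continuing from the snippet above (so it reuses skf, x_train, y_train, and the held-out x_test / y_test):

# Soft voting: average predicted probabilities instead of hard 0/1 votes
proba_sum = np.zeros(len(x_test))

for train_index, _ in skf.split(x_train, y_train):
    model = CatBoostClassifier(n_estimators=300, random_state=42, silent=True,
                               learning_rate=0.1, max_depth=7)
    model.fit(x_train[train_index], y_train[train_index])
    proba_sum += model.predict_proba(x_test)[:, 1]

avg_proba = proba_sum / n_splits

# AUC can use the averaged probabilities directly; the report needs hard labels
print("Ensemble AUC:", roc_auc_score(y_test, avg_proba))
print(classification_report(y_test, (avg_proba > 0.5).astype(int)))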