I am working on an unbalanced dataset and I noticed that, strangely, if I shuffle the data during cross-validation I get a high F1 score, whereas if I do not shuffle it the F1 is low. Here is the function I use for cross-validation:
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score, cross_val_predict
from sklearn.metrics import confusion_matrix
from xgboost import XGBClassifier

def train_cross_v(md, df_train, n_folds=5, shuffl=False):
    # `variable` is the name of the target column (defined globally)
    X, y = df_train.drop([variable], axis=1), df_train[variable]
    cv = StratifiedKFold(n_splits=n_folds, shuffle=shuffl)
    scores = cross_val_score(md, X, y, scoring='f1', cv=cv, n_jobs=-1)
    y_pred = cross_val_predict(md, X, y, cv=cv, n_jobs=-1)
    print(' f1: ', scores, np.mean(scores))
    # arguments are (y_pred, y), i.e. transposed w.r.t. sklearn's usual (y_true, y_pred)
    print(confusion_matrix(y_pred, y))
    return np.mean(scores)
With shuffling, I get an F1 around 0.82:
nfolds = 5
train_cross_v(XGBClassifier(), df_train, n_folds=nfolds, shuffl=True)
f1: [0.81469793 0.82076749 0.82726257 0.82379249 0.82484862] 0.8222738195197493
[[23677 2452]
[ 1520 9126]]
0.8222738195197493
While not shuffling gives:
nfolds = 5
train_cross_v(XGBClassifier(), df_train, n_folds=nfolds, shuffl=False)
f1: [0.67447073 0.55084022 0.4166443 0.52759421 0.64819164] 0.5635482198057791
[[21621 5624]
[ 3576 5954]]
0.5635482198057791
As I understand it, shuffling is preferred when assessing the real performance of the model, since it removes any dependence on the ordering of the data, and the value of the performance metric after shuffling is usually lower than the value without shuffling. In my case, however, the behavior is the exact opposite: I get a higher value if I shuffle, while the predictions on the test set remain unchanged. What could be the problem here?
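In case it is relevant, this is the kind of quick check I can run to see whether the rows of df_train are ordered in some way (a minimal sketch; it assumes the target column variable is coded 0/1, and the number of blocks is an arbitrary choice):

import numpy as np

# positive-class rate in consecutive blocks of rows, in the original row order;
# a strong drift across blocks would suggest the data is ordered, e.g. by time or by group
y = df_train[variable].to_numpy()
n_blocks = 10  # arbitrary
for i, block in enumerate(np.array_split(y, n_blocks)):
    print(f"block {i}: positive rate = {block.mean():.3f}, n = {len(block)}")

If the rate changes a lot from block to block, that would at least tell me the rows are not in random order.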