Creating K DataFrames using train_index, test_index of KFold cross-validation in Python using sklearn.cross_validation.KFold()

Question

I am using 5-fold cross-validation in Python with sklearn.cross_validation.KFold() to see how my model performs. The model performs well on 4 folds but very poorly on one specific fold. As I am new to data science, I was wondering how I can retrieve the data from that particular fold so that I can inspect it and figure out how to fix the problem.

What libraries are you using? And what language? Is it R or Python? You did not specify either in the tags. – ItIsEntropy, Dec 11 '19 at 09:47
My apologies. I am using the scikit-learn library and the Python language. – Anjani Kumar Tiwari, Dec 11 '19 at 10:44
Please add the code of the approach you have tried to your question. It's good practice, and it also helps other users find a solution to your issue. – henriquehbr, Dec 11 '19 at 11:53

score 0 · Accepted Answer · answered Dec 11 '19 at 11:52

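Here is an example adapted from the scikit-learn documentation for KFold. Inside the loop, train_index and test_index hold the row positions of each fold, so X[test_index] and y[test_index] are exactly the data of that fold: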

import numpy as np
from sklearn.model_selection import KFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])  # 4 samples, 2 features
y = np.array([1, 2, 3, 4])  # one target value per sample
kf = KFold(n_splits=2)  # split the rows into 2 folds

for train_index, test_index in kf.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]
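Since the question is about one particular fold, here is a minimal sketch for pulling it out directly. It assumes shuffling is off (or that random_state is fixed when shuffle=True), in which case KFold.split yields the same folds on every call, so they can be collected into a list and indexed. The fold index k is a hypothetical choice.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4])
kf = KFold(n_splits=2)

# Without shuffling, the folds are deterministic, so materialise them once
# and index the one you want instead of looping.
folds = list(kf.split(X))

k = 1  # hypothetical: 0-based index of the fold to inspect
train_index, test_index = folds[k]
X_test_k, y_test_k = X[test_index], y[test_index]

print("Fold", k, "TEST rows:", test_index)
print(X_test_k)
print(y_test_k)
```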

You should also print the performance metric computed in each fold, so you can tell which fold performs poorly and then inspect that fold's X_test and y_test.
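A minimal sketch of that per-fold reporting. The LogisticRegression model and the synthetic make_classification data are hypothetical stand-ins for the asker's own model and dataset. The loop records accuracy per fold, picks the worst fold with np.argmin, and slices out that fold's test rows for inspection.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Hypothetical stand-in data: replace with your own X and y.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

kf = KFold(n_splits=5)
scores = []
for fold, (train_index, test_index) in enumerate(kf.split(X)):
    model = LogisticRegression()
    model.fit(X[train_index], y[train_index])
    score = model.score(X[test_index], y[test_index])  # accuracy on this fold
    scores.append(score)
    print(f"Fold {fold}: accuracy = {score:.3f}")

# Locate the weakest fold and pull out its test rows for a closer look.
worst = int(np.argmin(scores))
_, worst_test_index = list(kf.split(X))[worst]
X_worst, y_worst = X[worst_test_index], y[worst_test_index]
print("Worst fold:", worst, "with", len(worst_test_index), "test rows")
```

Because KFold is not shuffled here, calling kf.split(X) a second time reproduces the same folds, which is why the worst fold can be recovered after the loop.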

score 0 · Answer 2 · answered Dec 16 '19 at 07:06


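from pandas import ExcelWriter
from sklearn.model_selection import KFold

# X50 (features) and Y (target) are the asker's own pandas DataFrame and Series.
kf = KFold(n_splits=3)
# ExcelWriter is used as a context manager because writer.save() was removed in pandas 2.0.
with ExcelWriter('Kfoldcrossvalidation.xlsx') as writer:
    # split() only uses the number of rows, so split the same frame that is indexed below.
    for fold, (train_index, test_index) in enumerate(kf.split(X50), start=1):
        print("Fold: %s" % fold)
        X_train, X_test = X50.iloc[train_index], X50.iloc[test_index]
        y_train, y_test = Y.iloc[train_index], Y.iloc[test_index]
        print(y_test)
        # One sheet per fold holding that fold's test targets
        y_test.to_excel(writer, sheet_name='sheet ' + str(fold))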

Answered by Anjani Kumar Tiwari

Is there any way I can do the above without writing to Excel, i.e. directly create a DataFrame for each fold in a loop? – Anjani Kumar Tiwari Dec 16 '19 at 07:08
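One way to skip Excel entirely, sketched under assumptions: X50 and Y below are small hypothetical pandas objects standing in for the asker's data, and each fold's test rows (features plus target) are kept in a dict keyed by fold number, mirroring the sheet-per-fold layout of the answer above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical stand-ins for the answer's X50 (features) and Y (target).
X50 = pd.DataFrame(np.arange(20).reshape(10, 2), columns=["a", "b"])
Y = pd.Series(np.arange(10) * 10, name="target")

kf = KFold(n_splits=3)
test_frames = {}  # fold number -> DataFrame of that fold's test rows
for fold, (train_index, test_index) in enumerate(kf.split(X50), start=1):
    # Put features and target side by side so a fold can be inspected as one table.
    test_frames[fold] = pd.concat(
        [X50.iloc[test_index], Y.iloc[test_index]], axis=1
    )

print(test_frames[2])
```

If a single table is more convenient, pd.concat(test_frames) stacks the dict into one DataFrame whose outer index level is the fold number. Train rows can be stored the same way with train_index.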
