I am performing k-fold cross-validation on multiple datasets at the same time. I am using KFold from sklearn to do 10-fold cross-validation. Basically, this partitions a dataset into 10 pieces, trains a classifier on 9 of them, and tests it on the remaining one. It then repeats the routine with a different piece held out for testing and the previous test piece returned to the training set, until every piece has been used for testing once. I can write a for loop for a single dataset using the following:
for train, test in kfold.split(data):
    print(train)
    print(test)
The output of this is the following:
[1 2 3 4 5 6 7 8 9]
[0]
[0 2 3 4 5 6 7 8 9]
[1]
[0 1 3 4 5 6 7 8 9]
[2]
[0 1 2 4 5 6 7 8 9]
[3]
[0 1 2 3 5 6 7 8 9]
[4]
[0 1 2 3 4 6 7 8 9]
[5]
[0 1 2 3 4 5 7 8 9]
[6]
[0 1 2 3 4 5 6 8 9]
[7]
[0 1 2 3 4 5 6 7 9]
[8]
[0 1 2 3 4 5 6 7 8]
[9]
where the first array in each pair holds the indices of the training samples in the initial dataset (an array of arrays) and the second array holds the index of the testing sample. This iterates properly over one dataset. However, how would I go about doing this for multiple datasets simultaneously? For example, I might want to create a classifier using specific folds from multiple datasets. I've tried the following:
for train0, test0, train1, test1 in kfold.split(data0), kfold.split(data1):
    # code
But I get the following error: ValueError: too many values to unpack (expected 4)
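The error appears to come from how the loop is written: `kfold.split(data0), kfold.split(data1)` is a tuple of two generators, so the loop visits each generator in turn and tries to unpack its 10 (train, test) pairs into four names. Below is a minimal sketch of one way to walk both splits in lockstep, assuming the built-in `zip` is acceptable and that both datasets are split with the same `KFold` object. The arrays `data0` and `data1` and the list `fold_pairs` are hypothetical stand-ins, not the real data.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical stand-ins for the real datasets: 10 samples each.
data0 = np.arange(20).reshape(10, 2)
data1 = np.arange(30).reshape(10, 3)

kfold = KFold(n_splits=10)

# zip pairs the i-th (train, test) tuple of one split with the i-th of the other,
# so each iteration yields two 2-tuples that can be unpacked separately.
fold_pairs = []
for (train0, test0), (train1, test1) in zip(kfold.split(data0), kfold.split(data1)):
    fold_pairs.append((train0, test0, train1, test1))
    print(test0, test1)
```

Because `zip` stops at the shorter iterator, both splits need to yield the same number of folds, which holds here since both use `n_splits=10`.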