
I am performing k-fold cross-validation on multiple datasets at the same time, using KFold from sklearn to do 10-fold validation. This partitions a dataset into 10 pieces, trains a classifier on 9 of them, and tests it on the remaining piece; it then repeats the routine with a different piece held out for testing each time, the previous test set rejoining the training set. I can write a for loop for a single dataset as follows:

for train, test in kfold.split(data):
    print(train)
    print(test)

The output of this is the following:

[1 2 3 4 5 6 7 8 9]
[0]
[0 2 3 4 5 6 7 8 9]
[1]
[0 1 3 4 5 6 7 8 9]
[2]
[0 1 2 4 5 6 7 8 9]
[3]
[0 1 2 3 5 6 7 8 9]
[4]
[0 1 2 3 4 6 7 8 9]
[5]
[0 1 2 3 4 5 7 8 9]
[6]
[0 1 2 3 4 5 6 8 9]
[7]
[0 1 2 3 4 5 6 7 9]
[8]
[0 1 2 3 4 5 6 7 8]
[9]

where the first array holds the indices of the training samples to be drawn from the initial dataset array of arrays, and the second array holds the index of the testing sample. I can get this to iterate properly over one dataset. However, how would I go about performing this for multiple datasets simultaneously? For example, if I wanted to create a classifier using corresponding folds from multiple sets. I've tried the following:

for train0, test0, train1, test1 in kfold.split(data0), kfold.split(data1):
    # code

But I get the following error:

    ValueError: too many values to unpack (expected 4)

desertnaut
mtrns

1 Answer


You could use parallel iteration with zip:

for kfold0, kfold1 in zip(kfold.split(data0), kfold.split(data1)):
    train0, test0 = kfold0
    train1, test1 = kfold1
    ...

You can even unpack the tuples directly in the loop header, though I personally find this less readable:

for (train0, test0), (train1, test1) in zip(kfold.split(data0), kfold.split(data1)):
    ...
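As a self-contained illustration of the zip pattern (using a hypothetical `simple_kfold` generator in place of sklearn's `KFold.split`, since all that matters here is that each generator yields `(train_indices, test_indices)` pairs), zipping two split generators pairs fold i of one dataset with fold i of the other:

    def simple_kfold(n, k):
        """Yield (train, test) index pairs over n samples in k folds,
        mimicking the tuples produced by KFold.split."""
        fold_size = n // k
        indices = list(range(n))
        for i in range(k):
            test = indices[i * fold_size:(i + 1) * fold_size]
            train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
            yield train, test

    # Two datasets of 10 samples each, 10 folds (one sample per test fold,
    # matching the output shown in the question).
    pairs = []
    for (train0, test0), (train1, test1) in zip(simple_kfold(10, 10),
                                                simple_kfold(10, 10)):
        # Fold i of data0 is processed together with fold i of data1.
        pairs.append((test0, test1))

With equal dataset sizes, `zip` walks both generators in lockstep, so `pairs[i]` is `([i], [i])` for each fold. Note that `zip` stops at the shorter generator, so if the two datasets produce different numbers of folds, the extra folds of the longer one are silently dropped.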
NPE