Is there a simple way to divide a dataset into 5 subsets of the same size while preserving the class distribution (the percentage of each class) in every subset? It should be possible to access each subset directly.

Many thanks

Code Now

1 Answer

Are you talking about k-fold splitting? scikit-learn's StratifiedKFold is a variation of KFold that builds the folds so that each one preserves the percentage of samples of each class.
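A minimal sketch of how this could look, assuming NumPy arrays and a made-up toy dataset (`X`, `y`, and the 80/20 class ratio are illustrative, not from the question). Collecting the held-out indices of each split into a list gives five disjoint, stratified subsets that can be indexed directly:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data (assumption): 100 samples with an 80/20 class imbalance.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 20)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Keep only the held-out indices of each split; these index arrays are
# disjoint and together cover the whole dataset.
folds = [test_idx for _, test_idx in skf.split(X, y)]

# Direct access to one subset, here the third one (0-based index 2).
X_sub, y_sub = X[folds[2]], y[folds[2]]
print(len(folds[2]), np.bincount(y_sub))
```

`shuffle=True` with a fixed `random_state` is a choice made here for reproducibility; without shuffling, the folds are taken in the order of the data within each class.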

Noki
  • I would like to do a 5-fold cross-validation with EarlyStopping: train a model 5 times with EarlyStopping, and each time the validation set should be disjoint from the previous ones. This way I want to prevent the model from indirectly learning the validation data. The whole thing should take place in a for loop. What I don't know is how to take each individual fold from StratifiedKFold and pass it as validation data. – Code Now Jan 21 '20 at 21:20
  • Yes you can, @CodeNow! Check out this answer, which explains how to work with the chunks created by the KFold class. You can work with each chunk individually :) https://stackoverflow.com/a/48641547/5963546 – Noki Jan 22 '20 at 11:08
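
The loop described in the comments could be sketched roughly as follows. This is only an illustration under several assumptions: the data is synthetic, and a hand-written patience check around scikit-learn's `SGDClassifier.partial_fit` stands in for an EarlyStopping callback, since the thread does not say which model or framework is used. The point is that `split` yields a `(train_idx, val_idx)` pair per iteration, and the five `val_idx` arrays are disjoint.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import StratifiedKFold

# Synthetic data (assumption): 200 samples, 4 features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = []

for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    model = SGDClassifier(random_state=0)  # fresh model for every fold
    best, wait, patience = -np.inf, 0, 3   # patience value is illustrative

    for epoch in range(50):
        # One pass over the training part of this fold.
        model.partial_fit(X[train_idx], y[train_idx], classes=np.unique(y))
        score = model.score(X[val_idx], y[val_idx])

        # Manual early stopping: stop after `patience` epochs without improvement.
        if score > best:
            best, wait = score, 0
        else:
            wait += 1
            if wait >= patience:
                break

    scores.append(best)
    print(f"fold {fold}: best validation accuracy {best:.3f}")
```

With a framework that provides its own early-stopping callback, the inner epoch loop would presumably be replaced by a single fit call that receives `X[val_idx]` and `y[val_idx]` as validation data.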