
I can see that the function 'createDataPartition' can split the data based on the outcome variable:

https://topepo.github.io/caret/data-splitting.html#outcome

The same applies to 'createFolds', I think.

But I'm trying to use stratified k-fold cross-validation (the folds are made by preserving the percentage of samples for each class in the target) when calling 'trainControl' with the 'cv' method, and I don't see a parameter to specify this. It is possible to pass the fold indexes as a parameter, but I suppose this wouldn't work with 'repeatedcv', since each repeat needs to create new folds.

Does it use stratified k-fold by default? What if I need plain shuffling instead?

Thanks

Kaikus

1 Answer

I finally found the solution thanks to my ML teacher, who pointed me to this answer :)

As stated here:

Caret Package: Stratified Cross Validation in Train Function

You can use 'createFolds' to create the indexes for the folds and pass them to 'trainControl' through its 'index' argument. If you need 'repeatedcv', you must create the indexes with 'createMultiFolds' instead.

IMPORTANT: Setting 'index' in 'trainControl' overrides the 'number' and 'repeats' arguments, so the supplied indexes alone determine the resampling: https://github.com/topepo/caret/issues/584
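
For reference, here is a minimal sketch of how this can look. The data frame 'df' (an imbalanced subset of 'iris'), the seed, and the values of k and times are my own assumptions, chosen only for illustration; 'createFolds', 'createMultiFolds' and the 'index' argument of 'trainControl' are the caret pieces mentioned above.

```r
library(caret)

set.seed(42)

# Hypothetical imbalanced two-class data: 50 versicolor, 15 virginica
df <- iris[iris$Species != "setosa", ]
df$Species <- droplevels(df$Species)
df <- df[1:65, ]

# Plain k-fold: createFolds samples within each class of a factor outcome.
# returnTrain = TRUE yields training-row indexes, which is what 'index' expects.
folds <- createFolds(df$Species, k = 5, returnTrain = TRUE)
ctrl_cv <- trainControl(method = "cv", index = folds)

# Repeated k-fold: createMultiFolds returns k * times sets of training indexes.
multi <- createMultiFolds(df$Species, k = 5, times = 3)
ctrl_rcv <- trainControl(method = "repeatedcv", index = multi)

# Sanity check: class proportions in each training set should be similar
props <- sapply(folds, function(i) prop.table(table(df$Species[i])))
print(round(props, 2))
```

Either control object can then be passed to 'train' through 'trControl'. Printing the proportions is just a quick way to confirm that each fold keeps roughly the same class balance as the full data.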

Kaikus