I have a data set whose target variable has some classes with only a few instances. I know that cross-validation might not be the best way to go here, but I wonder how Weka handles this case when using stratified k-fold cross-validation. I tried to find the relevant code here: http://grepcode.com/file/repo1.maven.org/maven2/nz.ac.waikato.cms.weka/weka-dev/3.7.6/weka/filters/supervised/instance/StratifiedRemoveFolds.java/ but could not locate the part that does the stratification.
Example: the target variable has 3 classes, two with 50 instances each and one with only 1. Stratified sampling tries to keep the class distribution the same in every fold, which is impossible here with 10 folds, since the single instance can be in only one of them.
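To make the arithmetic concrete, here is a minimal sketch of a generic stratified split in base R. It is my own assumption of how such a split could be done (sort by class, then deal instances into folds round-robin) and is not taken from Weka's code; the names `classes`, `k`, `fold` and `counts` are made up for illustration.

```r
# Class layout from the example: two classes of 50 instances, one class of 1.
classes <- factor(c(rep("a", 50), rep("b", 50), "c"))
k <- 10

# Assumed generic scheme (NOT Weka's implementation):
# order the instances by class, then assign fold ids 1..k cyclically.
ord <- order(classes)
fold <- integer(length(classes))
fold[ord] <- rep_len(seq_len(k), length(classes))

# Rows are folds, columns are classes.
counts <- table(fold, classes)
print(counts)
```

Under this scheme every fold gets 5 instances of each large class, while the lone instance of the rare class ends up in exactly one fold, so the other nine test sets never contain that class. Whether Weka behaves the same way is exactly what I am asking.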
This might be a question for Stats Stack Exchange; however, I am not interested in a statistical answer, just in how the code works. For example, using R with RWeka:
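library(RWeka)
# Rows 1:101 of iris: 50 setosa, 50 versicolor and a single virginica
iris_input <- iris[1:101, ]
iris_fit <- J48(Species ~ ., data = iris_input, na.action = NULL)
# 10-fold cross-validation of the fitted tree
evaluate_Weka_classifier(iris_fit, numFolds = 10)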
Hope my question is clear.
Might be linked to R: Cross validation on a dataset with factors