
I have a multiclass classification problem (with 10 classes) that I am trying to solve using the 'mxnet' neural network method in the caret package in R. I'm using 10-fold cross-validation during training and would like to plot a learning curve to figure out whether and how the model is overfitting. I have modified the solution given in this post (Plot learning curves with caret package and R) to fit my data. However, since the learning curve is recorded over each of the resamples, not all factor levels/classes (1-10) are present in each fold, which leads to the following error:

Error: One or more factor levels in the outcome has no data

I have also tried caret's built-in learning_curve_dat function, but I get the same error message.

Is there a way to work around the problem of not all factor levels being present in every fold?

desertnaut
Leyla
  • I found this question with similar topic regarding factors in CV: https://stackoverflow.com/questions/19946930/r-cross-validation-on-a-dataset-with-factors. Would stratified sampling be useful for you? https://gist.github.com/Bergvca/c1df8e579005e3cd82e8d3c8b009403a – Jonny Phelps Aug 26 '18 at 13:13
  • Thanks for the suggestions @JonnyPhelps! I was already using a stratified random split with createDataPartition to create a training and test set, but the problem seems to occur during the 10-fold cross-validation. I have also implemented a stratified cross-validation with createFolds() to ensure that all classes are represented in each fold, but it doesn't seem to solve the problem. – Leyla Sep 04 '18 at 16:00
  • @EkabaBisong I'm using your solution from the post linked above, but I can't seem to get it to work for a multiclass problem. Do you have a suggestion on how to get it to work? – Leyla Sep 04 '18 at 16:27
  • If there are some tiny classes, you could look at upsampling your data: https://rdrr.io/cran/caret/man/downSample.html. Here is a good tutorial I read about unbalanced data: https://shiring.github.io/machine_learning/2017/04/02/unbalanced – Jonny Phelps Sep 05 '18 at 10:28
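
The stratified-sampling idea from the comments can be sketched in base R. The snippet below is only an illustration under stated assumptions: `stratified_subset` and its `min_per_class` argument are hypothetical names (not part of caret), and the toy factor `y` stands in for the real 10-class outcome. It draws each learning-curve subset class by class, so every factor level keeps at least `min_per_class` rows even at small training-set proportions.

```r
# Hypothetical helper (not a caret function): draw a stratified subset of row
# indices that keeps at least `min_per_class` rows of every class present in y.
stratified_subset <- function(y, prop, min_per_class = 1L) {
  stopifnot(is.factor(y), prop > 0, prop <= 1)
  # Row indices grouped by class; drop = TRUE ignores levels with no rows at all.
  idx_by_class <- split(seq_along(y), y, drop = TRUE)
  picked <- lapply(idx_by_class, function(idx) {
    # Proportional share, but never fewer than the minimum or more than available.
    n_take <- min(length(idx), max(min_per_class, floor(length(idx) * prop)))
    # sample.int avoids the sample() pitfall when idx has length one.
    idx[sample.int(length(idx), n_take)]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(42)
# Toy outcome with 10 classes of uneven size (an assumption, not the real data).
y <- factor(rep(1:10, times = c(3, 5, 8, 12, 20, 30, 40, 50, 60, 72)))

# Subset sizes for a learning curve; every class stays represented in each one.
for (prop in c(0.1, 0.25, 0.5, 1)) {
  idx <- stratified_subset(y, prop)
  counts <- table(y[idx])
  cat(sprintf("prop = %.2f: n = %d, smallest class = %d\n",
              prop, length(idx), min(counts)))
}
```

The returned indices could then be used to subset the training data before each model fit. Note that a class with fewer rows than the number of cross-validation folds may still be absent from some resamples, so raising `min_per_class` or upsampling the small classes, as suggested above, may still be needed.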

0 Answers