While building a model with R and tidymodels, using bootstrap resampling as the validation strategy, I've found that the tune_grid() function sometimes returns this error:

! Bootstrap01: recipe 2/3, model  4/10 (predictions): There are new
levels in a factor: ...

I understand that this happens because some factor levels that appear in the validation subset are missing from the training subset. I also know that this probably means it would be better to collapse low-frequency levels.

However, I'm wondering whether it is possible to force the creation of dummy variables from a predefined list of levels. So far I've tried:

step_string2factor(my_factor_variable, levels = list("A","B","C") )

or

step_dummy(my_factor_variable,levels = levels = list("A","B","C") )

But without luck. Any suggestions?

Paul
Ilproff_77
  • Can you add a [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example)? – Paul Oct 22 '20 at 11:51
  • Perhaps `recipes::step_novel()` could be what you need. Its documentation states that "step_novel creates a specification of a recipe step that will assign a previously unseen factor level to a new value" – hnagaty Oct 22 '20 at 16:31
  • @Hany, as I understand it, step_novel() just assigns the same level to all unknown ones. My original question is how to keep the information even for unknown levels since, with cross-fold validation and a small data set, this could lead to a training problem. – Ilproff_77 Oct 23 '20 at 13:21
  • It really would be helpful to see a [reprex](https://stackoverflow.com/help/reprex) in this situation because it is possible that best practice would be to set factor levels before training/testing split, or otherwise ensure that the resamples are all getting the same factor levels without this kind of forcing. – Julia Silge Oct 25 '20 at 21:01
  • Whoa ... an answer from @JuliaSilge (thanks for your YouTube videos). I think setting up a reprex is beyond my knowledge, as I would have to share my data set. For the record, pre-setting all the levels is what I've been trying to achieve; I've already tried resampling with strata = my_factor_variable, but that doesn't seem to be a suitable strategy. – Ilproff_77 Nov 03 '20 at 13:00
  • Here are some helpful [reprex do's and don'ts](https://reprex.tidyverse.org/articles/reprex-dos-and-donts.html) to get you started @llproff_77! – Julia Silge Nov 04 '20 at 15:34
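
The suggestion in the comments to set factor levels before the training/testing split could be sketched roughly as follows. This is only a minimal base R illustration under stated assumptions, not a tested tidymodels workflow: the `toy` data frame, the `known_levels` vector, and the hand-rolled bootstrap split are all hypothetical stand-ins for the asker's data and for the resampling that tune_grid() would perform. The idea is that once the column is declared as a factor with the full set of levels, every subset keeps those levels and yields the same dummy columns, even when a rare level is absent from one subset.

```r
# Hypothetical toy data: level "C" is rare, so it can easily be missing
# from a bootstrap sample.
set.seed(1)
toy <- data.frame(
  y = rnorm(20),
  my_factor_variable = c(rep("A", 12), rep("B", 7), "C"),
  stringsAsFactors = FALSE
)

# Assumption: the complete list of levels is known in advance.
known_levels <- c("A", "B", "C")

# Declare the levels once, on the full data, before any resampling.
toy$my_factor_variable <- factor(toy$my_factor_variable, levels = known_levels)

# One hand-rolled bootstrap resample: rows drawn with replacement form the
# analysis set, and the rows never drawn form the assessment set.
idx <- sample(nrow(toy), replace = TRUE)
analysis <- toy[idx, ]
assessment <- toy[-unique(idx), ]

# Subsetting a data frame keeps the declared levels, observed or not.
levels(analysis$my_factor_variable)
levels(assessment$my_factor_variable)

# Dummy (indicator) columns are built from the declared levels, so both
# subsets end up with the same columns.
mm_analysis <- model.matrix(~ my_factor_variable, data = analysis)
mm_assessment <- model.matrix(~ my_factor_variable, data = assessment)
colnames(mm_analysis)
colnames(mm_assessment)
```

With recipes, the analogous move would presumably be to convert the column to a factor with explicit levels on the full data set before creating the resamples, so that the dummy-variable step sees the same levels in every resample. Whether that resolves the original error depends on the data and the recipe, which are not shown here.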

0 Answers