I am building a random forest model with the tidymodels approach. While setting up the recipe, I get an error/warning that I cannot interpret; I suspect it is related to the summary variables I created. The message is:

There are new levels in a factor: NA 

For now, my recipe looks like this:

```r
era.af.Al_rec <- recipes::recipe(logRR ~., data = era.af.Al_predict) %>%
  step_mutate_at(logRR, SubPrName, PrName, Product, AEZ16simple, fn = factor) %>%
  update_role(ID, new_role = "ID") %>%
  update_role(RR_group, new_role = "ID") %>%
  step_other(SubPrName, PrName, Product, AEZ16simple, threshold = 0.01) %>%
  step_other(Site.ID, threshold = 0.001) %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  step_downsample(logRR)

era.af.Al_prep <- prep(era.af.Al_rec)

juiced <- juice(era.af.Al_prep)
```

The message appears in the `prep()` call.

kangaroo_cliff
  • Could you add a `dput` of the data used to train the model? – NelsonGon Jun 14 '21 at 02:32
  • Hi Nelson, Could you be a little more precise please. Thank you – Kamau Lindhardt Jun 14 '21 at 15:19
  • It is hard to say what is going on without example data, but one possible problem is that you may be creating dummy variables for your `"ID"` variables. Maybe try `step_dummy(all_nominal_predictors())`. Also consider creating a [reprex](https://reprex.tidyverse.org/) so we can offer more meaningful help. – Julia Silge Jun 17 '21 at 23:21

1 Answer

I had a similar case. This is a warning emitted by `step_dummy()`, telling me that a new factor level was encountered; in this case the new level is NA. It comes from missing data in the categorical columns.
I could simply ignore the warning, since this was what I wanted: a new NA level added to the dummy encoding. Alternatively, I could add a `step_unknown()` step before `step_dummy()`:

```r
step_unknown(all_nominal_predictors(), new_level = "NA")
```
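To show where that step sits in a full recipe, here is a minimal sketch on made-up data. The data frame `toy`, its columns `y` and `grp`, and the level label `"missing"` are all assumptions for illustration, not taken from the question. It first counts the missing values in the factor columns, then recodes them to an explicit level before dummy encoding.

```r
library(recipes)

# Toy data (hypothetical): one categorical predictor with missing values.
toy <- data.frame(
  y   = c(1.2, 0.7, 1.9, 0.4, 1.1, 0.8),
  grp = factor(c("a", "b", NA, "a", "b", NA))
)

# Count missing values per factor column before building the recipe.
na_counts <- colSums(is.na(toy[sapply(toy, is.factor)]))

rec <- recipe(y ~ ., data = toy) %>%
  # Recode NA in nominal predictors to an explicit factor level.
  step_unknown(all_nominal_predictors(), new_level = "missing") %>%
  # Dummy-encode predictors only; outcome and ID-role columns are untouched.
  step_dummy(all_nominal_predictors())

baked <- bake(prep(rec), new_data = NULL)
names(baked)
```

With the NA values recoded first, `step_dummy()` sees an ordinary level and should produce a `grp_missing` indicator column instead of warning about a new NA level. Using `all_nominal_predictors()` in both steps also keeps variables with an `"ID"` role out of the dummy encoding, as suggested in the comments.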

hnagaty