When I convert a numeric column to a factor with step_num2factor(), every level is created correctly except the last one, which is filled in as NA.
Minimal working example (MWE):
test <- tibble(pred = c(0, 1, 2, 3, 4, 5, 8), target = c(0,1,0,1,1,1,0))
It looks like this when printed:
# A tibble: 7 x 2
   pred target
  <dbl>  <dbl>
1     0      0
2     1      1
3     2      0
4     3      1
5     4      1
6     5      1
7     8      0
Doing the recipe steps and comparing results
library(recipes)  # also attaches dplyr, which provides tibble() and %>%

test <- tibble(pred = c(0, 1, 2, 3, 4, 5, 8), target = c(0, 1, 0, 1, 1, 1, 0))
my_levels <- c("zero", "one", "two", "three", "four", "five", "eight")

# Convert pred to a factor, shifting the values by one so that 0 is usable
recipe(target ~ pred, data = test) %>%
  step_num2factor(pred, levels = my_levels, transform = function(x) x + 1) %>%
  prep(training = test) %>%
  bake(new_data = test)
Remark: I use transform because pred contains the value 0, and a factor cannot have a level at position 0. (source)
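To show what I mean by the remark above, here is a small base R sketch of why a raw 0 needs shifting. It does not use recipes at all, and `lv` and `f` are made-up names for illustration only, not part of my recipe.

```r
# Base R only; `lv` and `f` are illustrative names, not part of the recipe above.
lv <- c("zero", "one", "two")
f <- factor(c("zero", "two", "one"), levels = lv)

# The integer codes behind a factor are positions in its levels, starting at 1.
codes <- as.integer(f)
print(codes)

# Position 0 in the level vector selects nothing,
# so a raw value of 0 cannot be matched to "zero" without shifting it.
print(lv[0])

# After shifting by one, 0 maps to the first level.
print(lv[0 + 1])
```

This is only my understanding of why the x + 1 shift is needed; I have not checked how step_num2factor() does the lookup internally.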
Transformed dataset after prepping and baking
# A tibble: 7 x 2
  pred  target
  <fct>  <dbl>
1 zero       0
2 one        1
3 two        0
4 three      1
5 four       1
6 five       1
7 NA         0
The NA is not supposed to be there; that row should have the category "eight". What am I doing wrong?
Remark: I also tried naming the last level "six" instead of "eight", in case the function only accepts number words in sequence rather than arbitrarily named levels, but that did not fix it either.