
I have a recipe that includes a step_mutate() step, which performs text transformations on the Titanic dataset using the stringr package.

library(tidyverse)
library(tidymodels)

extract_title <- function(x) stringr::str_remove(str_extract(x, "Mr\\.? |Mrs\\.?|Miss\\.?|Master\\.?"), "\\.")

rf_recipe <- 
  recipe(Survived ~ ., data = titanic_train) %>% 
  step_impute_mode(Embarked) %>% 
  step_mutate(Cabin = if_else(is.na(Cabin), "Yes", "No"),
              Title = if_else(is.na(extract_title(Name)), "Other", extract_title(Name))) %>% 
  step_impute_knn(Age, impute_with = c("Title", "Sex", "SibSp", "Parch")) %>% 
  update_role(PassengerId, Name, new_role = "id")

This set of transformations works perfectly well with rf_recipe %>% prep() %>% bake(new_data = NULL).

When I try to fit a random forest model with hyperparameter tuning and 10-fold cross-validation within a workflow, all models fail. The .notes column of the output explicitly says that there was a problem with the mutate() column Title: the function str_remove() could not be found.

doParallel::registerDoParallel()
rf_res <- 
  tune_grid(
    rf_wf,
    resamples = titanic_folds,
    grid = rf_grid,
    control = control_resamples(save_pred = TRUE)
  )

As this post suggests, I've explicitly told R that str_remove should be found in the stringr package. Why isn't this working, and what could be causing it?

dzegpi

2 Answers


I don't think this will fix the error, but just in case: str_extract is not written as stringr::str_extract in your helper. Did you load the package?
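
To make the suggestion concrete, here is a minimal sketch of the helper with every stringr call namespaced, so it does not rely on the package being attached in whichever R session evaluates it. It assumes stringr is installed and keeps the regular expression from the question unchanged; the vector `example_names` is a hypothetical input used only for illustration.

```r
# Sketch only: the question's helper with both stringr calls namespaced.
# The pattern is copied verbatim from the question.
extract_title <- function(x) {
  title <- stringr::str_extract(x, "Mr\\.? |Mrs\\.?|Miss\\.?|Master\\.?")
  # Drop the trailing period, if any, from the matched title.
  stringr::str_remove(title, "\\.")
}

# Hypothetical input, for illustration only.
example_names <- c(
  "Cumings, Mrs. John Bradley",
  "Heikkinen, Miss. Laina",
  "Smith, Dr. John"
)

titles <- extract_title(example_names)
print(titles)
```

Names without a matching title come back as NA, which is what the if_else() call in the recipe relies on. Separately, the control functions in tune (control_grid() and control_resamples()) accept a pkgs argument naming packages to load on parallel workers; whether that resolves this particular failure is not confirmed here.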


The error shows up because step_impute_knn(), and subsequently the gower::gower_topn function, transform all character columns to factors. To work around this, I had to apply the prep() and bake() functions myself instead of including the recipe in the workflow.

prep_recipe <- prep(rf_recipe)  
train_processed <- bake(prep_recipe, new_data = NULL)
test_processed <- bake(prep_recipe, new_data = titanic_test %>%
                         mutate(across(where(is.character), as.factor)))

Now the models converge.

dzegpi