I've been reproducing Julia Silge's code from her YouTube video on sentiment analysis with tidymodels for Animal Crossing user reviews (https://www.youtube.com/watch?v=whE85O1XCkg&t=1300s). At minute 25 she uses tune_grid(), and when I run it in my script I get this warning: Warning message: All models failed in tune_grid(). See the .notes column.

In .notes, the following appears 25 times:

[[1]]
# A tibble: 1 x 1
.notes                                                                         
<chr>                                                                          
1 "recipe: Error in UseMethod(\"prep\"): no applicable method for 'prep' applied~

How can I fix this? I'm using the same code that Julia uses. My entire code is this:

library(tidyverse)

user_reviews <- read_tsv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-05/user_reviews.tsv")

user_reviews %>%
  count(grade) %>%
  ggplot(aes(grade,n)) + 
  geom_col()

user_reviews %>%
  filter(grade > 0) %>%
  sample_n(5) %>% 
  pull(text)

reviews_parsed <- user_reviews %>%
  mutate(text = str_remove(text, "Expand"), 
         rating = case_when(grade > 6 ~ "Good", TRUE ~ "Bad"))

library(tidytext)

words_per_review <- reviews_parsed %>% 
  unnest_tokens(word,text) %>%
  count(user_name, name = "total_words", sort = TRUE)

words_per_review %>%
  ggplot(aes(total_words)) + 
  geom_histogram()

library(tidymodels)

set.seed(123)
review_split <- initial_split(reviews_parsed, strata = rating)
review_train <- training(review_split)
review_test <- testing(review_split)

library(textrecipes)

review_rec <- recipe(rating ~ text, data = review_train) %>% 
  step_tokenize(text) %>%
  step_stopwords(text) %>%
  step_tokenfilter(text, max_tokens = 500) %>%
  step_tfidf(text) %>%
  step_normalize(all_predictors())

review_prep <- prep(review_rec)

review_prep

juice(review_prep)

lasso_spec <- logistic_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

lasso_wf <- workflow() %>%
  add_recipe(review_rec) %>%
  add_model(lasso_spec)

lasso_wf

lambda_grid <- grid_regular(penalty(), levels = 30)

set.seed(123)
review_folds <- bootstraps(review_train, strata = rating)

review_folds

doParallel::registerDoParallel()

set.seed(2020)

lasso_grid <- tune_grid(lasso_wf, resamples = review_folds, grid = lambda_grid, metrics = metric_set(roc_auc, ppv, npv))

lasso_grid

Warning message:
All models failed in tune_grid(). See the `.notes` column. 

lasso_grid$.notes

[[1]]
# A tibble: 1 x 1
  .notes                                                                         
  <chr>                                                                          
1 "recipe: Error in UseMethod(\"prep\"): no applicable method for 'prep' applied~

[[2]]
# A tibble: 1 x 1
  .notes                                                                         
  <chr>                                                                          
1 "recipe: Error in UseMethod(\"prep\"): no applicable method for 'prep' applied~

[[3]]
# A tibble: 1 x 1
  .notes                                                                         
  <chr>                                                                          
1 "recipe: Error in UseMethod(\"prep\"): no applicable method for 'prep' applied~

...and so on, up to [[25]].
makux_gcf
  • Are you working in Windows? This looks like the error for parallel processing not being set up correctly on Windows. [Here](https://stackoverflow.com/questions/45819337/option-cores-from-package-doparallel-useless-on-windows) are [three](https://privefl.github.io/blog/a-guide-to-parallelism-in-r/) resources to get [started](https://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf) with parallel processing on Windows. – Julia Silge May 15 '20 at 03:11

1 Answer

Found a solution in the comments section of the post. This worked for me (Windows user) and made grid tuning nearly 4x faster.

all_cores <- parallel::detectCores(logical = FALSE)
library(doParallel)
cl <- makePSOCKcluster(all_cores)
registerDoParallel(cl)

set.seed(2020)
lasso_grid <- tune_grid(
  lasso_wf,
  resamples = review_folds,
  grid = lambda_grid,
  metrics = metric_set(roc_auc, ppv, npv),
  control = control_grid(pkgs = c('textrecipes'))
)
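
A likely reason the pkgs argument matters here: each PSOCK worker is a fresh R session, so a package attached in the main session (such as textrecipes, which supplies the prep methods for step_tokenize() and the other text steps) is not available on the workers unless it is loaded there. The sketch below illustrates the idea with base R only. The splines package is just a stand-in for textrecipes, the two-worker cluster size is arbitrary, and the library() call on the workers is only a rough analogue of what control_grid(pkgs = ...) requests.

```r
library(parallel)

# 'splines' ships with R; it stands in for textrecipes in this illustration.
library(splines)

cl <- makePSOCKcluster(2)  # two workers, chosen arbitrarily

# The package is attached in the main session ...
attached_here <- "package:splines" %in% search()

# ... but not on the freshly started workers.
attached_before <- unlist(clusterEvalQ(cl, "package:splines" %in% search()))

# Load the package on every worker (rough analogue of pkgs = "textrecipes").
invisible(clusterEvalQ(cl, library(splines)))
attached_after <- unlist(clusterEvalQ(cl, "package:splines" %in% search()))

stopCluster(cl)

print(attached_here)
print(attached_before)
print(attached_after)
```

If the workers report the package as missing before the explicit load and present afterwards, that matches the behaviour described in the question: the recipe steps reach the workers, but the methods needed to prep them do not.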

Additional documentation can also be found here and here.
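
One follow-up that the answer above does not cover: a cluster created with makePSOCKcluster() keeps its worker processes alive until it is stopped. A minimal sketch of the cleanup, assuming the doParallel and foreach packages are installed and using an arbitrary two-worker cluster:

```r
library(doParallel)  # also attaches foreach and parallel

cl <- makePSOCKcluster(2)  # arbitrary size for illustration
registerDoParallel(cl)
workers_during <- getDoParWorkers()

# ... tune_grid() would run here ...

stopCluster(cl)   # shut down the worker processes
registerDoSEQ()   # switch foreach back to sequential execution
workers_after <- getDoParWorkers()

print(workers_during)
print(workers_after)
```

Calling registerDoSEQ() after stopping the cluster avoids later foreach-based calls trying to use workers that no longer exist.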

Desmond