Is there a new way to estimate STMs of varying topic numbers with furrr now?

Question

A year ago, I fitted a structural topic model with the stm package, using code from a Julia Silge blog post. The code is posted below.

After some updates to the furrr package, the code no longer runs: it either throws an error or crashes my computer. Is anyone aware of a workaround?

(Note: the same thing is happening to colleagues using a similar procedure on different datasets.)

The multiprocess plan (from the future package, which furrr builds on) was deprecated, so I switched to multisession. I also switched from K = c(20, 40, 60, 80, 100) to the full range K = 5:100. With these changes the code starts running, but it then crashes my computer.

Replication code:

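library(tidyverse)  # read_rds, dplyr verbs, str_detect, tibble
library(tidytext)   # unnest_tokens, get_stopwords, cast_sparse
library(stm)
library(furrr)      # future_map; plan() comes from the future package

df <- read_rds("data/df.Rds")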

# The "tweets" tokenizer is used here because it is a good workhorse for processing all kinds of forum data
tidy_forum <- df %>%
  unnest_tokens(word, text_noquote, token = "tweets") %>%
  anti_join(get_stopwords()) %>%
  filter(!str_detect(word, "[0-9]+")) %>%
  add_count(word)

forum_sparse <- tidy_forum %>%
  count(id, word) %>%
  cast_sparse(id, word, n)

# Breaking point: this parallel fit is where the error or crash occurs
plan(multisession)
many_models <- tibble(K = 5:100) %>%
  mutate(topic_model = future_map(K, ~stm(forum_sparse, K=K),
                                  seed = TRUE))
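
For reference, here is a minimal sketch of how the parallel step might be written against furrr 0.2.0 or later. It is untested on this data and rests on a few assumptions: the helper name `fit_stm_grid` and the default of `workers = 2` are hypothetical, the seed is passed through `furrr_options()` rather than as a bare `seed = TRUE` argument, and each K value is handed to `stm()` explicitly as a function argument instead of relying on `K = K` inside the formula. Capping the number of workers is only a guess at the cause of the crash, on the theory that every background session holds its own copy of the data and its own model.

```r
library(tibble)
library(future)
library(furrr)
library(stm)

# Hypothetical helper: fit one stm per value in `ks`, in parallel.
# `sparse` is a document-term matrix such as the one built by cast_sparse().
fit_stm_grid <- function(sparse, ks, workers = 2) {
  # Limit the number of background R sessions (assumption: memory use
  # grows with the number of sessions running at once)
  old_plan <- plan(multisession, workers = workers)
  # Restore the previous plan when the function exits
  on.exit(plan(old_plan), add = TRUE)

  models <- future_map(
    ks,
    # Each call receives a single K value
    function(k) stm(sparse, K = k, verbose = FALSE),
    # Parallel-safe random number streams
    .options = furrr_options(seed = TRUE)
  )

  tibble(K = ks, topic_model = models)
}

# Example call (not run here; assumes forum_sparse from the code above):
# many_models <- fit_stm_grid(forum_sparse, ks = c(20, 40, 60, 80, 100))
```

Starting from the original five K values and a small worker count, then widening the grid, would show whether the crash is tied to the number of simultaneous fits.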

Blog post I referenced originally: https://juliasilge.com/blog/evaluating-stm/

Is there a new way to estimate STMs of varying topic numbers with furrr now?
