Strange behavior in dplyr when mapping language vector on tm::stopwords

Question

I want to extract stop words for several languages in one dplyr pipeline using this code:

    library(tidyverse)
    library(qdap)
    library(tm)
    map_dfr(tibble(language=c("english", "italian")), tm::stopwords)

This gives me an uninformative error message:

    Error in file(con, "r") : invalid 'description' argument
    In addition: Warning message:
    In if (is.na(resolved)) kind else if (identical(resolved, "porter")) "english" else resolved :
      the condition has length > 1 and only the first element will be used

Can someone explain this and suggest a workaround? I would like to have a tibble where each row corresponds to a language name and the respective list (vector) of stop words.

`map`ing over a data.frame will iterate over columns. Instead I *think* you want to map over the elements in language and return a list column for each. So `tibble(language=c("english", "italian")) %>% mutate(stop_words = map(language, tm::stopwords))` instead — Nate, Aug 12 '19 at 13:05
or maybe `c("english", "italian") %>% set_names() %>% map_dfr(~tibble(stop_words = tm::stopwords(.)), .id = "lang")` for one big data frame — Nate, Aug 12 '19 at 13:10
Both solutions worked, but in my case the first one was more suitable. — Alexander Borochkin, Aug 12 '19 at 14:03

akrun · Answer 1 · 2019-08-12T14:06:27.273

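`map_dfr` is not looping as intended. Given a tibble, it iterates over columns, so the unit here is the single `language` column and `tm::stopwords` receives the whole vector at once. We need to extract the column and loop over its elements: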

    library(tidyverse)
    # Pull out the column so that map() visits one language at a time
    out <- map(tibble(language=c("english", "italian"))$language, ~ tm::stopwords(.x))

Another option is to vectorize `stopwords` inside `mutate`:

    tibble(language=c("english", "italian")) %>% 
       mutate(stop_words = Vectorize(tm::stopwords)(language))
    # A tibble: 2 x 2
    #   language stop_words  
    #  <chr>    <named list>
    #1 english  <chr [174]> 
    #2 italian  <chr [279]>
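
To see why the original call fails, it may help to check what the mapped function actually receives. The sketch below is only an illustration, not part of the original answer: it swaps `tm::stopwords` for base R's `length`, and the names `langs`, `lengths_by_column` and `lengths_by_element` are made up for the example.

```r
library(purrr)
library(tibble)

langs <- tibble(language = c("english", "italian"))

# A data frame is a list of columns, so mapping over it calls the function
# once per column, passing the whole character vector in a single call.
lengths_by_column <- map_int(langs, length)

# Mapping over the extracted column calls the function once per element,
# which is what a function expecting a single language name needs.
lengths_by_element <- map_int(langs$language, length)
```

The first result has one entry (for the `language` column) whose value is the number of languages, while the second has one entry per language. The first case is the one the question ran into: `tm::stopwords` was handed both language names in a single call.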
