I want to extract stop words for several languages in one dplyr pipeline using this code:

    library(tidyverse)
    library(qdap)
    library(tm)
    map_dfr(tibble(language=c("english", "italian")), tm::stopwords)

This gives me an uninformative error message:

    Error in file(con, "r") : invalid 'description' argument
    In addition: Warning message:
    In if (is.na(resolved)) kind else if (identical(resolved, "porter")) "english" else resolved :
      the condition has length > 1 and only the first element will be used

Can someone explain this and suggest a workaround? I would like a tibble in which each row holds a language name and the corresponding list (vector) of stop words.

Alexander Borochkin
  • `map`ing over a data.frame will iterate over columns. Instead I *think* you want to map over the elements in language and return a list column for each. So `tibble(language=c("english", "italian")) %>% mutate(stop_words = map(language, tm::stopwords))` instead – Nate Aug 12 '19 at 13:05
  • or maybe `c("english", "italian") %>% set_names() %>% map_dfr(~tibble(stop_words = tm::stopwords(.)), .id = "lang")` for one big data frame – Nate Aug 12 '19 at 13:10
  • Both solutions worked, but in my case the first one was more suitable. – Alexander Borochkin Aug 12 '19 at 14:03

1 Answer

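The call is not looping as intended. `map_dfr()` over a tibble iterates over its columns, so the whole `language` vector is passed to `tm::stopwords()` in a single call, while the function expects one language at a time. Extract the column and loop over its elements instead: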

    library(tidyverse)
    out <- map(tibble(language=c("english", "italian"))$language, ~ tm::stopwords(.x))
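
The column-versus-element behaviour can be checked without `tm` at all. The sketch below is only an illustration: `describe_input` is a hypothetical stand-in for `tm::stopwords()` that reports how many values it receives per call.

```r
library(purrr)
library(tibble)

langs <- tibble(language = c("english", "italian"))

# Hypothetical stand-in for tm::stopwords(): returns the number of
# values handed to it in one call.
describe_input <- function(x) length(x)

# Mapping over the tibble visits each column once, so the function
# receives the whole two-element vector in a single call.
by_column <- map(langs, describe_input)

# Mapping over the extracted column visits each element separately,
# so the function receives one language per call.
by_element <- map(langs$language, describe_input)

print(by_column)
print(by_element)
```

The first call hands the stand-in a vector of length 2, which is the situation that triggers the "condition has length > 1" warning in the question. The second call hands it one value at a time.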

Another option is to vectorize `stopwords()` and call it inside `mutate()`:

    tibble(language=c("english", "italian")) %>% 
       mutate(stop_words = Vectorize(tm::stopwords)(language))
    # A tibble: 2 x 2
    #   language stop_words  
    #  <chr>    <named list>
    #1 english  <chr [174]> 
    #2 italian  <chr [279]> 
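
Once the stop words sit in a list column, they can be pulled out per language or flattened to one row per word. This is a minimal sketch, assuming tidyr 1.0 or later. The tibble `stops` is a hand-built toy stand-in with a few illustrative words, not the actual `tm` lists.

```r
library(tibble)
library(tidyr)

# Toy stand-in for the tibble produced above; the words are
# illustrative only, not the full lists returned by tm::stopwords().
stops <- tibble(
  language   = c("english", "italian"),
  stop_words = list(c("the", "and"), c("il", "e", "di"))
)

# Character vector of words for a single language (here the second row).
italian_words <- stops$stop_words[[2]]

# Long format: one row per (language, word) pair.
long <- unnest(stops, cols = stop_words)

print(italian_words)
print(long)
```

The list-column layout keeps one row per language, as asked in the question. The long layout is usually more convenient for joins such as `anti_join()` against a table of tokens.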
akrun