I've been struggling to understand tidyeval and the use of quo, quos, sym, !!, !!! and the like. I made some attempts, but couldn't generalize my code so that it accepts a vector of columns and applies text processing to those columns of a data frame. My data frame looks like this:
ocupation     tasks              id
Sink Cleaner  Cleaning the sink  1
Lion petter   Pet the lions      2
And my code looks like this:
library(dplyr)    # mutate(), %>%
library(stringr)  # str_remove_all(), str_squish()
library(glue)     # glue()

# Regex that matches any English stopword as a whole word
stopwords_regex = paste(tm::stopwords('en'), collapse = '\\b|\\b')
stopwords_regex = glue('\\b{stopwords_regex}\\b')

# Same cleaning steps, written out once per column
df = df %>% mutate(
  ocupation_proc = ocupation %>% tolower() %>%
    stringi::stri_trans_general("Latin-ASCII") %>%
    str_remove_all(stopwords_regex) %>%
    str_remove_all("[[:punct:]]") %>%
    str_squish(),
  tasks_proc = tasks %>% tolower() %>%
    stringi::stri_trans_general("Latin-ASCII") %>%
    str_remove_all(stopwords_regex) %>%
    str_remove_all("[[:punct:]]") %>%
    str_squish()
)
This produces something like:
ocupation     tasks              id  ocupation_proc  tasks_proc
Sink Cleaner  Cleaning the sink  1   sink cleaner    cleaning sink
Lion petter   Pet the lions      2   lion petter     pet lions
I'd like to turn this into a function process_text_columns(df, columns_list, new_col_names), where in this case df = df, columns_list = c('ocupation', 'tasks') and new_col_names = c('ocupation_proc', 'tasks_proc') (new_col_names might not even be necessary if I can do something like glue('{colname}_proc') to name the new columns). From what I've gathered, I'd need to use across, sym, quos and maybe !!! to generalize the function, but everything I've tried has failed. Do you have any ideas?
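For reference, here is a minimal sketch of the kind of function described above. It assumes dplyr >= 1.0.0, where across() accepts a .names glue specification, so the new_col_names argument is dropped in favour of a "{.col}_proc" pattern. Because the columns arrive as a character vector, all_of() may be enough on its own, without quo, sym or !!!. The clean_text helper and the stopwords argument are hypothetical names introduced only for this illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: the cleaning steps from the question, applied to one character vector.
clean_text <- function(x, stopwords_regex) {
  x %>%
    tolower() %>%
    stringi::stri_trans_general("Latin-ASCII") %>%
    str_remove_all(stopwords_regex) %>%
    str_remove_all("[[:punct:]]") %>%
    str_squish()
}

# Sketch: `stopwords` is an assumed extra argument; its default mirrors the question.
process_text_columns <- function(df, columns_list, stopwords = tm::stopwords("en")) {
  # Same whole-word stopword regex as in the question, built with paste0() instead of glue()
  stopwords_regex <- paste0("\\b", paste(stopwords, collapse = "\\b|\\b"), "\\b")
  df %>%
    mutate(across(
      all_of(columns_list),               # select columns named in a character vector
      ~ clean_text(.x, stopwords_regex),  # apply the cleaning steps to each selected column
      .names = "{.col}_proc"              # keep the originals, add <name>_proc columns
    ))
}

# Example data taken from the question
df <- data.frame(
  ocupation = c("Sink Cleaner", "Lion petter"),
  tasks = c("Cleaning the sink", "Pet the lions"),
  id = 1:2,
  stringsAsFactors = FALSE
)

result <- process_text_columns(df, c("ocupation", "tasks"))
```

Leaving .names out would overwrite the original columns instead of adding new ones, so the pattern is what keeps both versions side by side.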
Thanks