dplyr 0.7 tidy eval: convert character variables to factors

Question

I have a dataset with many variables, some of which are character variables that I would like to convert to factors. Since there are many variables to convert, I would like to do this using the new tidy eval functionality from dplyr 0.7. Here is a minimal example of my data:

data <- data.frame(factor1 = c("K", "V"), 
                   factor2 = c("E", "K"), 
                   other_var = 1:2, 
                   stringsAsFactors = FALSE)

I have a named list containing a data.frame for each variable which I want to convert. These data.frames in the list all have the same structure which can be seen in this example:

codelist_list <- list(factor1 = data.frame(Code = c("K", "V"), 
                                           Bezeichnung = c("Kauf", "Verkauf"), 
                                           stringsAsFactors = FALSE),
                      factor2 = data.frame(Code = c("E", "K"), 
                                           Bezeichnung = c("Eigengeschaeft", "Kundengeschaeft"), 
                                           stringsAsFactors = FALSE))

What I do not want to do is to define the factors like this for each variable:

mutate(df, factor1 = factor(factor1, 
                            levels = codelist_list[["factor1"]][["Code"]],
                            labels = codelist_list[["factor1"]][["Bezeichnung"]]))

What I have tried so far is the following:

convert_factors <- function(variable, df) {
  factor_variable <- enquo(variable)
  df %>% 
    mutate(!!quo_name(factor_variable) := factor(!!quo_name(factor_variable), 
                                                 levels = codelist_list[[variable]][["Code"]],
                                                 labels = codelist_list[[variable]][["Bezeichnung"]]))
}

In a first step, I want to check if my function convert_factors() works properly by calling convert_factors("factor1", data) which returns

  factor1 factor2 other_var
1    <NA>       E         1
2    <NA>       K         2

The variable does not show the value labels, but is replaced by NA instead.

The ultimate goal would be to map over all variables which I want to convert. Here, I tried map(c("factor1", "factor2"), convert_factors, df = data), which returned

Error in (function (x, strict = TRUE) : the argument has already been evaluated

I tried to follow the instructions from http://dplyr.tidyverse.org/articles/programming.html, but this is all I came up with.

Does anyone know where the problem is, and could you explain my error to me?

mt1022 · Answer 1 · 2017-08-08T15:30:42.817

I think you mixed up quosures and strings:

In your function, variable is a string, not an expression, so you should convert it to a symbol with rlang::sym instead of capturing it with enquo.
quo_name is used to convert an expression to a string. As variable is already a string, you can unquote it directly with !!variable on the lhs (left-hand side) of := in mutate.
On the rhs (right-hand side) in mutate, you need to unquote factor_variable with !! instead of converting it to a string with quo_name.
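
The difference between unquoting a string and unquoting a symbol can be seen in a small sketch. It uses rlang::expr to show the call that gets built (the names call_with_string, call_with_symbol and na_result are only for illustration), and it assumes the levels and labels from the question:

```r
library(rlang)

variable <- "factor1"

# Unquoting the string inlines the string itself into the call
call_with_string <- expr(factor(!!variable))

# Unquoting a symbol inlines a reference to the column named factor1
call_with_symbol <- expr(factor(!!sym(variable)))

print(call_with_string)
print(call_with_symbol)

# A literal string that matches none of the levels becomes NA,
# which is what the original function then recycled over every row
na_result <- factor(variable,
                    levels = c("K", "V"),
                    labels = c("Kauf", "Verkauf"))
print(is.na(na_result))
```

The first call contains the string "factor1", so factor never sees the column values; the second refers to the column itself.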

After correcting the above errors, your function will work:

convert_factors <- function(variable, df) {
    factor_variable <- rlang::sym(variable)
    df %>% 
        mutate(!!variable := factor(
            !!factor_variable, 
            levels = codelist_list[[variable]][["Code"]],
            labels = codelist_list[[variable]][["Bezeichnung"]]))
}

# > convert_factors('factor1', data)
#   factor1 factor2 other_var
# 1    Kauf       E         1
# 2 Verkauf       K         2

Here is another version I tried, which passes the levels and labels to factor via do.call:

params <- lapply(codelist_list, setNames, nm = c('levels', 'labels'))

convert_factors <- function(variable, df) {
    factor_variable <- rlang::sym(variable)
    factor_param <- c(list(factor_variable), params[[variable]])

    df %>% mutate(!!variable := do.call(factor, factor_param))
}

convert_factors('factor1', data)
#   factor1 factor2 other_var
# 1    Kauf       E         1
# 2 Verkauf       K         2
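
To convert every listed column in the same data frame rather than getting one data frame per variable, one option is to fold the function over the variable names. The following is only a sketch: it assumes purrr is available and uses purrr::reduce with a small wrapper, because convert_factors takes the variable first and the data frame second. The name converted is just for illustration.

```r
library(dplyr)
library(purrr)

data <- data.frame(factor1 = c("K", "V"),
                   factor2 = c("E", "K"),
                   other_var = 1:2,
                   stringsAsFactors = FALSE)

codelist_list <- list(
  factor1 = data.frame(Code = c("K", "V"),
                       Bezeichnung = c("Kauf", "Verkauf"),
                       stringsAsFactors = FALSE),
  factor2 = data.frame(Code = c("E", "K"),
                       Bezeichnung = c("Eigengeschaeft", "Kundengeschaeft"),
                       stringsAsFactors = FALSE))

# Same function as in the first version of this answer
convert_factors <- function(variable, df) {
  factor_variable <- rlang::sym(variable)
  df %>%
    mutate(!!variable := factor(
      !!factor_variable,
      levels = codelist_list[[variable]][["Code"]],
      labels = codelist_list[[variable]][["Bezeichnung"]]))
}

# reduce() feeds the result of each step into the next one,
# so all conversions accumulate in a single data frame
converted <- reduce(
  names(codelist_list),
  function(df, variable) convert_factors(variable, df),
  .init = data
)

str(converted)
```

Base R's Reduce would work in the same way if you prefer to avoid the extra dependency.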

Nice! I was also trying to answer the question but I didn't know about rlang::sym yet so it was too difficult for me. Nice that I got to learn that one, I was struggling with a similar thing today.. I was thinking about passing down the respective element of codelist as well to make the function self-contained but this is already a super nice solution so I'll rather continue with my MA thesis.. ;) — friep, Aug 08 '17 at 15:34
@friep, `sym` is a handy function to convert a string to a symbol. `quo`, `enquo`, and `quo_name` are all from `rlang`. I learned about this function from the community. Good luck with your thesis. — mt1022, Aug 08 '17 at 16:00
Thanks for the answer. I did not know about `sym()` before. However, this is no solution for my "ultimate goal" as I stated it in the question. I overlooked that I needed `mutate_at` to convert all variables in the same `data.frame`, not one in each resulting one. — der_grund, Aug 09 '17 at 07:10

score 2 · Answer 2 · answered Aug 08 '17 at 16:58

Nice solution by mt1022 using tidy eval and dplyr. However, this task can also be accomplished using only base R:

data[names(codelist_list)] <- lapply(names(codelist_list), function(x)
  factor(data[[x]],
         levels = codelist_list[[x]][["Code"]],
         labels = codelist_list[[x]][["Bezeichnung"]]))

MarkusN

Very nice! It was not apparent to me that the base-R notation can handle this without any specific add-ons. This looks very clean. – der_grund Aug 09 '17 at 06:06

aosmith · Accepted Answer · 2017-08-09T14:34:08.993

You could approach this with mutate_at, using the . placeholder within funs to apply a function to multiple columns at once.

This approach still involves using tidy eval to pull the correct element from codelist_list while referring to the variable via the . placeholder.

mutate_at(data, c("factor1", "factor2"), 
          funs( factor(., levels = codelist_list[[quo_name(quo(.))]][["Code"]],
                      labels = codelist_list[[quo_name(quo(.))]][["Bezeichnung"]]) ) )

  factor1         factor2 other_var
1    Kauf  Eigengeschaeft         1
2 Verkauf Kundengeschaeft         2

If you wanted to make a function to pass to mutate_at, you can do so, with a few slight changes.

convert_factors = function(variable) {
     var2 = enquo(variable)
     factor(variable, levels = codelist_list[[quo_name(var2)]][["Code"]],
            labels = codelist_list[[quo_name(var2)]][["Bezeichnung"]]) 
}

mutate_at(data, c("factor1", "factor2"), convert_factors)

  factor1         factor2 other_var
1    Kauf  Eigengeschaeft         1
2 Verkauf Kundengeschaeft         2
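
For readers on a newer dplyr, where funs() has been deprecated, a comparable sketch can be written with across() and cur_column(). This assumes dplyr 1.0.0 or later and is not part of the original answer; cur_column() returns the name of the column currently being processed, which replaces the quo_name(quo(.)) lookup. The name converted is just for illustration.

```r
library(dplyr)

data <- data.frame(factor1 = c("K", "V"),
                   factor2 = c("E", "K"),
                   other_var = 1:2,
                   stringsAsFactors = FALSE)

codelist_list <- list(
  factor1 = data.frame(Code = c("K", "V"),
                       Bezeichnung = c("Kauf", "Verkauf"),
                       stringsAsFactors = FALSE),
  factor2 = data.frame(Code = c("E", "K"),
                       Bezeichnung = c("Eigengeschaeft", "Kundengeschaeft"),
                       stringsAsFactors = FALSE))

# Assumption: dplyr >= 1.0.0, which provides across() and cur_column()
converted <- data %>%
  mutate(across(
    all_of(names(codelist_list)),
    ~ factor(.x,
             levels = codelist_list[[cur_column()]][["Code"]],
             labels = codelist_list[[cur_column()]][["Bezeichnung"]])
  ))

str(converted)
```

Selecting with all_of(names(codelist_list)) keeps the column list in sync with the code lists, so nothing has to be typed twice.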

aosmith, just for my own understanding: What would this look like if I wanted to define a function that is then called in `mutate_at`? Would one then have to change `quo_name`... and so on? — der_grund, Aug 09 '17 at 07:06
@der_grund See edit for one option. The main switch is to using `enquo`. `quo_name` is still used to transform a quoted symbol to a string for pulling out the appropriate element from the list. — aosmith, Aug 09 '17 at 14:36

score 0 · Answer 4 · answered Aug 11 '17 at 08:24

Since you're just using strings and SE (standard evaluation) functions such as the factor constructor, you don't need expressions or quosures. Pull the column out by name and use name-unquoting with :=

convert_factors <- function(variable, df) {
  # df[[variable]] extracts the column values; passing the string itself
  # to factor() would only yield NA
  converted <- factor(df[[variable]],
    levels = codelist_list[[variable]][["Code"]],
    labels = codelist_list[[variable]][["Bezeichnung"]]
  )
  mutate(df, !! variable := converted)
}

map(c("factor1", "factor2"), convert_factors, df = data)
