How to use across and mutate across an entire dataset that has multiple column types?

Question

I'm trying to use dplyr's across and case_when across my entire dataset, so whenever it sees "Strongly Agree" it changes it to a numeric 5, "Agree" to a numeric 4, and so on. I've tried looking at this answer, but I'm getting an error because my dataset has logical and numeric columns and R rightfully says that "Agree" can't be in a logical column, etc.

Here's my data:

library(dplyr)
test <- tibble(name = c("Justin", "Corey", "Sibley"),
               date = c("2021-08-09", "2021-10-29", "2021-01-01"),
               s1 = c("Agree", "Neutral", "Strongly Disagree"),
               s2rl = c("Agree", "Neutral", "Strongly Disagree"),
               f1 = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               f2rl = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               exam = c(90, 99, 100),
               early = c(TRUE, FALSE, FALSE))

Ideally, I'd like one command that would allow me to go across the entire dataset. However, if that can't be done, I'd like to have one argument that would allow me to use multiple across(contains()) arguments (i.e., here contains "s" or "f").

Here's what I've tried already to no avail:

library(dplyr)
test %>%
  mutate(across(.), 
         ~ case_when(. == "Strongly Agree" ~ 5, 
                     . == "Agree" ~ 4,
                     . == "Neutral" ~ 3,
                     . == "Disagree" ~ 2,
                     . == "Strongly Disagree" ~ 1,
                     TRUE ~ NA))

Error: Problem with `mutate()` input `..1`.
x Must subset columns with a valid subscript vector.
x Subscript has the wrong type `tbl_df<
  name: character
  date: character
  s1  : character
  s2rl: character
  f1  : character
  f2rl: character
  exam: double
>`.
ℹ It must be numeric or character.
ℹ Input `..1` is `across(.)`.

akrun · Accepted Answer · 2021-08-17T18:23:50.167

We can use matches() to select the columns with a regular expression:

library(dplyr)
test %>% 
    mutate(across(matches('^(s|f)'), ~ case_when(. == "Strongly Agree" ~ 5, 
                     . == "Agree" ~ 4,
                     . == "Neutral" ~ 3,
                     . == "Disagree" ~ 2,
                     . == "Strongly Disagree" ~ 1,
                     TRUE ~ NA_real_)))

Output:

# A tibble: 3 x 8
  name   date          s1  s2rl    f1  f2rl  exam early
  <chr>  <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>
1 Justin 2021-08-09     4     4     5     5    90 TRUE 
2 Corey  2021-10-29     3     3     2     2    99 FALSE
3 Sibley 2021-01-01     1     1     1     1   100 FALSE
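If a regular expression feels too opaque, the same columns can probably be selected by combining helpers, which is closer to the "multiple contains()" idea in the question. The sketch below assumes the test data from the question; recode_likert is a hypothetical helper name introduced here only to avoid repeating the case_when() call.

```r
library(dplyr)

test <- tibble(name = c("Justin", "Corey", "Sibley"),
               date = c("2021-08-09", "2021-10-29", "2021-01-01"),
               s1 = c("Agree", "Neutral", "Strongly Disagree"),
               s2rl = c("Agree", "Neutral", "Strongly Disagree"),
               f1 = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               f2rl = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               exam = c(90, 99, 100),
               early = c(TRUE, FALSE, FALSE))

# Hypothetical helper (not part of dplyr): maps Likert labels to numbers
recode_likert <- function(x) {
  case_when(x == "Strongly Agree" ~ 5,
            x == "Agree" ~ 4,
            x == "Neutral" ~ 3,
            x == "Disagree" ~ 2,
            x == "Strongly Disagree" ~ 1,
            TRUE ~ NA_real_)
}

# Selection by regular expression, as in the answer above
by_regex <- test %>%
  mutate(across(matches("^(s|f)"), recode_likert))

# Union of two helpers with |
by_union <- test %>%
  mutate(across(starts_with("s") | starts_with("f"), recode_likert))

# Combining selections with c()
by_c <- test %>%
  mutate(across(c(starts_with("s"), starts_with("f")), recode_likert))
```

All three calls should select s1, s2rl, f1 and f2rl here, because no other column name starts with "s" or "f".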

According to ?across:

across() makes it easy to apply the same transformation to multiple columns, allowing you to use select() semantics inside in "data-masking" functions like summarise() and mutate().

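?select lists the selection helpers available for picking columns, and the same helpers can be used inside across(). This is also why across(.) fails: there `.` is the whole data frame, not a column selection, which is what the "Subscript has the wrong type `tbl_df`" error is reporting.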

Tidyverse selections implement a dialect of R where operators make it easy to select variables:

: for selecting a range of consecutive variables.

! for taking the complement of a set of variables.

& and | for selecting the intersection or the union of two sets of variables.

c() for combining selections.

In addition, you can use selection helpers. Some helpers select specific columns:

everything(): Matches all variables.

last_col(): Select last variable, possibly with an offset.

These helpers select variables by matching patterns in their names:

starts_with(): Starts with a prefix.

ends_with(): Ends with a suffix.

contains(): Contains a literal string.

matches(): Matches a regular expression.

num_range(): Matches a numerical range like x01, x02, x03.

These helpers select variables from a character vector:

all_of(): Matches variable names in a character vector. All names must be present, otherwise an out-of-bounds error is thrown.

any_of(): Same as all_of(), except that no error is thrown for names that don't exist.

This helper selects variables with a function:

where(): Applies a function to all variables and selects those for which the function returns TRUE.
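For the "one command across the entire dataset" case, where() can pick the columns by their contents instead of by name. This is a minimal sketch, assuming the test data from the question; likert and is_likert are hypothetical names introduced for illustration, and a named lookup vector is swapped in for case_when().

```r
library(dplyr)

test <- tibble(name = c("Justin", "Corey", "Sibley"),
               date = c("2021-08-09", "2021-10-29", "2021-01-01"),
               s1 = c("Agree", "Neutral", "Strongly Disagree"),
               s2rl = c("Agree", "Neutral", "Strongly Disagree"),
               f1 = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               f2rl = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               exam = c(90, 99, 100),
               early = c(TRUE, FALSE, FALSE))

# Hypothetical lookup table: Likert label -> numeric score
likert <- c("Strongly Disagree" = 1, "Disagree" = 2, "Neutral" = 3,
            "Agree" = 4, "Strongly Agree" = 5)

# Hypothetical predicate: TRUE only for character columns whose
# values are all Likert labels (missing values are allowed)
is_likert <- function(x) {
  is.character(x) && all(x %in% c(names(likert), NA))
}

# where() applies the predicate to every column and keeps the matches,
# so name, date, exam and early are left untouched
result <- test %>%
  mutate(across(where(is_likert), ~ unname(likert[.x])))
```

The predicate matters: selecting with everything() or where(is.character) would also send name and date through the recoding and turn them into NA.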

thanks! Out of curiosity, why does "matches" work but across(.) does not. Is it because you're overlooking those unrelated columns and not tripping R up? Just want to make sure I understand for the future. — J.Sabree, Aug 17 '21 at 18:03
@J.Sabree Are you looking for a solution, or trying to make `across(.)` work? That call is wrong. — akrun, Aug 17 '21 at 18:08
no I've accepted your answer--I'm just curious why the across(.) was wrong. — J.Sabree, Aug 17 '21 at 18:21
@J.Sabree I updated the answer. The idea is to pass some input so that it selects certain columns or all of them. Use select-helpers, or column names given either as strings or unquoted, as in the updated answer. — akrun, Aug 17 '21 at 18:24

score 3 · Answer 2 · answered Aug 17 '21 at 18:46

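We could also do it the other way round. First recode to character values, using "5" instead of 5 and so on. In this case the fallback has to be NA_character_, which is the NA for character type. At the end, use type.convert(as.is = TRUE) to get integers. Note that type.convert() is applied to the whole data frame, so exam also becomes an integer column in the output below: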

library(dplyr)
test %>%
    mutate(across(s1:f2rl, 
           ~ case_when(. == "Strongly Agree" ~ "5", 
                       . == "Agree" ~ "4",
                       . == "Neutral" ~ "3",
                       . == "Disagree" ~ "2",
                       . == "Strongly Disagree" ~ "1",
                       TRUE ~ NA_character_ ))) %>% 
    type.convert(as.is = TRUE)

# A tibble: 3 x 8
  name   date          s1  s2rl    f1  f2rl  exam early
  <chr>  <chr>      <int> <int> <int> <int> <int> <lgl>
1 Justin 2021-08-09     4     4     5     5    90 TRUE 
2 Corey  2021-10-29     3     3     2     2    99 FALSE
3 Sibley 2021-01-01     1     1     1     1   100 FALSE
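If the goal is integer scores without touching the other columns, one possible variant is to return integer literals from case_when() directly and skip type.convert(). This is a sketch under the assumption that the test data from the question is used; it is not part of the answer above.

```r
library(dplyr)

test <- tibble(name = c("Justin", "Corey", "Sibley"),
               date = c("2021-08-09", "2021-10-29", "2021-01-01"),
               s1 = c("Agree", "Neutral", "Strongly Disagree"),
               s2rl = c("Agree", "Neutral", "Strongly Disagree"),
               f1 = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               f2rl = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               exam = c(90, 99, 100),
               early = c(TRUE, FALSE, FALSE))

# Integer literals (5L, 4L, ...) need NA_integer_ as the fallback so that
# every branch of case_when() has the same type
result <- test %>%
  mutate(across(s1:f2rl,
                ~ case_when(.x == "Strongly Agree" ~ 5L,
                            .x == "Agree" ~ 4L,
                            .x == "Neutral" ~ 3L,
                            .x == "Disagree" ~ 2L,
                            .x == "Strongly Disagree" ~ 1L,
                            TRUE ~ NA_integer_)))
```

With this variant only s1 through f2rl change type; exam stays a double column.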
