I'm trying to recode a large number of variables with 5 levels ("1_Disagree", "2_SomeD", "3_Neither", "4_SomeA", "5_Agree") into variables with 3 levels ("1_Disagree", "2_Neither", "3_Agree"). All these variables have similar names, so I'm using the across function from dplyr. Here's an example:


> df <- tibble(Q1_cat5 = as.factor(c("1_Disagree","2_SomeD","2_SomeD","4_SomeA","5_Agree")),
                  Q2_cat5 = as.factor(c("5_Agree","5_Agree","3_Neither","4_SomeA","5_Agree")),
                  Q3_cat5 = as.factor(c("3_Neither","2_SomeD","2_SomeD","1_Disagree","5_Agree")))

> df
# A tibble: 5 × 3
  Q1_cat5    Q2_cat5   Q3_cat5   
  <fct>      <fct>     <fct>     
1 1_Disagree 5_Agree   3_Neither 
2 2_SomeD    5_Agree   2_SomeD   
3 2_SomeD    3_Neither 2_SomeD   
4 4_SomeA    4_SomeA   1_Disagree
5 5_Agree    5_Agree   5_Agree  

What I'm trying to obtain:

> df2
# A tibble: 5 × 6
  Q1_cat5    Q2_cat5   Q3_cat5    Q1_cat3    Q2_cat3   Q3_cat3   
  <fct>      <fct>     <fct>      <fct>      <fct>     <fct>     
1 1_Disagree 5_Agree   3_Neither  1_Disagree 3_Agree   2_Neither 
2 2_SomeD    5_Agree   2_SomeD    1_Disagree 3_Agree   1_Disagree
3 2_SomeD    3_Neither 2_SomeD    1_Disagree 2_Neither 1_Disagree
4 4_SomeA    4_SomeA   1_Disagree 3_Agree    3_Agree   1_Disagree
5 5_Agree    5_Agree   5_Agree    3_Agree    3_Agree   3_Agree  

As you can see, the new variables work as follows:

  • If Q1_cat5 = "1_Disagree" or "2_SomeD" then Q1_cat3 = "1_Disagree"
  • If Q1_cat5 = "3_Neither" then Q1_cat3 = "2_Neither"
  • If Q1_cat5 = "4_SomeA" or "5_Agree" then Q1_cat3 = "3_Agree"

I've tried the following code:

df2 <- df %>% mutate(across(.cols = starts_with('Q') & ends_with('cat5'),
                                 .funs = case_when(                                
                                    (. == "1_Disagree" | . == "2_SomeD") ~ '1_Disagree',
                                    . == "3_Neither" ~ '2_Neither',
                                    (. == "4_SomeA" |. == "5_Agree") ~ '3_Agree',
                                    is.na(.) ~ NA,
                                    ),
                                 .names = '{str_sub(.col,1,-5)}cat3'
                                 )
                        )

This does create the new variables Q1_cat3, Q2_cat3, etc., but they keep the old values of Q1_cat5, Q2_cat5, etc. So instead of recoding anything, it duplicates the old variables and just renames them:

> df2
# A tibble: 5 × 6
  Q1_cat5    Q2_cat5   Q3_cat5    Q1_cat3    Q2_cat3   Q3_cat3   
  <fct>      <fct>     <fct>      <fct>      <fct>     <fct>     
1 1_Disagree 5_Agree   3_Neither  1_Disagree 5_Agree   3_Neither 
2 2_SomeD    5_Agree   2_SomeD    2_SomeD    5_Agree   2_SomeD
3 2_SomeD    3_Neither 2_SomeD    2_SomeD    3_Neither 2_SomeD
4 4_SomeA    4_SomeA   1_Disagree 4_SomeA    4_SomeA   1_Disagree
5 5_Agree    5_Agree   5_Agree    5_Agree    5_Agree   5_Agree  

Even after a lot of research and several other attempts, I can't figure out why this isn't working, nor can I find another way to do what I want. I've read other posts about "case_when" with "across", but none of the solutions work for me. Could you help me?

camille
Vetepi

1 Answer

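First, the argument of across is .fns, not .funs. As far as I can tell, .funs matches no named argument, so it is absorbed by ... and never used; with no .fns supplied, across simply returns the selected columns unchanged (renamed via .names), which is why you get copies. The main issue, however, is that .fns expects a function, and a bare case_when(...) call is not one. In the tidyverse you can turn it into a lambda function with the tilde (~) operator. Try: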

df2 <- df %>% 
  mutate(
    across(.cols = starts_with('Q') & ends_with('cat5'),
           ~ case_when(
             (. == "1_Disagree" | . == "2_SomeD") ~ '1_Disagree',
             . == "3_Neither" ~ '2_Neither',
             (. == "4_SomeA" |. == "5_Agree") ~ '3_Agree',
             is.na(.) ~ NA_character_ # You can skip this part though
             ),
           .names = '{str_sub(.col,1,-5)}cat3')
    )

Output:

df2

# A tibble: 5 x 6
  Q1_cat5    Q2_cat5   Q3_cat5    Q1_cat3    Q2_cat3   Q3_cat3   
  <fct>      <fct>     <fct>      <chr>      <chr>     <chr>     
1 1_Disagree 5_Agree   3_Neither  1_Disagree 3_Agree   2_Neither 
2 2_SomeD    5_Agree   2_SomeD    1_Disagree 3_Agree   1_Disagree
3 2_SomeD    3_Neither 2_SomeD    1_Disagree 2_Neither 1_Disagree
4 4_SomeA    4_SomeA   1_Disagree 3_Agree    3_Agree   1_Disagree
5 5_Agree    5_Agree   5_Agree    3_Agree    3_Agree   3_Agree   

As you can see, a bare NA is not enough here: you need NA_character_, because every value returned by case_when has to be of the same type, NA included. (I believe dplyr 1.1.0 and later relax this and accept a plain NA.) I'm not sure about your use case, but normally you can skip that last condition altogether, since anything that doesn't match the earlier rules becomes NA anyway.
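One thing to note in the output above: the new columns are <chr>, whereas the desired result in the question shows <fct>. Below is a minimal sketch of one way to get factors back, assuming the three target levels listed in the question. The helper name collapse_to_cat3 is hypothetical (it is not part of dplyr); it also drops the explicit NA condition, relying on case_when returning NA for anything unmatched.

```r
library(dplyr)
library(stringr)

# Example data from the question
df <- tibble(
  Q1_cat5 = as.factor(c("1_Disagree", "2_SomeD", "2_SomeD", "4_SomeA", "5_Agree")),
  Q2_cat5 = as.factor(c("5_Agree", "5_Agree", "3_Neither", "4_SomeA", "5_Agree")),
  Q3_cat5 = as.factor(c("3_Neither", "2_SomeD", "2_SomeD", "1_Disagree", "5_Agree"))
)

# Hypothetical helper (not a dplyr function): collapse 5 levels into 3
collapse_to_cat3 <- function(x) {
  x <- as.character(x)  # compare on labels rather than factor codes
  recoded <- case_when(
    x %in% c("1_Disagree", "2_SomeD") ~ "1_Disagree",
    x == "3_Neither"                  ~ "2_Neither",
    x %in% c("4_SomeA", "5_Agree")    ~ "3_Agree"
    # no NA condition: unmatched values, including NA, come out as NA
  )
  # Convert back to a factor with the three levels in a fixed order
  factor(recoded, levels = c("1_Disagree", "2_Neither", "3_Agree"))
}

df2 <- df %>%
  mutate(across(
    .cols  = starts_with("Q") & ends_with("cat5"),
    .fns   = collapse_to_cat3,
    .names = "{str_sub(.col, 1, -5)}cat3"
  ))

df2
```

Passing a named function to .fns avoids the lambda syntax entirely, and setting the levels explicitly keeps all three categories on every new column even when one of them does not occur in the data.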

arg0naut91