How to capture the logic from case_when() in dplyr

Question

I am using case_when() from dplyr to create the following column, result.

library(dplyr)

z <- tibble(a = c(40, 30, NA),
            b = c(NA, 20, 10))

z %>%
  mutate(result = case_when(
    !is.na(a) ~ a,
    is.na(a) & !is.na(b) ~ b
  ))

The above gives the following:

      a     b result
  <dbl> <dbl>  <dbl>
1    40    NA     40
2    30    20     30
3    NA    10     10

However, I would like to create another column at the same time, result_logic, which shows which column (a or b) the value in result was taken from. The output would look like this:

      a     b result result_logic
  <dbl> <dbl>  <dbl> <chr>
1    40    NA     40 a
2    30    20     30 a
3    NA    10     10 b

Is there any way to capture this logic evaluated in case_when()?

Thanks

I think you'd need to do two logic checks, since each `case_when()` call returns a single vector. It'd be easy to do both checks in one `mutate()` call (I'm adding this in an answer below). Is there a particular reason you want to get two output columns from one `case_when` test? – Andy Baxter, Dec 20 '21 at 20:01
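The comment's suggestion (both checks inside one `mutate()` call) can be sketched like this. It is a minimal sketch that repeats each condition in two `case_when()` calls; it also drops the redundant `is.na(a)` test, since `case_when()` uses the first condition that is TRUE for each row:

```r
library(dplyr)

z <- tibble(a = c(40, 30, NA),
            b = c(NA, 20, 10))

out <- z %>%
  mutate(
    # value: first non-missing of a, then b
    result = case_when(
      !is.na(a) ~ a,
      !is.na(b) ~ b
    ),
    # label: the same conditions, returning the column name instead
    result_logic = case_when(
      !is.na(a) ~ "a",
      !is.na(b) ~ "b"
    )
  )
out
```

The cost is that every condition is written twice, which is what becomes tedious when there are many cases.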

PaulS · Accepted Answer · score 6 · 2021-12-20T20:15:19.700

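Something like the following? It pastes each value together with the name of its source column inside a single case_when(), then splits the two parts apart with separate():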

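library(tidyverse)

z <- tibble(a = c(40, 30, NA),
            b = c(NA, 20, 10))

z %>%
  mutate(result = case_when(
    # paste the value and the name of its source column into one string
    !is.na(a) ~ str_c(a, "a", sep = " "),
    is.na(a) & !is.na(b) ~ str_c(b, "b", sep = " "))) %>%
  # split on the space only, so decimal points or minus signs are not treated as separators
  separate(result, into = c("result", "result_logic"), sep = " ", convert = TRUE)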

#> # A tibble: 3 × 4
#>       a     b result result_logic
#>   <dbl> <dbl>  <int> <chr>       
#> 1    40    NA     40 a           
#> 2    30    20     30 a           
#> 3    NA    10     10 b
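One caveat with this trick: unless sep is given, separate() splits on any run of non-alphanumeric characters, so a value such as 40.5 or -3 would be broken up at the decimal point or the minus sign. A minimal sketch with hypothetical non-integer data (z2 is made up for illustration) that splits on the space only:

```r
library(dplyr)
library(tidyr)

# hypothetical data with a decimal and a negative value
z2 <- tibble(a = c(40.5, -3, NA),
             b = c(NA, 20, 10))

out <- z2 %>%
  mutate(result = case_when(
    !is.na(a) ~ paste(a, "a"),   # e.g. "40.5 a"
    !is.na(b) ~ paste(b, "b")    # e.g. "10 b"
  )) %>%
  # sep = " " splits only on the space that paste() inserted
  separate(result, into = c("result", "result_logic"),
           sep = " ", convert = TRUE)
out
```

With convert = TRUE the numeric part comes back as a double here, because the strings contain a decimal value.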

edited Dec 20 '21 at 20:15

answered Dec 20 '21 at 20:01

PaulS


Thanks. I was trying to avoid two different case_when() calls. In my real dataset, my case_when() has many, many more cases, so doing it with a single case_when() would be much simpler, if that's possible. – mdb_ftl Dec 20 '21 at 20:03
@mdb_ftl: I have updated my solution, which now uses only a single `case_when`. I hope it helps you! – PaulS Dec 20 '21 at 20:16
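For the single-case_when() requirement, one option, assuming dplyr 1.1.0 or later (older versions do not accept data frames as case_when() values), is to return a data frame holding both the value and its label from each branch. mutate() unpacks an unnamed data-frame result into separate columns, so each case is written only once. A minimal sketch:

```r
library(dplyr)  # assumes dplyr >= 1.1.0

z <- tibble(a = c(40, 30, NA),
            b = c(NA, 20, 10))

out <- z %>%
  mutate(
    # unnamed argument: the data frame returned by case_when() is
    # unpacked into the columns result and result_logic
    case_when(
      !is.na(a) ~ tibble(result = a, result_logic = "a"),
      !is.na(b) ~ tibble(result = b, result_logic = "b")
    )
  )
out
```

Adding a case means adding one line that names both the value and its label, so the two columns cannot drift out of sync.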

score 2 · Answer 2 · answered Dec 20 '21 at 20:24

Here is an alternative approach using only dplyr:

library(dplyr)

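z %>%
  mutate(
    result = case_when(
      !is.na(a) ~ a,
      is.na(a) & !is.na(b) ~ b
    ),
    # new_a / new_b hold the column name wherever that column is non-missing
    across(-result, ~ case_when(!is.na(.x) ~ cur_column()), .names = "new_{col}"),
    result_logic = coalesce(new_a, new_b)
  ) %>%
  # drop the helper columns explicitly (.keep = "unused" also drops a and b, which were used to build result)
  select(-new_a, -new_b)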

      a     b result result_logic
  <dbl> <dbl>  <dbl> <chr>
1    40    NA     40 a
2    30    20     30 a
3    NA    10     10 b

score 1 · Answer 3 · answered Dec 20 '21 at 20:15

You could reverse the two steps: first use case_when() to record the name of the source column, then use that name to look up the value. This needs only one case_when() call:

library(tidyverse)

z <- tibble(a = c(40, 30, NA), 
            b = c(NA, 20, 10))

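z %>%
  mutate(
    result_logic = case_when(
      !is.na(a) ~ "a",
      is.na(a) & !is.na(b) ~ "b"
    ),
    # for each row, pull the value from the column named in result_logic
    # (note: this looks up in z itself, not in the piped data)
    result = map2_dbl(row_number(), result_logic, ~ z[[.x, .y]])
  )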

#> # A tibble: 3 x 4
#>       a     b result_logic result
#>   <dbl> <dbl> <chr>         <dbl>
#> 1    40    NA a                40
#> 2    30    20 a                30
#> 3    NA    10 b                10

Created on 2021-12-20 by the reprex package (v2.0.1)
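A possible variation on this answer, sketched under the assumption that all candidate columns share one type (here double): instead of map2_dbl() it looks the values up in one step with matrix indexing, and a row where every candidate is missing simply gets NA in both columns. The vector src and the extra all-NA fourth row are hypothetical additions for illustration:

```r
library(dplyr)

# hypothetical fourth row where both candidates are missing
z <- tibble(a = c(40, 30, NA, NA),
            b = c(NA, 20, 10, NA))

src <- c("a", "b")  # candidate columns, in priority order

out <- z %>%
  mutate(
    result_logic = case_when(
      !is.na(a) ~ "a",
      !is.na(b) ~ "b"
    ),
    # index the matrix at (row i, column match(result_logic, src));
    # an NA column index yields NA
    result = as.matrix(z[src])[cbind(row_number(), match(result_logic, src))]
  )
out
```

Like the original answer, this reads from z directly rather than from the piped data.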

IceCreamToucan · Answer 4 · 2021-12-20T20:33:47.870

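library(dplyr, warn.conflicts = FALSE)

z <- tibble(a = c(40, 30, NA),
            b = c(NA, 20, 10))

z %>%
  mutate(
    # first non-missing value across a:b, row by row
    result = do.call(coalesce, across(a:b)),
    # same idea, but each non-missing value is replaced by its column name
    result_logic =
      do.call(coalesce,
        across(a:b, ~ ifelse(is.na(.), NA, cur_column())))
  )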
#> # A tibble: 3 × 4
#>       a     b result result_logic
#>   <dbl> <dbl>  <dbl> <chr>       
#> 1    40    NA     40 a           
#> 2    30    20     30 a           
#> 3    NA    10     10 b

Created on 2021-12-20 by the reprex package (v2.0.1)
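The do.call(coalesce, ...) pattern in this answer is what lets it scale: do.call() passes each column of the tibble returned by across() as a separate argument to coalesce(), which keeps the first non-missing value in each row. A minimal sketch with a hypothetical third column c, showing that only the column selection changes:

```r
library(dplyr, warn.conflicts = FALSE)

# hypothetical data with a third candidate column
z3 <- tibble(a = c(40, NA, NA),
             b = c(NA, 20, NA),
             c = c(1, 2, 5))

out <- z3 %>%
  mutate(
    # first non-missing value across a, b, c (in that order)
    result = do.call(coalesce, across(a:c)),
    # replace each non-missing value by its column name, then coalesce
    result_logic = do.call(coalesce,
      across(a:c, ~ ifelse(is.na(.x), NA, cur_column())))
  )
out
```

Priority follows column order in the selection, so reorder the columns (or list them explicitly) to change which source wins.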
