
I am using case_when() from dplyr to create the column result, as follows:

library(dplyr)

z <- tibble(a = c(40, 30, NA),
            b = c(NA, 20, 10))

z %>%
  mutate(result = case_when(
    !is.na(a) ~ a,
    is.na(a) & !is.na(b) ~ b
  ))

The above gives the following:

      a     b result
  <dbl> <dbl>  <dbl>
1    40    NA     40
2    30    20     30
3    NA    10     10   

However, I would like to create a second column at the same time, result_logic, that records which column (a or b) each value in result was taken from. The output would look like this:

      a     b result result_logic
  <dbl> <dbl>  <dbl>        <chr>
1    40    NA     40          a
2    30    20     30          a
3    NA    10     10          b

Is there any way to capture this logic evaluated in case_when()?

Thanks

AndrewGB
mdb_ftl
  • I think you'd need to do two logic checks, as `mutate` is creating a single variable each time. It'd be easy to do both checks in one `mutate` function (adding in the answer below) - is there a particular reason you want to get two column outputs from one `case_when` test? – Andy Baxter Dec 20 '21 at 20:01
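
A minimal sketch of the two-check approach the comment above describes, with both case_when() calls inside a single mutate(). It reuses the question's data; the second condition is shortened to !is.na(b) on the assumption that case_when() stops at the first matching condition, so rows where a is present never reach it.

```r
library(dplyr, warn.conflicts = FALSE)

z <- tibble(a = c(40, 30, NA),
            b = c(NA, 20, 10))

out <- z %>%
  mutate(
    # first check: the value itself
    result = case_when(
      !is.na(a) ~ a,
      !is.na(b) ~ b  # only reached when a is NA
    ),
    # second check: the same conditions, returning the column name instead
    result_logic = case_when(
      !is.na(a) ~ "a",
      !is.na(b) ~ "b"
    )
  )
out
```

The downside, as the asker notes further down, is that every condition has to be written twice.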

4 Answers


Something like the following?

library(tidyverse)

z <- tibble(a = c(40, 30, NA), 
            b = c(NA, 20, 10))

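z %>%
  # tag each value with the name of the column it came from, e.g. "40 a"
  mutate(result = case_when(
    !is.na(a) ~ str_c(a, "a", sep = " "),
    is.na(a) & !is.na(b) ~ str_c(b, "b", sep = " "))) %>%
  # split the tag back off; sep = " " keeps decimal values such as "40.5" intact
  separate(result, into = c("result", "result_logic"), sep = " ", convert = TRUE)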

#> # A tibble: 3 × 4
#>       a     b result result_logic
#>   <dbl> <dbl>  <int> <chr>       
#> 1    40    NA     40 a           
#> 2    30    20     30 a           
#> 3    NA    10     10 b
PaulS
  • Thanks. I was trying to avoid two separate case_when() calls. My real dataset has many more cases, so doing this with a single case_when() would be much simpler, if it's possible. – mdb_ftl Dec 20 '21 at 20:03
  • @mdb_ftl: I have updated my solution, which now uses only a single `case_when`. I hope it helps you! – PaulS Dec 20 '21 at 20:16

Here is an alternative approach that uses only dplyr:

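library(dplyr)

z <- tibble(a = c(40, 30, NA),
            b = c(NA, 20, 10))

z %>%
  mutate(
    result = case_when(
      !is.na(a) ~ a,
      is.na(a) & !is.na(b) ~ b),
    # new_a / new_b hold the column name wherever that column is non-missing
    across(-result, ~ case_when(!is.na(.) ~ cur_column()), .names = "new_{col}"),
    result_logic = coalesce(new_a, new_b)) %>%
  # drop the helper columns (.keep = "unused" would keep them and drop a and b)
  select(-new_a, -new_b)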
      a     b result result_logic
  <dbl> <dbl>  <dbl> <chr>
1    40    NA     40 a
2    30    20     30 a
3    NA    10     10 b
TarJae

You could reverse the two steps above: first use case_when() to record which column each row should draw from, then look up the value in that column. This needs only one case_when() call:

library(tidyverse)

z <- tibble(a = c(40, 30, NA), 
            b = c(NA, 20, 10))

z %>% 
  mutate(result_logic = case_when(
    !is.na(a) ~ "a",
    is.na(a) & !is.na(b) ~ "b"
  ),
  result = map2_dbl(row_number(), result_logic, ~ z[[.x, .y]]))

#> # A tibble: 3 x 4
#>       a     b result_logic result
#>   <dbl> <dbl> <chr>         <dbl>
#> 1    40    NA a                40
#> 2    30    20 a                30
#> 3    NA    10 b                10

Created on 2021-12-20 by the reprex package (v2.0.1)
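A possible vectorised variant of this lookup, replacing the row-by-row map2_dbl() with matrix indexing. This is a sketch, assuming dplyr >= 1.1.0 (for pick()); it also reads the columns from the piped data rather than from the global z.

```r
library(dplyr, warn.conflicts = FALSE)  # assumes dplyr >= 1.1.0 for pick()

z <- tibble(a = c(40, 30, NA),
            b = c(NA, 20, 10))

out <- z %>%
  mutate(
    # the single case_when(): decide which column each row should draw from
    result_logic = case_when(
      !is.na(a) ~ "a",
      !is.na(b) ~ "b"
    ),
    # matrix indexing fetches element [row, chosen column] for all rows at once;
    # an NA label gives an NA column index and therefore an NA result
    result = as.matrix(pick(a, b))[cbind(row_number(),
                                          match(result_logic, c("a", "b")))]
  )
out
```

Adding a case only means adding one line to case_when() and one name to pick() and match().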

Andy Baxter
library(dplyr, warn.conflicts = FALSE)
z <- tibble(a = c(40, 30, NA), 
       b = c(NA, 20, 10))

z %>% 
  mutate(
    result = do.call(coalesce, across(a:b)),
    result_logic = 
      do.call(coalesce,
        across(a:b, ~ ifelse(is.na(.), NA, cur_column())))
  )
#> # A tibble: 3 × 4
#>       a     b result result_logic
#>   <dbl> <dbl>  <dbl> <chr>       
#> 1    40    NA     40 a           
#> 2    30    20     30 a           
#> 3    NA    10     10 b

Created on 2021-12-20 by the reprex package (v2.0.1)
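The same "first non-missing column wins" rule can be sketched in base R for any number of columns, listed in priority order. This swaps coalesce() for max.col() and is an illustration under that assumption, not part of the answer above.

```r
z <- data.frame(a = c(40, 30, NA),
                b = c(NA, 20, 10))

# columns in priority order: the first non-missing one wins
m <- as.matrix(z[c("a", "b")])

# 1 where a value is present, 0 where it is NA
present <- (!is.na(m)) * 1

# for each row, the index of the first column holding a value
first <- max.col(present, ties.method = "first")

# max.col() returns 1 for an all-NA row, so mark those rows explicitly
first[rowSums(present) == 0] <- NA

z$result <- m[cbind(seq_len(nrow(m)), first)]
z$result_logic <- colnames(m)[first]
z
```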

IceCreamToucan