How to recode a new date variable and select the lowest date out of four date columns in R

Question

Sample data

stack_dat <- structure(list(bio_drug_stop_date = structure(c(15376, NA, 15602, NA, 15550, NA, 15350, 15363, 15418, 16157), class = "Date"), 
             follow_up_2_years = structure(c(16047, 14318, 16038, 14352, 16044, 16582, 16054, 16048, 16054, 16054), class = "Date"), 
             date_of_last_visit = structure(c(17836, 16405, 17591, 16801, 17866, 15826, 17866, 17257, 18109, 16587), class = "Date"),
             end_of_follow_up_date = structure(c(NA, 17928, NA, 17928, 17900, 16980, 16890, 17100, NA, NA), class = "Date"), data_cut_date = structure(c(18201,
             18201, 18201, 18201, 18201, 18201, 18201, 18201, 18201, 18201), class = "Date")), row.names = c(NA, 10L), class = "data.frame")

Structure

'data.frame':   10 obs. of  5 variables:
 $ bio_drug_stop_date   : Date, format: "2012-02-06" NA "2012-09-19" NA ...
 $ follow_up_2_years    : Date, format: "2013-12-08" "2009-03-15" "2013-11-29" "2009-04-18" ...
 $ date_of_last_visit   : Date, format: "2018-11-01" "2014-12-01" "2018-03-01" "2016-01-01" ...
 $ end_of_follow_up_date: Date, format: NA "2019-02-01" NA "2019-02-01" ...
 $ data_cut_date        : Date, format: "2019-11-01" "2019-11-01" "2019-11-01" "2019-11-01" ...

Aim

The goal is to recode a new variable named treatment_end calculated by taking the date of bio_drug_stop_date; if it is not present, then the lowest date in any of the four other columns: follow_up_2_years, date_of_last_visit, end_of_follow_up_date, data_cut_date

score 1 · Accepted Answer · answered Mar 15 '22 at 15:43

We may use pmin with coalesce - coalesce the 'bio_drug_stop_date' with the min (using pmin) dates from other column for each row

library(dplyr)
library(purrr)
stack_dat %>% 
 mutate(treatment_end = coalesce(bio_drug_stop_date, 
         invoke(pmin, across(-1), na.rm = TRUE)))

-output

 bio_drug_stop_date follow_up_2_years date_of_last_visit end_of_follow_up_date data_cut_date treatment_end
1          2012-02-06        2013-12-08         2018-11-01                  <NA>    2019-11-01    2012-02-06
2                <NA>        2009-03-15         2014-12-01            2019-02-01    2019-11-01    2009-03-15
3          2012-09-19        2013-11-29         2018-03-01                  <NA>    2019-11-01    2012-09-19
4                <NA>        2009-04-18         2016-01-01            2019-02-01    2019-11-01    2009-04-18
5          2012-07-29        2013-12-05         2018-12-01            2019-01-04    2019-11-01    2012-07-29
6                <NA>        2015-05-27         2013-05-01            2016-06-28    2019-11-01    2013-05-01
7          2012-01-11        2013-12-15         2018-12-01            2016-03-30    2019-11-01    2012-01-11
8          2012-01-24        2013-12-09         2017-04-01            2016-10-26    2019-11-01    2012-01-24
9          2012-03-19        2013-12-15         2019-08-01                  <NA>    2019-11-01    2012-03-19
10         2014-03-28        2013-12-15         2015-06-01                  <NA>    2019-11-01    2014-03-28

The solution works perfectly. Could you please explain in laypeople's terms what is happening here? — Pashtun, Mar 15 '22 at 18:15
@CoinAtlas `pmin` gets the minimum value rowwise from the columns other than the first (`across(-1)`). `invoke` is the equivalent of `do.call` from `base R`, then we use the output of `pmin` to `coalesce` with the first column. coalesce only replaces whereever there are `NA` values in the first argument (i.e. `bio_drug_stop_date`) with the `pmin` output — akrun, Mar 15 '22 at 18:20
It is similar to doing `stack_dat %>% mutate(treatment_end = coalese(bio_drug_stop_date, pmin(follow_up_2_years, date_of_last_visit_end, follow_up_date, data_cut_date, na.rm = TRUE)))` — akrun, Mar 15 '22 at 18:21

How to recode a new date variable and select the lowest date out of four date columns in R

1 Answers1