1

I need to create a column based on dates, this is my dataset:

test <- data.frame(PatientID = c("1000285","1000317", "1000399","10006485","10995700317" ), dateMI = c(NA, NA, "2008-10-26", "2008-10-26", NA),dateA = c(NA, NA,"2008-10-26", "2010-11-06", "2019-02-14"), dateCVA = c("2014-02-04", "2001-02-27", NA, NA, "2020-02-14"),  stringsAsFactors = F)
 

I would need to create an extra column called dateEVENT, that contains the most recent date from dateMI, dateA and dateCVA.

So effectively the new dataset would look like the one below:

test <- data.frame(PatientID = c("1000285","1000317", "1000399","10006485","10995700317" ), dateMI = c(NA, NA, "2008-10-26", "2008-10-26", NA),dateA = c(NA, NA,"2008-10-26", "2010-11-06", "2019-02-14"), dateCVA = c("2014-02-04", "2001-02-27", NA, NA, "2020-02-14"), dateEVENT = c("2014-02-04", "2001-02-27", "2008-10-26", "2010-11-06", "2020-02-14"),  stringsAsFactors = F)           

What would be the best way of doing this?

thanks

Lili
  • 547
  • 6
  • 19

1 Answers1

2

An option is to loop over the rows with rowwise and get the max 'date' after converting the date columns to Date class

library(dplyr)
test1 <- test %>% 
      mutate(across(starts_with('date'), as.Date)) %>%
      rowwise %>% 
      mutate(dateEVENT = max(c_across(starts_with('date')), 
        na.rm = TRUE)) %>%
      ungroup

-output

test1
# A tibble: 5 x 5
#  PatientID   dateMI     dateA      dateCVA    dateEVENT 
#  <chr>       <date>     <date>     <date>     <date>    
#1 1000285     NA         NA         2014-02-04 2014-02-04
#2 1000317     NA         NA         2001-02-27 2001-02-27
#3 1000399     2008-10-26 2008-10-26 NA         2008-10-26
#4 10006485    2008-10-26 2010-11-06 NA         2010-11-06
#5 10995700317 NA         2019-02-14 2020-02-14 2020-02-14
akrun
  • 874,273
  • 37
  • 540
  • 662