I'm trying to merge or collapse rows which are partial duplicates based on ID and with varying conditions based on the column. I'd like to end up with a single row per ID in a new dataframe.
This is what my data looks like.
ID Var1 Date1 Var2 Date2
1 1 A 2021-05-01 X 2021-05-01
2 1 B 2021-06-01 Y 2021-04-01
3 2 B 2021-03-01 X <NA>
4 2 <NA> <NA> X 2021-03-01
5 3 A <NA> Y 2021-04-01
6 3 <NA> 2021-02-01 <NA> 2021-05-01
7 4 <NA> 2021-04-01 X <NA>
8 4 <NA> 2021-03-01 Z <NA>
9 5 A 2021-02-01 X 2021-03-01
10 5 B 2021-02-01 Y 2021-04-01
I want to group by ID and take the earliest date for each date variable. Then, I want to extract the corresponding values for Var1 based on Date1 and Var2 based on Date 2.
ID Var1 Date1 Var2 Date2
1 1 A 2021-05-01 Y 2021-04-01
2 2 B 2021-03-01 X 2021-03-01
3 3 <NA> 2021-02-01 Y 2021-04-01
4 4 <NA> 2021-03-01 <NA> <NA>
5 5 A 2021-02-01 X 2021-03-01
6 5 B 2021-02-01 X 2021-03-01
Here's the sample data.
df <- data.frame (ID = c(1,1,2,2,3,3,4,4,5,5), Var1 = c("A", "B", "B", NA, "A", NA, NA, NA, "A", "B"), Date1 = c("2021-05-01", "2021-06-01", "2021-03-01", NA, NA, "2021-02-01", "2021-04-01", "2021-03-01", "2021-02-01", "2021-02-01"), Var2 = c("X", "Y", "X", "X", "Y", NA, "X", "Z", "X", "Y"), Date2 = c("2021-05-01", "2021-04-01", NA, "2021-03-01", "2021-04-01","2021-05-01", NA, NA, "2021-03-01", "2021-04-01"))
and converting to date variables.
df.test1$P_date <- as.Date(df.test1$P_date, format = "%Y-%m-%d")
df.test1$R_date <- as.Date(df.test1$R_date, format = "%Y-%m-%d")
For ID no.5 I'd like to keep both rows if there if the dates match but the Var1 or Var2 don't match but if this is too hard that's ok.
I got as far as getting the earliest date for each date variable using
df <- df %>% group_by(ID) %>% mutate(earliest_date1 = last(na.omit(Date1, order_by = Date1)
df <- df %>% group_by(ID) %>% mutate(earliest_date2 = last(na.omit(Date2, order_by = Date2)
But I haven't been able to get get the corresponding values from Var1 and Var 2 into a single row in a new dataframe. Tried various things but don't think I'm on the right track at all so any help would be greatly appreciated!