I have two dfs: one with all patients (df1) and one with a subset of those patients who are getting additional treatment (df2).
I want to join the two dfs, keeping all rows in df1 and adding a new column for “0/1” if the patient is not getting the additional treatment (0) or is getting the additional treatment (1). There shouldn’t be any issues of patients in df2 who are not in df1, but if there are, I want to catch them!
It's important I keep all of the rows in df1 as I want to compare results between df1 and df2
Simplified example:
df1
id date
0001 2/2/22
0002 1/1/22
0002 3/2/22
0003 3/2/22
0003 5/2/22
0004 6/7/22
0004 9/2/22
df2
id
0002
0004
Desired output:
id date tx
0001 2/2/22 0
0002 1/1/22 1
0002 3/2/22 1
0003 3/2/22 0
0003 5/2/22 0
0004 6/7/22 1
0004 9/2/22 1
I know that I start with a join based on ID, but not sure the most elegant way to create the new column based on absence/presence of data in df2?
df1 %>%
left_join(df2, by = "id")
I found this other question which is trying to do something similar, but I don't see how they joined the two datasets.