Combine more than 3 data.frames while keeping the extra-rows from the master file that didn't match

Question

I have the following data.frame "A" in R:

ID
1
456
200
550
110

and the following data.frame "B" defined in R

ID	Interaction	Student	Date
1	email_sent	Isabel Gauss	10/20/2021
200	email_sent	Jason Bin	2/8/2021
550	email_sent	Brad Pit	10/20/2021
999	email_sent	John Brad	10/15/2023
255	email_sent	Joe Sam	10/20/2021

and the following data.frame "C" defined in R

ID	Interaction	Student	Date
1	Event_attend	Isabel Gauss	09/14/2021
200	Event_attend	Jason Bin	6/9/2021
550	Event_attend	Brad Pit	10/20/2023
999	Event_attend	John Brad	10/10/2022
255	Event_attend	Joe Sam	10/15/2021

I need to have the following output ordered by ID:

ID	Event_sent	Date_email	Email_attend	Date_attended
1	Y	10/20/2021	Y	09/14/2021
456	N	Null	N	Null
200	Y	2/8/2021	Y	6/9/2021
550	Y	10/20/2021	Y	10/20/2023
110	N	Null	N	Null

The following is not working properly:

merge <- merge(A, B, C, by.x = "ID", all.x = TRUE, sort = TRUE)

It's more helpful to provide code that makes the dataframes (e.g. by running `dput(A)` and pasting the result into your question), instead of showing what the dataframe prints out as, which is ambiguous as to data types and structure (e.g. grouping, nesting, etc.) — Jon Spring, May 23 '23 at 19:35
If you have object `A`, you can run `dput(A)` and the output will be a code recipe to recreate the object. `dput(head(A))` would recreate the first 6 rows. `dput(A[1:5, 1:8])` would recreate the first 5 rows and the first 8 columns. — Jon Spring, May 23 '23 at 19:43
I would do something like `library(dplyr); bind_rows(left_join(A,B), left_join(A,C)) %>% tidyr::pivot(names_from = Interaction, values_from = Date)` with another step to add the Y/N. — Jon Spring, May 23 '23 at 20:06

score 0 · Answer 1 · answered May 23 '23 at 20:09

One approach, using a pipeline of {dplyr} and {tidyr} manipulations:

library(dplyr)
library(tidyr) ## for pivoting

A |>
  full_join(
    bind_rows(B, C) |>
    pivot_wider(names_from = Interaction, values_from = Date) |>
    rename(Date_email = email_sent,
           Date_attended = Event_attend
           )
  ) |>
  mutate(Event_sent = ifelse(is.na(Date_email), 'N', 'Y'),
         Email_attend = ifelse(is.na(Date_attended), 'N', 'Y')
         ) |>
  mutate(across(everything(), ~ ifelse(is.na(.x), 'Null', .x))) |>
  select(ID, Event_sent, Date_email, Email_attend, Date_attended) |>
  arrange(ID)

   ID Event_sent Date_email Email_attend Date_attended
1   1          Y 10/20/2021            Y    09/14/2021
2 110          N       Null            N          Null
3 200          Y   2/8/2021            Y      6/9/2021
4 255          Y 10/20/2021            Y    10/15/2021
5 456          N       Null            N          Null
6 550          Y 10/20/2021            Y    10/20/2023
7 999          Y 10/15/2023            Y    10/10/2022

(To learn which does which in the pipeline you can start with a large part of the downstream instructions commented out and inspect the intermediate results, then stepwise uncomment the rest of the pipe)

Combine more than 3 data.frames while keeping the extra-rows from the master file that didn't match

1 Answers1