-1

I have the following data.frame "A" in R:

ID
1
456
200
550
110

and the following data.frame "B" defined in R

ID Interaction Student Date
1 email_sent Isabel Gauss 10/20/2021
200 email_sent Jason Bin 2/8/2021
550 email_sent Brad Pit 10/20/2021
999 email_sent John Brad 10/15/2023
255 email_sent Joe Sam 10/20/2021

and the following data.frame "C" defined in R

ID Interaction Student Date
1 Event_attend Isabel Gauss 09/14/2021
200 Event_attend Jason Bin 6/9/2021
550 Event_attend Brad Pit 10/20/2023
999 Event_attend John Brad 10/10/2022
255 Event_attend Joe Sam 10/15/2021

I need to have the following output ordered by ID:

ID Event_sent Date_email Email_attend Date_attended
1 Y 10/20/2021 Y 09/14/2021
456 N Null N Null
200 Y 2/8/2021 Y 6/9/2021
550 Y 10/20/2021 Y 10/20/2023
110 N Null N Null

The following is not working properly:

merge <- merge(A, B, C, by.x = "ID", all.x = TRUE, sort = TRUE)
jsaab
  • 15
  • 5
  • 1
    It's more helpful to provide code that makes the dataframes (e.g. by running `dput(A)` and pasting the result into your question), instead of showing what the dataframe prints out as, which is ambiguous as to data types and structure (e.g. grouping, nesting, etc.) – Jon Spring May 23 '23 at 19:35
  • you mean run dput(x, filename, etc..) in my R code ? – jsaab May 23 '23 at 19:41
  • 2
    If you have object `A`, you can run `dput(A)` and the output will be a code recipe to recreate the object. `dput(head(A))` would recreate the first 6 rows. `dput(A[1:5, 1:8])` would recreate the first 5 rows and the first 8 columns. – Jon Spring May 23 '23 at 19:43
  • I would do something like `library(dplyr); bind_rows(left_join(A,B), left_join(A,C)) %>% tidyr::pivot(names_from = Interaction, values_from = Date)` with another step to add the Y/N. – Jon Spring May 23 '23 at 20:06

1 Answers1

0

One approach, using a pipeline of {dplyr} and {tidyr} manipulations:

library(dplyr)
library(tidyr) ## for pivoting

A |>
  full_join(
    bind_rows(B, C) |>
    pivot_wider(names_from = Interaction, values_from = Date) |>
    rename(Date_email = email_sent,
           Date_attended = Event_attend
           )
  ) |>
  mutate(Event_sent = ifelse(is.na(Date_email), 'N', 'Y'),
         Email_attend = ifelse(is.na(Date_attended), 'N', 'Y')
         ) |>
  mutate(across(everything(), ~ ifelse(is.na(.x), 'Null', .x))) |>
  select(ID, Event_sent, Date_email, Email_attend, Date_attended) |>
  arrange(ID)
   ID Event_sent Date_email Email_attend Date_attended
1   1          Y 10/20/2021            Y    09/14/2021
2 110          N       Null            N          Null
3 200          Y   2/8/2021            Y      6/9/2021
4 255          Y 10/20/2021            Y    10/15/2021
5 456          N       Null            N          Null
6 550          Y 10/20/2021            Y    10/20/2023
7 999          Y 10/15/2023            Y    10/10/2022

(To learn which does which in the pipeline you can start with a large part of the downstream instructions commented out and inspect the intermediate results, then stepwise uncomment the rest of the pipe)

I_O
  • 4,983
  • 2
  • 2
  • 15