0

I have a df where I need to calculate the number of days between two types of events among a long list of possible events, say an overdose 'A_type_event' and a clinical session 'C_type_event' for a given unique id where event A always precedes event C:

df<- tribble(
  ~ unique_id, ~event_type, ~ event_date,
  'id_101', 'A_type_event', '2022-01-01',
  'id_101', 'B_type_event', '2022-02-01',
  'id_101', 'A_type_event', '2022-02-15',
  'id_101', 'A_type_event', '2022-02-28',
  'id_101', 'B_type_event', '2022-03-01',
  'id_101', 'C_type_event', '2022-03-10',
  'id_101', 'A_type_event', '2022-03-20',
  'id_101', 'C_type_event', '2022-04-01'
)

The actual df has 20+ event types and I need to filter based on a string (e.g., contains 'A_type'). How can I calculate the number of days between each 'A_type_event' and the next 'C_type_event' grouped by a unique id?

Desired output would be:

df2<- tribble(
  ~ unique_id, ~event_type_A, ~ event_date_A, ~event_type_C, ~event_date_C, ~days_between
  'id_101', 'A_type_event', '2022-01-01','C_type_event', '2022-03-10',68 

)
  • So what's the desired output for this sample input? It looks like AAA_start appears three times and CCC_last appears 2 times. How many rows do you expect to be in the output? – MrFlick Aug 25 '22 at 18:06

1 Answers1

1

Using the dev version of dplyr (currently v1.0.99.9000), we can do a non-equi join to get the first match in the future:

Edit for new data and specifications

#devtools::install_github("tidyverse/dplyr")
library(dplyr)
left_join(
  df %>% filter(event_type == "A_type_event"),  # match A_type_event
  df %>% filter(event_type == "C_type_event"),   # with C_type_event
  #join_by(event_date < event_date),          # where A_type_event before C_type_event
  join_by(unique_id, event_date < event_date), # ... and unique id matches
  multiple = "first")                        # and just keep first match



# A tibble: 4 × 5
  unique_id event_type.x event_date.x event_type.y event_date.y
  <chr>     <chr>        <chr>        <chr>        <chr>       
1 id_101    A_type_event 2022-01-01   C_type_event 2022-03-10  
2 id_101    A_type_event 2022-02-15   C_type_event 2022-03-10  
3 id_101    A_type_event 2022-02-28   C_type_event 2022-03-10  
4 id_101    A_type_event 2022-03-20   C_type_event 2022-04-01  
Jon Spring
  • 55,165
  • 4
  • 35
  • 53
  • Thank-you! This works well. Where could I add a group_by statement to compare the dates for unique_ids? – Wes Furlong Aug 26 '22 at 11:58
  • I updated the sample data to give a better representation of the df. – Wes Furlong Aug 26 '22 at 12:21
  • See edit, you could make your join require a match on id as well. – Jon Spring Aug 26 '22 at 16:15
  • The code was working well but now triggers an error: by must be a (named) character vector, list, or NULL for natural joins (not recommended in production code), not a object. Any idea what might have changed or how to fix it? – Wes Furlong Oct 20 '22 at 14:07