2

I'm trying to find whether a date exists between multiple pairs of dates which are wide in my dataset - the length I've given here is just an example, the eventual number may be larger or smaller. Not sure if this is the most sensible option but working longwise didn't seem to work, this is also a very common way to work with overlapping dates and date pairs in SPSS, where you can have multiple variables numerised as the dates are here and it works through each numbered 'set' to give you a response.

Here is an example dataset:

  person   key_date 1_end_date 2_end_date 3_end_date 4_end_date 1_start_date 2_start_date 3_start_date 4_start_date
1      1 2019-09-30 2019-05-23 2019-09-30 2016-07-22       <NA>   2019-05-23   2019-09-30   2016-07-22         <NA>
2      2 2019-06-07 2019-05-16 2019-06-07       <NA>       <NA>   2019-05-16         <NA>         <NA>         <NA>
3      3 2020-03-09 2016-06-02 2019-08-09 2020-05-27 2020-02-12   2016-06-02   2019-08-09   2020-05-27   2020-03-09

test <- structure(list(person = 1:3, key_date = structure(c(18169, 18054,18330), class = "Date"), `1_end_date` = structure(c(18039, 18032,16954), class = "Date"), `2_end_date` = structure(c(18169, 18054,18117), class = "Date"), `3_end_date` = structure(c(17004, NA,18409), class = "Date"), `4_end_date` = structure(c(NA, NA, 18304), class = "Date"), `1_start_date` = structure(c(18039, 18032,16954), class = "Date"), `2_start_date` = structure(c(18169,NA, 18117), class = "Date"), `3_start_date` = structure(c(17004,NA, 18409), class = "Date"), `4_start_date` = structure(c(NA,NA, 18330), class = "Date")), row.names = c(NA, 3L), class = "data.frame") 

The expected output would be just a binary flag to indicate that the key_date exists between any pair of start_date and end_date. In the example given, that would mean person 1 and 3. Any ideas how to do this? Is this really inefficient?

TimL
  • 211
  • 2
  • 11

1 Answers1

3

tidyverse approach

library(tidyverse)
result <- test %>% mutate(across(ends_with("end_date"), ~ 
                         key_date <= . & key_date >= get(str_replace(cur_column(), "end", "start")),
                       .names = '{.col}_flag')) %>%
  rowwise() %>%
  mutate(Flag1 = sum(c_across(ends_with("flag")), na.rm = T)) %>%
  ungroup() %>%
  select(-ends_with("flag"))

> result$Flag1
[1] 1 0 0

Complete output will look like

> result
# A tibble: 3 x 11
  person key_date   `1_end_date` `2_end_date` `3_end_date` `4_end_date` `1_start_date` `2_start_date` `3_start_date` `4_start_date` Flag1
   <int> <date>     <date>       <date>       <date>       <date>       <date>         <date>         <date>         <date>         <dbl>
1      1 2019-09-30 2019-05-23   2019-09-30   2016-07-22   NA           2019-05-23     2019-09-30     2016-07-22     NA                 1
2      2 2019-06-07 2019-05-16   2019-06-07   NA           NA           2019-05-16     NA             NA             NA                 0
3      3 2020-03-09 2016-06-02   2019-08-09   2020-05-27   2020-02-12   2016-06-02     2019-08-09     2020-05-27     2020-03-09         0
AnilGoyal
  • 25,297
  • 4
  • 27
  • 45
  • this is a great solution and works much faster than what I previously had. I changed it slightly so it was housed in an `~as.numeric` rather than an `~ifelse`. I also had other non-date columns to complicate this in actuality but adapting this `get(str_replace(cur_column(), "end", "start")` is really useful. Thanks! – TimL Apr 22 '21 at 12:01
  • Yes, as.numeric was a better approach. It depends on what one is used to of. Nevertheless glad to have been of help. – AnilGoyal Apr 22 '21 at 12:04
  • even `as.numeric` is not necessary. It will convert flag to more than 1 when keydate lies in two groups (if so possible) – AnilGoyal Apr 22 '21 at 12:37
  • Ah interesting. I wish in a normal `mutate` command this functionality worked: that you could create a logical argument and it would create a boolean, rather than having to wrap it in `as.numeric` – TimL Apr 22 '21 at 14:37
  • It works! Actually wrapping boolean in a sum in last step converted it to numeric. – AnilGoyal Apr 22 '21 at 15:04