filter table for cases which overlap dates with another case R

Question

I have data like this:

Sample 1    04/01/18   01/02/18
Sample 2    01/01/18   05/01/18
Sample 4    01/01/16   01/02/16

My desired output is contains only Sample 1 and Sample 2 since these overalap in time. I can't think of a way to do this?

Please use `dput()` to show your data! – jogo Feb 16 '18 at 14:08 — jogo, Feb 16 '18 at 14:08

score 0 · Answer 1 · answered Jan 12 '18 at 19:29

Here is 1 approach. Merge the data to itself and only keep the rows where the dates overlap and IDs don't match

First create the data

sample = c('Sample 1', 'Sample 2', 'Sample 4')
start_date = c('04/01/18', '01/01/18', '01/01/16')
end_date = c('01/02/18', '05/01/18', '01/02/16')
df = data.frame(sample, start_date, end_date)
df

sample     start_date  end_date
Sample 1   04/01/18    01/02/18
Sample 2   01/01/18    05/01/18
Sample 4   01/01/16    01/02/16

Next, transform the strings to dates, so we will be able to do date comparison

df = transform(df, 
    end_date = as.Date(df$end_date, "%m/%d/%y"),
    start_date = as.Date(df$start_date, "%m/%d/%y")
)

Last step is to merge the data to a copy of itself. To do this, I add a field called 'dummy' to merge on, which creates a cartesian product of the data. To filter out the mismatches, I only keep where sample IDs are the different and dates overlap. Here is a good discussion of the logic. Lastly, I did some cleanup by dropping extra columns and resetting names to the original values

df %>% 
    mutate(dummy=TRUE) %>% 
    left_join(df %>% mutate(dummy=TRUE), by='dummy') %>%
    filter(start_date.x < end_date.y, end_date.x > start_date.y, sample.x!=sample.y) %>%
    select(grep('.x$', names(.)))  %>% 
    setNames(gsub('.x$','',names(.)))

sample     start_date   end_date
Sample 1   2018-04-01   2018-01-02
Sample 2   2018-01-01   2018-05-01

If you are doing this with data where there are multiple overlaps per record, you would also need to deduplicate the rows

filter table for cases which overlap dates with another case R

1 Answers1