multiple criteria filtering join using dplyr

Question

I'm trying to accomplish the operation described below by creating a df named event_f.

I want from the detail df as filtering criteria, all event_id that have type_id == 6 excluding those with a combination of 6 and 3 or 6 and 7.

Note that there can be other combinations but they are all to be included then.

library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 3.5.3
#> Warning: package 'purrr' was built under R version 3.5.3


event <- tibble(id = c("00_1", "00_2", "00_3", "00_4", "00_5", "00_6", "00_7"),
                type_id = c("A", "B", "C", "B", "A", "B", "C"))


detail <- tibble(id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L),
                 event_id = c("00_1", "00_1", "00_2", "00_2", "00_3", "00_4", "00_4", "00_5", "00_6", "00_6", "00_7", "00_8"),
                 type_id = c(3L, 4L, 6L, 7L, 2L, 6L, 3L, 2L, 6L, 5L, 2L, 1L))



event_f <- event %>%
  semi_join(detail %>% filter(event_id %in% event$id,
                              type_id == 6,
                              type_id != (7 | 3)), by = c("id" = "event_id"))

^{Created on 2019-04-01 by the reprex package (v0.2.1)}

I would like to have a df with one row : id = "00_6" and type_id = "B". I suppose the problem comes from the last two filter() operations, but not sure how to combine them?

Ronak Shah · Accepted Answer · 2019-04-01T10:28:39.853

2

I think you need

library(dplyr)

event %>%
   semi_join(detail %>%
               group_by(event_id) %>%
               filter(any(type_id == 6) & all(!type_id %in% c(3, 7))),
    by = c("id" = "event_id"))

# id    type_id
# <chr> <chr>  
#1 00_6  B

As we are trying to find out event_ids for those type_id which satisfy the criteria we need to group_by event_id. If we do not group_by then the filtering criteria would be applied to entire dataframe instead which will return 0 rows since we have values 3 and 7 in the dataframe.

edited Apr 01 '19 at 10:28

answered Apr 01 '19 at 09:46

Ronak Shah

377,200
20
156
213

That's excellent, thanks. Would you mind explaining why you need the `group_by` here? – MattnDo Apr 01 '19 at 10:22
1

@MattnDo Updated the answer with some explanation. – Ronak Shah Apr 01 '19 at 10:28

multiple criteria filtering join using dplyr

1 Answers1