1

I have a dataset that has some ID and associated timepoints. I want to filter out IDs that have a specific combination of timepoints. If I filter using %in% or |, I get IDs out of the specific combination. How do I do this in R ?

ID Timepoint
1 1
1 6
1 12
2 1
3 1
3 6
3 12
3 18
4 1
4 6
4 12

I want to filter IDs that have timepoints 1,6 and 12 and exclude other IDs.

Result would be IDs 1,3 and 4

Datamaniac
  • 171
  • 2
  • 9

3 Answers3

1
library(dplyr)

df <- data.frame(ID = c(1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4),
                 Timepoint = c(1, 6, 12, 1, 1, 6, 12, 18, 1, 6, 12))

df %>%
  filter(Timepoint %in% c(1, 6, 12)) %>%
  mutate(indicator = 1) %>%
  group_by(ID) %>%
  complete(Timepoint = c(1, 6, 12)) %>%
  filter(!ID %in% pull(filter(., is.na(indicator)), ID)) %>%
  select(indicator)

Output:

# A tibble: 9 × 2
# Groups:   ID [3]
     ID indicator
  <dbl>     <dbl>
1     1         1
2     1         1
3     1         1
4     3         1
5     3         1
6     3         1
7     4         1
8     4         1
9     4         1
Brigadeiro
  • 2,649
  • 13
  • 30
  • Hi, I want IDs which have all three of the timepoints (1,6 and 12). In this case the result should be only 1,3 and 4 as ID 2 has only Timepoint1. – Datamaniac Aug 26 '21 at 15:34
1

We can use

library(dplyr)
df %>% 
    group_by(ID) %>% 
    filter(all(c(1, 6, 12) %in% Timepoint)) %>% 
    ungroup

-output

# A tibble: 10 x 2
      ID Timepoint
   <dbl>     <dbl>
 1     1         1
 2     1         6
 3     1        12
 4     3         1
 5     3         6
 6     3        12
 7     3        18
 8     4         1
 9     4         6
10     4        12
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Hi, I want IDs which have all three of the timepoints (1,6 and 12). In this case the result should be only 1,3 and 4 as ID 2 has only Timepoint1. – Datamaniac Aug 26 '21 at 17:03
-1

From your data, ID 2 has time point 1. So if filter by time points 1, 6, 12, the result will be 1, 2, 3, 4 instead of 1, 3, 4.

ids <- c(1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4)
time_points <- c(1, 6, 12, 1, 1, 6, 12, 18, 1, 6, 12)
dat <- data.frame(ids, time_points)
unique(dat$ids[dat$time_points %in% c(1, 6, 12)])
ccommjin
  • 87
  • 5