-1

I have two different IDs for the same subject(patient). In this other vector of IDs, the two IDs are both in there that indicate the same patient. How do I only count the patient once(by ID1), instead of two different patients with different IDs?

ID1 ID2 
 11 12
 13 14
 15 16

vector

11,12,13,13,14,16

I want to count only the unique patients by ID1, such that I would get

x=11,13,15

Thank you!

3 Answers3

1

Create a unique ID number for each patient, get the data in long format so both the ID's are in same column, join it with the vector select vector values for distinct ID values.

library(dplyr)

df %>%
  mutate(ID = row_number()) %>%
  tidyr::pivot_longer(cols = c(ID1, ID2)) %>%
  inner_join(tibble::enframe(vector), by = 'value') %>%
  distinct(ID, .keep_all = TRUE) %>%
  select(value)

#  value
#  <dbl>
#1    11
#2    13
#3    16

data

df <- structure(list(ID1 = c(11L, 13L, 15L), ID2 = c(12L, 14L, 16L)), 
class = "data.frame", row.names = c(NA, -3L))
vector <- c(11, 12, 13, 13, 14, 16)
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • I think the OP wants results from `ID1` only. Please see slight modification in your code, at the end of my answer. – AnilGoyal Dec 17 '20 at 08:13
1

I think probably you need this

df %>% filter((ID1 %in% vector) | (ID2 %in% vector)) %>%
   select(ID1)

  ID1
1  11
2  13
3  15

Check it on a better sample

df <- structure(list(ID1 = c(11L, 13L, 15L, 17L, 19L, 21L), ID2 = c(12L, 
14L, 16L, 18L, 20L, 22L)), class = "data.frame", row.names = c(NA, 
-6L)

> df
  ID1 ID2
1  11  12
2  13  14
3  15  16
4  17  18
5  19  20
6  21  22


vector <- c(11, 12, 13, 13, 14, 16, 18, 18)

> df %>% filter((ID1 %in% vector) | (ID2 %in% vector)) %>% select(ID1)
  
   ID1
1  11
2  13
3  15
4  17

By slightly modifying Ronak's code, you can get same results

df %>%
  mutate(ID = row_number()) %>%
  tidyr::pivot_longer(cols = c(ID1, ID2)) %>%
  inner_join(tibble::enframe(vector), by = 'value') %>%
  distinct(ID, .keep_all = T) %>%
  select(ID, value) %>%
  inner_join(df %>% mutate(ID = row_number()), by = 'ID') %>%
  select(ID1)
AnilGoyal
  • 25,297
  • 4
  • 27
  • 45
1

You can use any with %in% by selecting the rows with apply to subset ID1.

ID$ID1[apply(ID, 1, function(z) any(v %in% z))]
#[1] 11 13 15

or use rowSums.

ID$ID1[rowSums(sapply(ID, "%in%", v)) > 0]
#[1] 11 13 15

Data:

ID <- read.table(header=TRUE, text="ID1 ID2 
 11 12
 13 14
 15 16")
v <- c(11,12,13,13,14,16)
GKi
  • 37,245
  • 2
  • 26
  • 48