How to reconcile two different IDs as one, then apply to a df with both IDs but count the subject only once in R?

Question

I have two different IDs for the same subject(patient). In this other vector of IDs, the two IDs are both in there that indicate the same patient. How do I only count the patient once(by ID1), instead of two different patients with different IDs?

vector

11,12,13,13,14,16

I want to count only the unique patients by ID1, such that I would get

x=11,13,15

Thank you!

Then why do you need `vector` at all? It would be just `df$ID1` ? If that is not what you want maybe provide a better example. — Ronak Shah, Dec 17 '20 at 06:04

score 1 · Answer 1 · answered Dec 17 '20 at 05:14

Create a unique ID number for each patient, get the data in long format so both the ID's are in same column, join it with the vector select vector values for distinct ID values.

library(dplyr)

df %>%
  mutate(ID = row_number()) %>%
  tidyr::pivot_longer(cols = c(ID1, ID2)) %>%
  inner_join(tibble::enframe(vector), by = 'value') %>%
  distinct(ID, .keep_all = TRUE) %>%
  select(value)

#  value
#  <dbl>
#1    11
#2    13
#3    16

data

df <- structure(list(ID1 = c(11L, 13L, 15L), ID2 = c(12L, 14L, 16L)), 
class = "data.frame", row.names = c(NA, -3L))
vector <- c(11, 12, 13, 13, 14, 16)

I think the OP wants results from `ID1` only. Please see slight modification in your code, at the end of my answer. — AnilGoyal, Dec 17 '20 at 08:13

AnilGoyal · Accepted Answer · 2020-12-18T03:12:51.830

I think probably you need this

df %>% filter((ID1 %in% vector) | (ID2 %in% vector)) %>%
   select(ID1)

  ID1
1  11
2  13
3  15

Check it on a better sample

df <- structure(list(ID1 = c(11L, 13L, 15L, 17L, 19L, 21L), ID2 = c(12L, 
14L, 16L, 18L, 20L, 22L)), class = "data.frame", row.names = c(NA, 
-6L)

> df
  ID1 ID2
1  11  12
2  13  14
3  15  16
4  17  18
5  19  20
6  21  22


vector <- c(11, 12, 13, 13, 14, 16, 18, 18)

> df %>% filter((ID1 %in% vector) | (ID2 %in% vector)) %>% select(ID1)
  
   ID1
1  11
2  13
3  15
4  17

By slightly modifying Ronak's code, you can get same results

df %>%
  mutate(ID = row_number()) %>%
  tidyr::pivot_longer(cols = c(ID1, ID2)) %>%
  inner_join(tibble::enframe(vector), by = 'value') %>%
  distinct(ID, .keep_all = T) %>%
  select(ID, value) %>%
  inner_join(df %>% mutate(ID = row_number()), by = 'ID') %>%
  select(ID1)

GKi · Answer 3 · 2020-12-17T14:29:23.047

1

You can use any with %in% by selecting the rows with apply to subset ID1.

ID$ID1[apply(ID, 1, function(z) any(v %in% z))]
#[1] 11 13 15

or use rowSums.

ID$ID1[rowSums(sapply(ID, "%in%", v)) > 0]
#[1] 11 13 15

Data:

ID <- read.table(header=TRUE, text="ID1 ID2 
 11 12
 13 14
 15 16")
v <- c(11,12,13,13,14,16)

edited Dec 17 '20 at 14:29

answered Dec 17 '20 at 09:01

GKi

37,245
2
26
48

How to reconcile two different IDs as one, then apply to a df with both IDs but count the subject only once in R?

3 Answers3