How to merge datasets by common values for variable

Question

I want to merge 6 datasets that have an ID variable. I want to have one data set with ID values common to all datasets.

I know this is an easy fix, but I haven't come across a help topic.

ex.

id month sbp dpb
D1  3     40  40 
D1  4     10  10
D1  3     20  20
D2  4     30  20
D3  5     10  40
D1  3     40  40

id month sbp dpb
D1  3     40  40 
D1  4     10  10
D2  3     20  20
D4  4     30  20
D3  5     10  40
D1  3     40  40

final

id month sbp dpb
D1  3     40  40 
D1  4     10  10
D1  3     20  20
D2  4     30  20
D3  5     10  40
D1  3     40  40
D1  3     40  40 
D1  4     10  10
D2  3     20  20
D3  5     10  40
D1  3     40  40

D4 is omitted from the final dataset because it does not appear in both datasets.

I don't just want to bind the rows. I want to merge them and eliminate rows with IDs that are not common to all 6 datasets — user9459213, Mar 08 '18 at 02:11

akrun · Accepted Answer · 2018-03-08T02:20:58.957

Since there are 6 datasets (assuming the objects are named 'df1', 'df2', ..., 'df6'), collect them into a list with mget, bind them together with bind_rows while recording the source dataset in a 'grp' column, and then filter out the 'id's that are not present in all of them:

library(dplyr)
n <- 2  # the example has only two objects; change this to 6 for the real data
mget(paste0("df", seq_len(n))) %>%
  bind_rows(.id = "grp") %>%          # 'grp' records which dataset each row came from
  group_by(id) %>%
  filter(n_distinct(grp) == n) %>%    # keep ids that occur in all n datasets
  ungroup() %>%
  select(-grp)
# A tibble: 11 x 4
#   id    month   sbp   dpb
#   <chr> <int> <int> <int>
# 1 D1        3    40    40
# 2 D1        4    10    10
# 3 D1        3    20    20
# 4 D2        4    30    20
# 5 D3        5    10    40
# 6 D1        3    40    40
# 7 D1        3    40    40
# 8 D1        4    10    10
# 9 D2        3    20    20
#10 D3        5    10    40
#11 D1        3    40    40
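
To run the code above on the question's example, the two tables can be entered as data frames. This is only a minimal sketch: the names df1 and df2 follow the naming the answer assumes, and read.table with a text argument is just one convenient way to type the data in.

```r
# The two example tables from the question, entered as data frames.
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
id month sbp dpb
D1 3 40 40
D1 4 10 10
D1 3 20 20
D2 4 30 20
D3 5 10 40
D1 3 40 40
")

df2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
id month sbp dpb
D1 3 40 40
D1 4 10 10
D2 3 20 20
D4 4 30 20
D3 5 10 40
D1 3 40 40
")

# ids present in both tables; D4 occurs only in df2, so it drops out
common_ids <- intersect(df1$id, df2$id)
common_ids
# "D1" "D2" "D3"
```

With these two objects in the workspace, the pipeline above (with n <- 2) keeps all 6 rows of df1 and 5 of the 6 rows of df2.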

A base R option is to find the 'id's that are common to all of the datasets with intersect (applied across the list with Reduce), subset each dataset to those ids, and rbind the results:

lst <- setNames(mget(paste0("df", seq_len(n))), NULL)
ids <- Reduce(intersect, lapply(lst, `[[`, 'id'))    
res <- do.call(rbind, lapply(lst, subset, subset = id %in% ids))
row.names(res) <- NULL
res
#   id month sbp dpb
#1  D1     3  40  40
#2  D1     4  10  10
#3  D1     3  20  20
#4  D2     4  30  20
#5  D3     5  10  40
#6  D1     3  40  40
#7  D1     3  40  40
#8  D1     4  10  10
#9  D2     3  20  20
#10 D3     5  10  40
#11 D1     3  40  40
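
If the data frames are already held in a list rather than in numbered objects, the same intersect idea can be wrapped in a small helper. The sketch below is an illustration, not part of the original answer: the function name keep_common_ids, its id_col argument, and the three toy data frames are all made up for the example.

```r
# Hypothetical helper: keep only rows whose id occurs in every data frame
# of the list, then stack the remaining rows.
keep_common_ids <- function(dfs, id_col = "id") {
  # ids shared by all data frames
  common <- Reduce(intersect, lapply(dfs, `[[`, id_col))
  # drop rows with other ids from each data frame
  kept <- lapply(dfs, function(d) d[d[[id_col]] %in% common, , drop = FALSE])
  # unname() stops rbind from prefixing row names with list names
  out <- do.call(rbind, unname(kept))
  row.names(out) <- NULL
  out
}

# Toy data (made up for illustration): only D1 and D3 occur in all three
a  <- data.frame(id = c("D1", "D2", "D3"), month = c(3L, 4L, 5L), stringsAsFactors = FALSE)
b  <- data.frame(id = c("D1", "D4", "D3"), month = c(3L, 4L, 5L), stringsAsFactors = FALSE)
c3 <- data.frame(id = c("D3", "D1", "D5"), month = c(6L, 7L, 8L), stringsAsFactors = FALSE)

res <- keep_common_ids(list(a, b, c3))
res$id
# "D1" "D3" "D1" "D3" "D3" "D1"
```

For the six datasets in the question, the call would be keep_common_ids(list(df1, df2, df3, df4, df5, df6)), assuming those objects exist under those names.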

M_M · Answer 2 · 2018-03-08T05:34:12.050

Is that what you are looking for? See code below:

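# Keep only the rows whose id appears in both data frames, then stack them
df1_common <- subset(df1, id %in% df2$id)
df2_common <- subset(df2, id %in% df1$id)
df <- rbind(df1_common, df2_common)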

edited Mar 08 '18 at 05:34

answered Mar 08 '18 at 02:17
