Extract the discriminating rows of two dataframes in R

Question

I have two dataframes structured like this:

X A  B  C SUM
E 1  0  1  2
F 0  0  1  1
G 1  1  0  2

and this:

X A  B  C SUM
E 1  0  1  2
F 0  0  1  1
G 1  1  0  2
H 0  0  1  1
I 0  0  0  0

The result that i want to obtain is:

 X A  B  C   
 H 0  0  1

So, i want a code which is able to create another dataframe made by only those rows which are not present in both dataframes. Moreover, the sum of these rows has to be more than zero.

Could someone help me? Thank You!

2

Maybe `dplyr::anti_join`? – Mike H. May 29 '17 at 14:41

A Gore · Answer 1 · 2017-05-30T16:23:13.717

0

Here's one solution to your question. Let the two data-sets be mydata1 and mydata2

require(dplyr)
rbind(anti_join(mydata1 %>% filter(SUM > 0), mydata2 %>% filter(SUM > 0), by = colnames(mydata1)), 
      anti_join(mydata2 %>% filter(SUM > 0), mydata1 %>% filter(SUM > 0), by = colnames(mydata1)))

Based on the comment, there's one thing you can do is make sure the columns are same.

require(dplyr)
common_columns <- intersect(colnames(mydata1), colnames(mydata2))
rbind(anti_join(mydata1 %>% filter(SUM > 0), mydata2 %>% filter(SUM > 0), by = common_columns), 
      anti_join(mydata2 %>% filter(SUM > 0), mydata1 %>% filter(SUM > 0), by = common_columns))

edited May 30 '17 at 16:23

answered May 29 '17 at 15:01

A Gore

1,870
2
15
26

Yeah, it works perfectly. The problem is that the two dataframe haven't always got the same columns. So i would like to extract the different rows and columns too. How could i do, simply modifying that code? – Silvia May 30 '17 at 16:16
@Silvia Can you provide an example? – A Gore May 30 '17 at 16:20

score 0 · Answer 2 · answered May 29 '17 at 15:20

require(data.table)
dat1 <- data.table(X = c("E","F","G"), A = c(1,0,1), B = c(0,0,1), C = c(1,1,0), SUM = c(2,1,2))
dat2 <- data.table(X = c("E","F","G","H","I"), A = c(1,0,1,0,0),  B = c(0,0,1,0,0), C = c(1,1,0,1,0),
               SUM = c(2,1,2,1,0))

dat3 <- rbind(dat1[,!(names(dat1) %in% "SUM"), with = FALSE], dat2[,!(names(dat2) %in% "SUM"), with = FALSE])

dat3[duplicated(dat3)==FALSE & duplicated(dat3, fromLast = TRUE)==FALSE & 
   rowSums(dat3[,!(names(dat3) %in% "X"), with = FALSE])>0]

score 0 · Answer 3 · answered May 30 '17 at 01:47

library(data.table)
dat1 <- data.table(X = c("E","F","G"), A = c(1,0,1), B = c(0,0,1), C = c(1,1,0), SUM = c(2,1,2))
dat2 <- data.table(X = c("E","F","G","H","I"), A = c(1,0,1,0,0),  B = c(0,0,1,0,0), C = c(1,1,0,1,0),
                   SUM = c(2,1,2,1,0))


D1=dat1[!dat1$X%in%dat2$X,]
D2=dat2[!dat2$X%in%dat1$X,]
DF=rbind(D1,D2)
DF=DF[DF$SUM>0,]
DF$SUM=NULL

Extract the discriminating rows of two dataframes in R

3 Answers3