0

I have two dataframes structured like this:

X A  B  C SUM
E 1  0  1  2
F 0  0  1  1
G 1  1  0  2

and this:

X A  B  C SUM
E 1  0  1  2
F 0  0  1  1
G 1  1  0  2
H 0  0  1  1
I 0  0  0  0

The result that i want to obtain is:

 X A  B  C   
 H 0  0  1 

So, i want a code which is able to create another dataframe made by only those rows which are not present in both dataframes. Moreover, the sum of these rows has to be more than zero.

Could someone help me? Thank You!

Silvia
  • 405
  • 4
  • 17

3 Answers3

0

Here's one solution to your question. Let the two data-sets be mydata1 and mydata2

require(dplyr)
rbind(anti_join(mydata1 %>% filter(SUM > 0), mydata2 %>% filter(SUM > 0), by = colnames(mydata1)), 
      anti_join(mydata2 %>% filter(SUM > 0), mydata1 %>% filter(SUM > 0), by = colnames(mydata1)))

Based on the comment, there's one thing you can do is make sure the columns are same.

require(dplyr)
common_columns <- intersect(colnames(mydata1), colnames(mydata2))
rbind(anti_join(mydata1 %>% filter(SUM > 0), mydata2 %>% filter(SUM > 0), by = common_columns), 
      anti_join(mydata2 %>% filter(SUM > 0), mydata1 %>% filter(SUM > 0), by = common_columns))
A Gore
  • 1,870
  • 2
  • 15
  • 26
  • Yeah, it works perfectly. The problem is that the two dataframe haven't always got the same columns. So i would like to extract the different rows and columns too. How could i do, simply modifying that code? – Silvia May 30 '17 at 16:16
  • @Silvia Can you provide an example? – A Gore May 30 '17 at 16:20
0
require(data.table)
dat1 <- data.table(X = c("E","F","G"), A = c(1,0,1), B = c(0,0,1), C = c(1,1,0), SUM = c(2,1,2))
dat2 <- data.table(X = c("E","F","G","H","I"), A = c(1,0,1,0,0),  B = c(0,0,1,0,0), C = c(1,1,0,1,0),
               SUM = c(2,1,2,1,0))

dat3 <- rbind(dat1[,!(names(dat1) %in% "SUM"), with = FALSE], dat2[,!(names(dat2) %in% "SUM"), with = FALSE])

dat3[duplicated(dat3)==FALSE & duplicated(dat3, fromLast = TRUE)==FALSE & 
   rowSums(dat3[,!(names(dat3) %in% "X"), with = FALSE])>0]
simone
  • 577
  • 1
  • 7
  • 15
0
library(data.table)
dat1 <- data.table(X = c("E","F","G"), A = c(1,0,1), B = c(0,0,1), C = c(1,1,0), SUM = c(2,1,2))
dat2 <- data.table(X = c("E","F","G","H","I"), A = c(1,0,1,0,0),  B = c(0,0,1,0,0), C = c(1,1,0,1,0),
                   SUM = c(2,1,2,1,0))


D1=dat1[!dat1$X%in%dat2$X,]
D2=dat2[!dat2$X%in%dat1$X,]
DF=rbind(D1,D2)
DF=DF[DF$SUM>0,]
DF$SUM=NULL
BENY
  • 317,841
  • 20
  • 164
  • 234