Compare to df's and populate col based on matching rows

Question

I have two df's that look like this,

id1<-c(2,2,2,1,3)
id2<-c(4,4,5,2,3)
OS<-c(1,2,3,4,5)
a <- cbind(OS, id1, id2) 
a

i1<-c(2,2,2,4,3)
i2<-c(4,4,5,2,3)
OSi<-c(1,2,3,4,5)
b <- cbind(OSi, i1, i2)

I want to create a new col (x) with a '0' where a$OS & a$OSi match (i.e. 1->1, 2->2) but only where the subsequent two cols in each df are in agreement, and '1' where they don't. My desired output is something like this:

id1<-c(2,2,2,1,3)
id2<-c(4,4,5,2,3)
OS<-c(1,2,3,4,5)
x <-c(0,0,0,1,0)
a <- cbind(OS, id1, id2, x) 
a

I have so far tried this:

a$x <- case_when(a$OS %in% b$OSi &
                     a$id1 %in% b$I1 & 
                     b$id2 %in% b$I2 ~ '0') %>%
  replace_na('1')

However, I find that Case_when only looks to satisfy the first argument here. (i.e. the ratio of 0/1 change depending on the order of the variables).

Also, the above is a top df. My actual data is ~10000 rows and b is longer than a. I may also want to drop any rows not in a, but in b.

I hope that's clear enough. Dplyr my preference to integrate into pipes!

Cheers!

Karthik S · Answer 1 · 2020-10-12T17:22:44.890

1

Does this work:

> id1<-c(2,2,2,1,3)
> id2<-c(4,4,5,2,3)
> OS<-c(1,2,3,4,5)
> a <- data.frame(OS, id1, id2) 
> 
> 
> i1<-c(2,2,2,4,3)
> i2<-c(4,4,5,2,3)
> OSi<-c(1,2,3,4,5)
> b <- data.frame(OSi, i1, i2) 
> 
> 
> transform(a %>% inner_join(b, by = c('OS' = 'OSi')), x = ifelse((id1 == i1) & (id2 == i2), 0, 1)) %>% select(OS, id1, id2, x)
  OS id1 id2 x
1  1   2   4 0
2  2   2   4 0
3  3   2   5 0
4  4   1   2 1
5  5   3   3 0
>

I used inner_join, so we are joining based on your first condition of a$OS == b$OSi

edited Oct 12 '20 at 17:22

answered Oct 12 '20 at 16:46

Karthik S

11,348
2
11
25

I don't think I can bind the df's because they're different lengths: Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 7625, 8996 – Jamie Dunning Oct 12 '20 at 17:14
OK, assumed so from your sample data. So, do OS and OSi overlap ? Can they be used as primary key and foreign key in that case? If so, which is your primary table, I am assuming it's "a" ? – Karthik S Oct 12 '20 at 17:19
Have changed the code, can you please check if that works? – Karthik S Oct 12 '20 at 17:23
Hi Karthik, thanks for this, but it doesn't quite work. I'm not sure why, but it gives me fewer positive (1) results than I expect. – Jamie Dunning Oct 12 '20 at 17:37
OK, can you please include the parts where you are not getting expected output to your sample data above? With just 5 rows in sample data, it may be difficult to visualize the issue. – Karthik S Oct 12 '20 at 17:43
I guess what I'm asking is how I can write this expression: p$EP <- case_when(p$BroodName %in% broods$Brood & p$sire %in% broods$SocialDadID & p$dam %in% broods$SocialMumID ~ 'N') %>% replace_na('Y') (from my actual data), where the case_when considers the '&', rather than only listening to the first argument. – Jamie Dunning Oct 12 '20 at 20:12
this https://stackoverflow.com/questions/58137828/create-new-variable-by-multiple-conditions-via-mutate-case-when seems to imply the '&' is considered in Case_when? – Jamie Dunning Oct 12 '20 at 20:16

Compare to df's and populate col based on matching rows

1 Answers1