Subset duplicate values with two different status values

Question

I have Id Values some are duplicates, there is another column Status (yes/No) , I want to extract the subset of duplicate values which have two different status values. For example , if my dataset is as below

   Id        Status
   23984     Yes
   94949     No
   89685     No
   33232     Yes
   33232     Yes
   57943     No
   75633     Yes
   75633     No

In this case, Id 75633 is duplicate and has two different status (Yes & No) so my final dataset should contain only such ids and not 33232.

   Id        Status
   75633     Yes
   75633     No

I know how to do a subset but no clue how to include this criteria need some help.

Ryan C. Thompson · Answer 1 · 2015-05-11T18:45:18.390

2

If you remove all duplicate rows, than any ID that occurs more than once must have multiple statuses associated with it. So if your data frame is df:

tb <- table(unique(df)$Id)
names(tb[tb>1])

Or in one pipeline with magrittr:

library(magrittr)
df %>% unique %$% Id %>% table %>% subset(. > 1) %>% names

Then you can subset the data frame to select only those IDs.

edited May 11 '15 at 18:45

answered May 08 '15 at 00:28

Ryan C. Thompson

40,856
28
97
159

score 2 · Accepted Answer · answered May 08 '15 at 02:01

2

library(dplyr)
df %>% 
    group_by(Id) %>% 
    filter("Yes"%in% Status & "No"  %in% Status)

answered May 08 '15 at 02:01

ExperimenteR

4,453
1
15
19

score 1 · Answer 3 · answered May 08 '15 at 00:35

1

What about this using tapply?

ind <- which(tapply(df$Status, df$Id, function(x) "Yes" %in% x & "No" %in% x))
df[df$Id == names(ind),]
     Id Status
7 75633    Yes
8 75633     No

answered May 08 '15 at 00:35

DatamineR

10,428
3
25
45

Subset duplicate values with two different status values

3 Answers3