2

I have Id Values some are duplicates, there is another column Status (yes/No) , I want to extract the subset of duplicate values which have two different status values. For example , if my dataset is as below

   Id        Status
   23984     Yes
   94949     No
   89685     No
   33232     Yes
   33232     Yes
   57943     No
   75633     Yes
   75633     No

In this case, Id 75633 is duplicate and has two different status (Yes & No) so my final dataset should contain only such ids and not 33232.

   Id        Status
   75633     Yes
   75633     No

I know how to do a subset but no clue how to include this criteria need some help.

3 Answers3

2

If you remove all duplicate rows, than any ID that occurs more than once must have multiple statuses associated with it. So if your data frame is df:

tb <- table(unique(df)$Id)
names(tb[tb>1])

Or in one pipeline with magrittr:

library(magrittr)
df %>% unique %$% Id %>% table %>% subset(. > 1) %>% names

Then you can subset the data frame to select only those IDs.

Ryan C. Thompson
  • 40,856
  • 28
  • 97
  • 159
2
library(dplyr)
df %>% 
    group_by(Id) %>% 
    filter("Yes"%in% Status & "No"  %in% Status)
ExperimenteR
  • 4,453
  • 1
  • 15
  • 19
1

What about this using tapply?

ind <- which(tapply(df$Status, df$Id, function(x) "Yes" %in% x & "No" %in% x))
df[df$Id == names(ind),]
     Id Status
7 75633    Yes
8 75633     No
DatamineR
  • 10,428
  • 3
  • 25
  • 45