My data looks like this:

A   B
1   2
1A  2
1A  2
2   3
2   4
2   4
3A  0
3A  0
4A  1
4A  1
5   5

I want to subset the data and extract all records that are duplicated, based on the values in both columns. I tried using cbind and unique, but they return only the unique values. I couldn't find a reverse subset function, if that would help. Thanks.

Litwos

1 Answer

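You can try flagging duplicates from both directions, so that the first copy of each duplicated row is returned as well: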

 df1[duplicated(df1)|duplicated(df1, fromLast=TRUE),]
 #    A B
 #2  1A 2
 #3  1A 2
 #5   2 4
 #6   2 4
 #7  3A 0
 #8  3A 0
 #9  4A 1
 #10 4A 1

data

 df1 <- structure(list(A = c("1", "1A", "1A", "2", "2", "2", "3A", "3A",
 "4A", "4A", "5"), B = c(2L, 2L, 2L, 3L, 4L, 4L, 0L, 0L, 1L, 1L, 5L)),
 .Names = c("A", "B"), class = "data.frame", row.names = c(NA, -11L))
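
To see why both `duplicated` calls are needed, here is a small sketch that splits the one-liner into its parts. The data frame is the same as above, rebuilt with `data.frame()` for readability; the names `later_copies`, `earlier_copies` and `in_dup_group` are only illustrative.

```r
# Same data as in the answer, built with data.frame() for readability.
df1 <- data.frame(
  A = c("1", "1A", "1A", "2", "2", "2", "3A", "3A", "4A", "4A", "5"),
  B = c(2L, 2L, 2L, 3L, 4L, 4L, 0L, 0L, 1L, 1L, 5L),
  stringsAsFactors = FALSE
)

# duplicated() flags a row only if an identical row appeared earlier,
# so the first copy of each duplicated pair is left unflagged.
later_copies <- duplicated(df1)

# Scanning from the bottom flags every copy except the last one.
earlier_copies <- duplicated(df1, fromLast = TRUE)

# A row belongs to a duplicated group if either scan flags it.
in_dup_group <- later_copies | earlier_copies

which(later_copies)    # 3 6 8 10
which(earlier_copies)  # 2 5 7 9

# Note the trailing comma: the logical vector selects rows, not columns.
dups <- df1[in_dup_group, ]
```

Using `duplicated(df1)` alone would return only rows 3, 6, 8 and 10, which is why the `fromLast = TRUE` pass is OR-ed in.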
akrun
  • It returns an error: Error in `[.data.frame`(b, duplicated(b) | duplicated(b, fromLast = T)) : undefined columns selected – Litwos Mar 09 '15 at 11:25
  • @Litwos Based on the `dput` output in my post, it is not giving any errors. Please copy/paste the dput output and see if the error persists. – akrun Mar 09 '15 at 11:28
  • It worked, but I transformed the column to factor (as.factor). Is that necessary? I will now try on all my data. – Litwos Mar 09 '15 at 11:31
  • @Litwos It is not necessary. I wouldn't work with factors unless they are needed for a specific purpose. If you look at `str(df1)`, these are non-factor columns. One problem with a factor column is that after you subset, you may need to drop the unused levels, i.e. `droplevels(df1[duplicated(...)` – akrun Mar 09 '15 at 11:34
  • Understood. Thanks a lot for your help. I will now try to find a function to count the number of duplicates in a new column, but that's for another thread. :) – Litwos Mar 09 '15 at 11:41
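
Two points from the comments can be illustrated with a short sketch. Judging from the call quoted in the error message, the trailing comma inside `[` appears to have been left out. With a single index a data frame is subset by column, and an 11-element logical vector does not fit 2 columns. The second half shows the unused-levels issue with factors. The names `keep`, `df2` and `dups_f` are hypothetical, and the data is rebuilt with `data.frame()`.

```r
df1 <- data.frame(
  A = c("1", "1A", "1A", "2", "2", "2", "3A", "3A", "4A", "4A", "5"),
  B = c(2L, 2L, 2L, 3L, 4L, 4L, 0L, 0L, 1L, 1L, 5L),
  stringsAsFactors = FALSE
)
keep <- duplicated(df1) | duplicated(df1, fromLast = TRUE)

# No comma: `keep` is treated as a column selector, and an 11-element
# logical vector does not fit a 2-column data frame, so this fails.
res <- tryCatch(df1[keep], error = function(e) conditionMessage(e))
res  # the message mentions "undefined columns selected"

# With the comma, `keep` selects rows and all columns are kept.
dups <- df1[keep, ]

# Factor version: unused levels survive the subset until dropped.
df2 <- transform(df1, A = factor(A))
dups_f <- df2[keep, ]
levels(dups_f$A)              # still includes "1" and "5"
levels(droplevels(dups_f)$A)  # "1A" "2" "3A" "4A"
```

Keeping `A` as a character column, as in the answer's `dput` output, avoids the `droplevels` step altogether.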