
I'm wondering why `duplicated` behaves the way it does with NAs:

> duplicated(c(NA,NA,NA,1,2,2))
[1] FALSE  TRUE  TRUE FALSE FALSE  TRUE

whereas in fact

> NA == NA
[1] NA

Is there a way to make `duplicated` mark NAs as FALSE, like this?

> duplicated(c(NA,NA,NA,1,2,2))
[1] FALSE FALSE FALSE FALSE FALSE  TRUE
Joris Meys
jamborta
  • `duplicated` marks the second (and third, and fourth, etc.) occurrences as duplicated, but not the first. You can use `is.na()` to do what you ask. – Andrie Nov 27 '12 at 11:46
  • Thanks. The main question is why it makes sense to mark NAs as duplicates. – jamborta Nov 27 '12 at 11:51

1 Answer

You can use the `incomparables` argument of `duplicated`, like this:

> duplicated(c(NA,NA,NA,1,2,2))
[1] FALSE  TRUE  TRUE FALSE FALSE  TRUE
> duplicated(c(NA,NA,NA,1,2,2),incomparables=NA)
[1] FALSE FALSE FALSE FALSE FALSE  TRUE

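The `incomparables` argument specifies the values that cannot be compared (in this case NA), and `duplicated` returns FALSE for those values. The default, `incomparables = FALSE`, means that all values can be compared, which is why repeated NAs are flagged as duplicates of the first one. See also `?duplicated`.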
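
As a minimal sketch, the two suggestions in this thread (the `incomparables` argument from this answer and the `is.na()` idea from the comment) can be compared side by side. The variable names `x`, `dup_incomparables`, and `dup_is_na` are made up for illustration; only base R functions are used.

```r
# Example vector from the question
x <- c(NA, NA, NA, 1, 2, 2)

# Approach from the answer: declare NA as incomparable,
# so NAs are never reported as duplicates
dup_incomparables <- duplicated(x, incomparables = NA)

# Approach hinted at in the comment: compute duplicates as usual,
# then force the result to FALSE wherever x is NA
dup_is_na <- duplicated(x) & !is.na(x)

print(dup_incomparables)
print(dup_is_na)

# Dropping the flagged elements keeps every NA but removes the repeated 2
x_dedup <- x[!dup_incomparables]
print(x_dedup)
```

Both approaches should give the same logical vector for this input. The `is.na()` variant may be the more portable choice in cases where a `duplicated` method does not accept values other than `incomparables = FALSE`.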

Joris Meys