
Is this the expected behavior of filter() in dplyr? It seems horrendous. Am I missing something, or do I have the wrong version?

mydf <- data.frame(x = 1:5, y = c(letters[1:3], rep(NA, 2)))
mydf
  x    y
1 1    a
2 2    b
3 3    c
4 4 <NA>
5 5 <NA>

filter(mydf, y != 'a')
  x y
1 2 b
2 3 c

packageVersion('dplyr')
[1] ‘0.7.2’
  • `filter` has worked that way for a long time. You may need `filter(mydf, y != 'a' | is.na(y))`. I just checked with R 3.1.3 and dplyr_0.4.3 and it gives the same output as yours. – akrun Nov 07 '17 at 17:07
  • OMG - I have no idea how many bugs I introduced in my code without realizing this behavior. – Gopala Nov 07 '17 at 17:20

1 Answer


It's right there in the documentation for ?dplyr (although it seems like this was only added to the documentation 9 months ago):

Use filter() to find rows/cases where conditions are true. Unlike base subsetting, rows where the condition evaluates to NA are dropped.

This is consistent with the way base::subset() works, but not with how subsetting via `[` with logical indexing works.
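To make the contrast concrete, here is a small sketch (using the question's mydf) comparing the two base-R behaviors the answer refers to:

```r
mydf <- data.frame(x = 1:5, y = c(letters[1:3], rep(NA, 2)))

## base::subset() silently drops rows where the condition is NA,
## matching dplyr::filter(): only rows 2 and 3 come back
subset(mydf, y != "a")

## `[` with a logical index propagates the NA comparisons instead,
## returning rows 2 and 3 plus two all-NA rows for the NA positions
mydf[mydf$y != "a", ]
```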

As @akrun says in the comments, you can use filter(mydf, y != 'a' | is.na(y)) to preserve NA values. It would be nice to be able to use identical() or isTRUE(), but these aren't vectorized. You could write a convenience wrapper:

## NA-preserving equality: TRUE when x equals c *or* x is NA
eq <- function(x, c) { x == c | is.na(x) }
filter(mydf, eq(y, "a"))
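For the question's `!=` test specifically, the same idea can be sketched as an NA-preserving "not equal" wrapper (the `neq` name is my own illustration, not part of dplyr):

```r
library(dplyr)

mydf <- data.frame(x = 1:5, y = c(letters[1:3], rep(NA, 2)))

## NA-preserving inequality: TRUE when x differs from c *or* x is NA
## (`neq` is a hypothetical helper name, not a dplyr function)
neq <- function(x, c) { x != c | is.na(x) }

filter(mydf, neq(y, "a"))  # keeps rows 2:5, NA rows included
```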
Ben Bolker