When subsetting a data.frame by a logical condition, any NA in the data can make the condition itself evaluate to NA for some rows. Those NAs then cause problems when the condition is used to index the data.frame:
# data generation
set.seed(123)
df <- data.frame(a = 1:100, b = sample(c("moon", "venus"), 100, replace = TRUE), c = sample(c('a', 'b', NA), 100, replace = TRUE))
# indexing
with(df, df[a < 30 & b == "moon" & c == "a",])
You get:
a b c
NA NA <NA> <NA>
10 10 moon a
12 12 moon a
NA.1 NA <NA> <NA>
NA.2 NA <NA> <NA>
29 29 moon a
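This happens because c == "a" evaluates to NA wherever c is NA. When combined with &, FALSE & NA gives FALSE but TRUE & NA gives NA, so every row that passes the other tests but has a missing c ends up with NA in the condition vector. Indexing a data frame with a logical NA returns a row filled with NAs, which is where the NA, NA.1 and NA.2 rows above come from.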
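A minimal sketch of that mechanism on toy data (x, y and toy are made-up names used only for illustration): the vectorised & returns NA only when the result cannot be decided, and a logical NA used as a row index yields an all-NA row.

```r
# Three-valued logic of the vectorised & operator:
# FALSE & NA is FALSE (the result is known either way), TRUE & NA is NA.
x <- c(TRUE, TRUE, FALSE)
y <- c(TRUE, NA, NA)
x & y  # TRUE NA FALSE

# Indexing rows with a logical vector that contains NA:
# the NA position produces a row of NAs instead of being dropped.
toy <- data.frame(v = 1:3)
toy[c(TRUE, NA, FALSE), , drop = FALSE]
```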
One solution is to handle the NAs explicitly with one of these fixes:
with(df, df[a < 30 & b == "moon" & (c == "a" & !is.na(c)),]) # exclude NAs
with(df, df[a < 30 & b == "moon" & (c == "a" | is.na(c)),]) # include NAs
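To make the difference between the two fixes concrete, here is a small sketch on a hypothetical vector cc standing in for the c column: the first form turns NA into FALSE (row dropped), the second turns it into TRUE (row kept).

```r
# Hypothetical example vector with one missing value
cc <- c("a", "b", NA)

cc == "a"               # TRUE FALSE NA   : raw comparison, NA leaks through
cc == "a" & !is.na(cc)  # TRUE FALSE FALSE: NA rows are excluded
cc == "a" | is.na(cc)   # TRUE FALSE TRUE : NA rows are included
```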
but these are pretty clumsy. Imagine you have a long condition like
df[A == x1 & B == x2 & C == x3 & D == x4,]
and you have to wrap each element like this: df[(A == x1 | is.na(A)) & (B == x2 | is.na(B)) ...,]
Is there an elegant solution to this problem that doesn't require typing all this extra code at the console when you just want to inspect a data frame?