Consider the following dataset:
df_test_1 <- data.frame(time = c(seq(20, 40, by = 5), NA))
Find all rows where time is greater than zero:
log_vec <- df_test_1$time > 0
such that:
> log_vec
[1] TRUE TRUE TRUE TRUE TRUE   NA
Consider filtering the original dataset on this condition in base R:
> df_test_1[log_vec, , drop = FALSE]
   time
1    20
2    25
3    30
4    35
5    40
NA   NA
and the dplyr version:
> library(dplyr)
> df_test_1 %>% filter(log_vec)
  time
1   20
2   25
3   30
4   35
5   40
Notice how the row with NA is returned in base R but not in dplyr. Why is this happening, and is this behaviour always expected? I cannot find this documented in the help file ?filter. (Note: this has previously been observed in the question "Use group_by to filter specific cases while keeping NAs".)
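
To make the difference concrete, here is a minimal sketch showing how either result can be requested explicitly. It assumes dplyr is installed. The names base_drop_na and dplyr_keep_na are hypothetical, introduced only for illustration. In base R, which() turns the logical vector into the positions that are TRUE, so the NA is skipped. In dplyr, an explicit is.na() test keeps the row with the missing value.

```r
library(dplyr)

df_test_1 <- data.frame(time = c(seq(20, 40, by = 5), NA))
log_vec <- df_test_1$time > 0

# Base R: which() returns only the indices where log_vec is TRUE,
# so the NA element no longer produces an NA row.
base_drop_na <- df_test_1[which(log_vec), , drop = FALSE]

# dplyr: say explicitly that rows with a missing time should be kept.
# For the last row the condition is TRUE | NA, which evaluates to TRUE.
dplyr_keep_na <- df_test_1 %>% filter(is.na(time) | time > 0)

nrow(base_drop_na)   # 5
nrow(dplyr_keep_na)  # 6
```

With these two forms the handling of NA is stated in the code itself and does not depend on the default of either subsetting method.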