How to filter out where specific columns are all na

Question

Here is a sample dataset:

id <- c("Item1","Item2","Item3","Item4","Item5","Item6")
var1 <- c(2,3,NA,NA,5,6)
var2 <- c(NA,3,5,NA,5,NA)
var3 <- c(NA,3,4,NA,NA,6)
test <- data.frame(id, var1, var2, var3)

I want to filter out where var1, var2 and var3 are all na. I know it can be done like this:

test1 <- test %>% filter(!(is.na(var1) & is.na(var2) & is.na(var3)))
test1

     id var1 var2 var3
1 Item1    2   NA   NA
2 Item2    3    3    3
3 Item3   NA    5    4
4 Item5    5    5   NA
5 Item6    6   NA    6

Is there a better way of doing this?

if it is across all your columns can you see whether this works https://stackoverflow.com/questions/22353633/filter-for-complete-cases-in-data-frame-using-dplyr-case-wise-deletion — StupidWolf, Jul 19 '20 at 17:00
Does this answer your question? [filter for complete cases in data.frame using dplyr (case-wise deletion)](https://stackoverflow.com/questions/22353633/filter-for-complete-cases-in-data-frame-using-dplyr-case-wise-deletion) — UseR10085, Jul 19 '20 at 17:04

Darren Tsai · Accepted Answer · 2020-07-19T17:30:58.027

If the filtering is focused on certain columns, e.g. var1:var3, you can use

library(dplyr)

option 1

test %>%
  filter(rowSums(across(var1:var3, ~ !is.na(.))) > 0)

option 2

test %>%
  filter_at(vars(var1:var3), any_vars(!is.na(.)))

option 3

test %>%
  rowwise() %>% 
  filter(sum(!is.na(c_across(var1:var3))) > 0) %>%
  ungroup()

output

# # A tibble: 5 x 4
#   id     var1  var2  var3
#   <chr> <dbl> <dbl> <dbl>
# 1 Item1     2    NA    NA
# 2 Item2     3     3     3
# 3 Item3    NA     5     4
# 4 Item5     5     5    NA
# 5 Item6     6    NA     6

score 1 · Answer 2 · answered Jul 19 '20 at 18:29

A base R one-liner.

test[apply(test[-1], 1, function(x) any(!is.na(x))), ]
#     id var1 var2 var3
#1 Item1    2   NA   NA
#2 Item2    3    3    3
#3 Item3   NA    5    4
#5 Item5    5    5   NA
#6 Item6    6   NA    6

One-liners can be difficult to read, here is a broke down version of the code above.

Define an auxiliary function f.

f <- function(x) any(!is.na(x))

Now apply the function rowwise, with MARGIN = 1.

i <- apply(test[-1], 1, f)
i
#[1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE

Only the TRUE index values will be part of the final result.

test[i, ]
#
# Output omited, identical to above
#

score 0 · Answer 3 · answered Jul 19 '20 at 16:57

0

You can try this base R approach:

#Base R solution
test[complete.cases(test),]

     id var1 var2 var3
2 Item2    3    3    3

answered Jul 19 '20 at 16:57

Duck

39,058
13
42
84

1

This obviously doesn’t achieve the OP’s request. – Darren Tsai Jul 20 '20 at 17:51

How to filter out where specific columns are all na

3 Answers3

option 1

option 2

option 3

output

Linked