4

Here is a sample dataset:

id <- c("Item1","Item2","Item3","Item4","Item5","Item6")
var1 <- c(2,3,NA,NA,5,6)
var2 <- c(NA,3,5,NA,5,NA)
var3 <- c(NA,3,4,NA,NA,6)
test <- data.frame(id, var1, var2, var3)

I want to filter out where var1, var2 and var3 are all na. I know it can be done like this:

test1 <- test %>% filter(!(is.na(var1) & is.na(var2) & is.na(var3)))
test1

     id var1 var2 var3
1 Item1    2   NA   NA
2 Item2    3    3    3
3 Item3   NA    5    4
4 Item5    5    5   NA
5 Item6    6   NA    6

Is there a better way of doing this?

Darren Tsai
  • 32,117
  • 5
  • 21
  • 51
MGJ-123
  • 614
  • 4
  • 19
  • if it is across all your columns can you see whether this works https://stackoverflow.com/questions/22353633/filter-for-complete-cases-in-data-frame-using-dplyr-case-wise-deletion – StupidWolf Jul 19 '20 at 17:00
  • 1
    Does this answer your question? [filter for complete cases in data.frame using dplyr (case-wise deletion)](https://stackoverflow.com/questions/22353633/filter-for-complete-cases-in-data-frame-using-dplyr-case-wise-deletion) – UseR10085 Jul 19 '20 at 17:04
  • What is the best data.table-way to do it? – Andi May 25 '23 at 20:22

3 Answers3

5

If the filtering is focused on certain columns, e.g. var1:var3, you can use

library(dplyr)

option 1

test %>%
  filter(rowSums(across(var1:var3, ~ !is.na(.))) > 0)

option 2

test %>%
  filter_at(vars(var1:var3), any_vars(!is.na(.)))

option 3

test %>%
  rowwise() %>% 
  filter(sum(!is.na(c_across(var1:var3))) > 0) %>%
  ungroup()

output

# # A tibble: 5 x 4
#   id     var1  var2  var3
#   <chr> <dbl> <dbl> <dbl>
# 1 Item1     2    NA    NA
# 2 Item2     3     3     3
# 3 Item3    NA     5     4
# 4 Item5     5     5    NA
# 5 Item6     6    NA     6
Darren Tsai
  • 32,117
  • 5
  • 21
  • 51
1

A base R one-liner.

test[apply(test[-1], 1, function(x) any(!is.na(x))), ]
#     id var1 var2 var3
#1 Item1    2   NA   NA
#2 Item2    3    3    3
#3 Item3   NA    5    4
#5 Item5    5    5   NA
#6 Item6    6   NA    6

One-liners can be difficult to read, here is a broke down version of the code above.

Define an auxiliary function f.

f <- function(x) any(!is.na(x))

Now apply the function rowwise, with MARGIN = 1.

i <- apply(test[-1], 1, f)
i
#[1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE

Only the TRUE index values will be part of the final result.

test[i, ]
#
# Output omited, identical to above
#
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
0

You can try this base R approach:

#Base R solution
test[complete.cases(test),]

     id var1 var2 var3
2 Item2    3    3    3
Duck
  • 39,058
  • 13
  • 42
  • 84