1

How can I use dplyr to filter a data frame so that I keep only the rows that have at least one missing value?

tyluRp
Joni Hoppen

3 Answers

4

Using @DJack's sample data, we can do this in dplyr with filter_all(). filter_all() takes a predicate wrapped in all_vars() or any_vars() and applies it to every column. Here, any_vars(is.na(.)) keeps every row for which is.na() returns TRUE in at least one column.

m <- matrix(1:25, ncol = 5)
m[c(1, 6, 13, 25)] <- NA
df <- data.frame(m)
library(dplyr)
df %>%
  filter_all(any_vars(is.na(.)))
#>   X1 X2 X3 X4 X5
#> 1 NA NA 11 16 21
#> 2  3  8 NA 18 23
#> 3  5 10 15 20 NA

Created on 2018-05-08 by the reprex package (v0.2.0).
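In later dplyr releases the scoped verbs such as filter_all() are superseded. A minimal sketch of the same filter using if_any(), assuming dplyr 1.0.4 or newer is installed (the name rows_with_na is only illustrative):

```r
library(dplyr)

# Same sample data as above
m <- matrix(1:25, ncol = 5)
m[c(1, 6, 13, 25)] <- NA
df <- data.frame(m)

# Keep rows where is.na() is TRUE in at least one column.
# Assumes dplyr >= 1.0.4, where if_any() is available.
rows_with_na <- df %>%
  filter(if_any(everything(), is.na))

rows_with_na
```

This should return the same three rows as the filter_all() call; swapping if_any() for if_all() would instead keep only rows in which every column is NA.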

Calum You
  • That worked just fine, and in a very elegant way. Any hints about these two other situations: removing all columns with missing values, and keeping only columns with missing values? – Joni Hoppen May 08 '18 at 20:31
  • Both are done with `select_if`. In `dplyr`, `filter` verbs allow you to keep rows, `select` verbs allow you to keep columns in various ways. – Calum You May 08 '18 at 20:34
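As a rough sketch of the select_if suggestion in the comment above (the formula predicates and the names cols_with_na and cols_without_na are my own illustration, not taken from the comment):

```r
library(dplyr)

# Same sample data as in the answer
m <- matrix(1:25, ncol = 5)
m[c(1, 6, 13, 25)] <- NA
df <- data.frame(m)

# Keep only the columns that contain at least one NA
cols_with_na <- df %>%
  select_if(~ any(is.na(.)))

# Remove every column that contains an NA
cols_without_na <- df %>%
  select_if(~ !any(is.na(.)))

names(cols_with_na)
names(cols_without_na)
```

With this sample data only X4 has no missing values, so it is the one column left in cols_without_na.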
3

Here is a base R (non-dplyr) solution:

df[rowSums(is.na(df)) > 0,]

#  X1 X2 X3 X4 X5
#1 NA NA 11 16 21
#3  3  8 NA 18 23
#5  5 10 15 20 NA

Or as suggested by MrFlick:

df[!complete.cases(df),]
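To make it clearer why the two base R expressions select the same rows, here is a small sketch that exposes the intermediate values (the variable names na_counts, by_rowsums and by_complete are only illustrative):

```r
# Same sample data as in this answer
m <- matrix(1:25, ncol = 5)
m[c(1, 6, 13, 25)] <- NA
df <- data.frame(m)

# is.na(df) is a logical matrix; rowSums() counts the NAs in each row
na_counts <- rowSums(is.na(df))

# Rows with at least one NA
by_rowsums <- df[na_counts > 0, ]

# complete.cases() is TRUE for rows with no NA, so negate it
by_complete <- df[!complete.cases(df), ]

# Compare the two results
all.equal(by_rowsums, by_complete)
```

Both subsets keep the original row names (1, 3 and 5), unlike the dplyr version, which renumbers the rows.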

Sample data

m <- matrix(1:25, ncol = 5)
m[c(1,6,13,25)] <- NA
df <- data.frame(m)
df

#  X1 X2 X3 X4 X5
#1 NA NA 11 16 21
#2  2  7 12 17 22
#3  3  8 NA 18 23
#4  4  9 14 19 24
#5  5 10 15 20 NA
DJack
2

I don't know how to solve this with dplyr, but maybe this helps:

First, I created this df:

library(tibble)  # provides tribble()

df <- tribble(
  ~a, ~b, ~c,
   1, NA,  0,
   2,  0,  1,
   3,  1, NA,
   4,  1,  0
)

Then, this will return only rows with NA:

df[!complete.cases(df),]

See more: Subset of rows containing NA (missing) values in a chosen column of a data frame
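For the case named in that link, rows with NA in one chosen column, a minimal sketch might look like the following. It assumes the tibble package is installed; the choice of column b and the name na_in_b are only for illustration.

```r
library(tibble)

# Same data as in this answer
df <- tribble(
  ~a, ~b, ~c,
   1, NA,  0,
   2,  0,  1,
   3,  1, NA,
   4,  1,  0
)

# Rows with an NA in any column
any_na <- df[!complete.cases(df), ]

# Rows with an NA in column b only (b is an arbitrary choice here)
na_in_b <- df[is.na(df$b), ]

any_na
na_in_b
```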